CN111652383A - Data contribution degree evaluation method, device, equipment and storage medium

Info

Publication number
CN111652383A
Authority
CN
China
Prior art keywords
data
model performance
degree
countermeasure
contribution
Legal status
Pending
Application number
CN202010504333.3A
Other languages
Chinese (zh)
Inventor
范力欣
张天豫
吴锦和
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN202010504333.3A
Publication of CN111652383A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data contribution degree evaluation method, device, equipment and storage medium. The method comprises: determining a first target federated learning model by combining first data to be evaluated, belonging to a first participant, with second data belonging to other second participants, and obtaining the corresponding first basic model performance degree; generating countermeasure (adversarial) data for the first data; determining a second target federated learning model by combining the countermeasure data with the second data, and obtaining the corresponding first countermeasure model performance degree; and evaluating the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree. The method and device address the technical problem in the prior art that, when participants are used to improve model performance, the contribution of each participant to that improvement cannot be accurately calculated.

Description

Data contribution degree evaluation method, device, equipment and storage medium
Technical Field
The present application relates to artificial intelligence technology in the field of financial technology (Fintech), and in particular to a method, an apparatus, a device, and a storage medium for evaluating data contribution.
Background
With the continuous development of financial technology, especially internet finance, more and more technologies are being applied in the financial field. The financial industry, in turn, places higher requirements on these technologies, including on the evaluation of data contribution.
Data (or data sets) are widely regarded as an essential component of big data analysis and artificial intelligence. Under new data processing paradigms represented by federated learning, the value of each isolated or independent data set becomes more evident. For example, after a model has been built from an existing data set, its capability can be improved by using other data sources (participants). At present, however, when participants are used to improve model performance, the contribution of each participant to that improvement is calculated with low accuracy.
Disclosure of Invention
The main objective of the present application is to provide a data contribution degree evaluation method, device, equipment and storage medium, so as to solve the technical problem in the prior art that, when participants are used to improve model performance, the contribution of each participant to the improvement cannot be accurately calculated.
To achieve the above objective, the present application provides a data contribution degree evaluation method, in which a first participant and a second participant are connected through federated learning. The method includes:
determining a first target federated learning model by combining first data to be evaluated of a first participant and second data of other second participants, to obtain a corresponding first basic model performance degree;
generating countermeasure data for the first data;
determining a second target federated learning model by combining the countermeasure data and the second data, to obtain a corresponding first countermeasure model performance degree;
and evaluating the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree.
Optionally, the step of generating countermeasure data for the first data includes:
acquiring first label content of the first data, and replacing the first label content with other label content;
combining the other label content with the data of the first data other than the first label content, to generate the countermeasure data.
Optionally, the countermeasure data comprises a plurality of groups;
the step of determining a second target federated learning model by combining the countermeasure data and the second data to obtain a corresponding first countermeasure model performance degree includes:
combining, each time, one of the plurality of groups of countermeasure data with the second data, so as to determine a plurality of second target federated learning models and obtain a corresponding plurality of target model performance degrees;
averaging the plurality of target model performance degrees to obtain an average model performance degree;
and setting the average model performance degree as the first countermeasure model performance degree.
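The averaging step can be sketched as follows. This is a minimal illustration, not the patented implementation: the federated training and scoring of one model is hidden behind a hypothetical callable `train_and_score(countermeasure_group, second_data)`, and the function name `first_countermeasure_performance` is likewise an assumption.

```python
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_countermeasure_performance(
    countermeasure_groups: Sequence[List[T]],
    second_data: List[T],
    train_and_score: Callable[[List[T], List[T]], float],  # hypothetical federated train + evaluate
) -> float:
    """Train one federated model per countermeasure group and return the mean performance."""
    if not countermeasure_groups:
        raise ValueError("at least one group of countermeasure data is required")
    # One second target federated learning model per group, each paired with the same second data.
    scores = [train_and_score(group, second_data) for group in countermeasure_groups]
    # The average model performance degree becomes the first countermeasure model performance degree.
    return mean(scores)


# Usage with a stub that simply returns pre-set scores for three groups.
preset = iter([0.62, 0.58, 0.60])
print(first_countermeasure_performance([["g1"], ["g2"], ["g3"]], ["second"], lambda g, d: next(preset)))
```

Averaging over several randomly generated groups is what damps the effect of any single unlucky label replacement.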
Optionally, the step of evaluating the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree includes:
obtaining second average model performance difference values, which are determined by the second basic model performance degrees and the second countermeasure model performance degrees of the other second participants;
and evaluating the contribution degree of the first data according to the first basic model performance degree, the first countermeasure model performance degree and the second average model performance difference values.
Optionally, the step of evaluating the contribution degree of the first data according to the first basic model performance degree, the first countermeasure model performance degree, and the second average model performance difference values includes:
determining a first average model performance difference value according to the first basic model performance degree and the first countermeasure model performance degree;
summing the first average model performance difference value and the plurality of second average model performance difference values to obtain a summed average model performance difference value;
determining a relative model performance difference value of the first participant based on the summed average model performance difference value and the first average model performance difference value;
and setting the relative model performance difference value as the contribution degree of the first data.
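The text does not state exactly how the relative model performance difference value is formed from the summed value and the first participant's own value. One plausible reading, offered as an assumption rather than as the patented formula, normalizes the first participant's difference by the sum. Here $P^{b}_{i}$ and $\bar{P}^{c}_{i}$ (assumed notation) are the basic and averaged countermeasure model performance degrees of participant $i$, with $i = 1$ the first participant and $N$ participants in total:

```latex
\begin{aligned}
\Delta_i &= P^{b}_{i} - \bar{P}^{c}_{i}, \qquad i = 1, \dots, N,\\
\Delta_{\Sigma} &= \Delta_1 + \sum_{j=2}^{N} \Delta_j,\\
r_1 &= \frac{\Delta_1}{\Delta_{\Sigma}}, \qquad \Delta_{\Sigma} \neq 0.
\end{aligned}
```

For example, with $\Delta_1 = 0.3$, $\Delta_2 = 0.1$ and $\Delta_3 = 0.2$, the sum is $\Delta_{\Sigma} = 0.6$ and the relative contribution of the first data is $r_1 = 0.5$.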
Optionally, the step of combining the other label content with the data of the first data other than the first label content to generate countermeasure data includes:
determining a task type of the first target federated learning model;
if the task type is a classification/recognition task, combining the other label content with the data of the first data other than its classification label content, to generate countermeasure data;
or, if the task type is a regression task, combining the other label content with the data of the first data other than its numerical output content, to generate countermeasure data;
or, if the task type is a picture output task, combining the other label content with the data of the first data other than its picture output label content, to generate countermeasure data.
To achieve the above objective, the present application further provides a data contribution degree evaluation method applied to a third party, the third party being communicatively connected to the first participant and to each of the other second participants. The method includes:
receiving the second basic model performance degrees and second countermeasure model performance degrees of the other second participants, and receiving the first basic model performance degree and first countermeasure model performance degree of the first participant;
determining a relative model performance difference value of the first participant according to the first basic model performance degree, the first countermeasure model performance degree, the second basic model performance degrees and the second countermeasure model performance degrees, so as to evaluate the contribution degree of the first data;
and sending the contribution degree of the first data to the first participant.
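A minimal sketch of the third-party side, under two assumptions not stated in the text: each participant reports a pair `(basic performance degree, countermeasure performance degree)`, and the relative value is the participant's own difference divided by the sum of all differences. The participant ids and the function name are hypothetical. Note that the third party only ever sees performance scores, never raw data.

```python
from typing import Dict, Tuple

# (basic model performance degree, countermeasure model performance degree) reported by one participant
Report = Tuple[float, float]


def third_party_relative_contributions(reports: Dict[str, Report]) -> Dict[str, float]:
    """Compute each participant's relative model performance difference from reported scores only."""
    diffs = {pid: base - cm for pid, (base, cm) in reports.items()}
    total = sum(diffs.values())  # summed performance difference over all participants
    if total == 0:
        raise ValueError("summed performance difference is zero; relative contribution undefined")
    return {pid: d / total for pid, d in diffs.items()}


reports = {"first": (0.90, 0.60), "second_a": (0.90, 0.80), "second_b": (0.88, 0.68)}
results = third_party_relative_contributions(reports)
for pid, contribution in results.items():
    # In a deployment, each value would be sent back only to the participant it belongs to.
    print(pid, round(contribution, 3))
```

Computing every participant's value in one pass is a design convenience; the text itself only requires the first participant's value.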
The present application further provides a data contribution degree evaluation device involving a first participant and a second participant that are connected through federated learning. The device includes:
a first determining module, configured to determine a first target federated learning model by combining first data to be evaluated of a first participant and second data of other second participants, so as to obtain a corresponding first basic model performance degree;
a generation module, configured to generate countermeasure data for the first data;
a second determining module, configured to determine a second target federated learning model by combining the countermeasure data and the second data, so as to obtain a corresponding first countermeasure model performance degree;
and an evaluation module, configured to evaluate the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree.
Optionally, the generation module includes:
a first obtaining unit, configured to obtain first label content of the first data and replace the first label content with other label content;
a generating unit, configured to combine the other label content with the data of the first data other than the first label content, to generate countermeasure data.
Optionally, the countermeasure data comprises a plurality of groups;
the second determining module includes:
a determining unit, configured to combine, each time, one of the plurality of groups of countermeasure data with the second data, so as to determine a plurality of second target federated learning models and obtain a corresponding plurality of target model performance degrees;
an averaging unit, configured to average the plurality of target model performance degrees to obtain an average model performance degree;
and a setting unit, configured to set the average model performance degree as the first countermeasure model performance degree.
Optionally, the evaluation module includes:
a second obtaining unit, configured to obtain second average model performance difference values, which are determined by the second basic model performance degrees and the second countermeasure model performance degrees of the other second participants;
and an evaluation unit, configured to evaluate the contribution degree of the first data according to the first basic model performance degree, the first countermeasure model performance degree and the second average model performance difference values.
Optionally, the evaluation unit includes:
a first determining subunit, configured to determine a first average model performance difference value according to the first basic model performance degree and the first countermeasure model performance degree;
a summing subunit, configured to sum the first average model performance difference value and the plurality of second average model performance difference values to obtain a summed average model performance difference value;
a second determining subunit, configured to determine a relative model performance difference value of the first participant based on the summed average model performance difference value and the first average model performance difference value;
and a setting subunit, configured to set the relative model performance difference value as the contribution degree of the first data.
Optionally, the generating unit includes:
a third determining subunit, configured to determine a task type of the first target federated learning model;
a generation subunit, configured to: if the task type is a classification/recognition task, combine the other label content with the data of the first data other than its classification label content to generate countermeasure data;
or, if the task type is a regression task, combine the other label content with the data of the first data other than its numerical output content to generate countermeasure data;
or, if the task type is a picture output task, combine the other label content with the data of the first data other than its picture output label content to generate countermeasure data.
To achieve the above objective, the present application further provides a data contribution degree evaluation device applied to a third party, the third party being communicatively connected to the first participant and to each of the other second participants. The device includes:
a receiving module, configured to receive the second basic model performance degrees and second countermeasure model performance degrees of the other second participants, and to receive the first basic model performance degree and first countermeasure model performance degree of the first participant;
a third determining module, configured to determine a relative model performance difference value of the first participant according to the first basic model performance degree, the first countermeasure model performance degree, the second basic model performance degrees and the second countermeasure model performance degrees, so as to evaluate the contribution degree of the first data;
and a sending module, configured to send the contribution degree of the first data to the first participant.
The present application further provides data contribution degree evaluation equipment, which is a physical device comprising a memory, a processor, and a program of the data contribution degree evaluation method that is stored in the memory and executable on the processor. When executed by the processor, the program implements the steps of the data contribution degree evaluation method described above.
The present application further provides a storage medium on which a program implementing the above data contribution degree evaluation method is stored; when executed by a processor, the program implements the steps of that method.
In the present application, after the first basic model performance degree of the first target federated learning model has been determined by combining the first data to be evaluated of the first participant with the second data of the other second participants, countermeasure data for the first data are generated. The first countermeasure model performance degree of a second target federated learning model is then determined by combining the countermeasure data with the second data, and the contribution degree of the first data is evaluated from both a positive and a negative perspective, according to the first basic model performance degree and the first countermeasure model performance degree. This improves the accuracy with which the contribution of the first data is calculated, reduces the influence of accidental factors, and solves the technical problem in the prior art that, when participants are used to improve model performance, the contribution of each participant to the improvement cannot be accurately calculated.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, those skilled in the art can derive other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flowchart of a first embodiment of the data contribution degree evaluation method of the present application;
FIG. 2 is a schematic flowchart of the refined step of generating countermeasure data for the first data in the data contribution degree evaluation method of the present application;
FIG. 3 is a schematic structural diagram of a device in the hardware operating environment according to an embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In a first embodiment of the data contribution degree evaluation method of the present application, referring to FIG. 1, the method is applied to a first participant, the first participant and a second participant being connected through federated learning. The method includes:
Step S10, determining a first target federated learning model by combining first data to be evaluated of the first participant and second data of other second participants, to obtain a corresponding first basic model performance degree;
Step S20, generating countermeasure data for the first data;
Step S30, determining a second target federated learning model by combining the countermeasure data and the second data, to obtain a corresponding first countermeasure model performance degree;
Step S40, evaluating the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree.
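Steps S10 to S40 can be sketched as a small driver function. This is an illustrative skeleton under stated assumptions, not the patented system: federated training and evaluation are abstracted behind a hypothetical `train_and_score` callable, countermeasure generation behind a hypothetical `make_countermeasure` callable, and the toy stand-ins below do no real training at all (the "performance" is just the share of records whose label matches a known rule).

```python
from typing import Callable, List, Tuple

Record = Tuple[int, int]  # hypothetical (feature, label) record
TrainAndScore = Callable[[List[Record], List[Record]], float]


def evaluate_contribution(
    first_data: List[Record],
    second_data: List[Record],
    make_countermeasure: Callable[[List[Record]], List[Record]],
    train_and_score: TrainAndScore,
) -> float:
    """Steps S10 to S40, with federated training hidden behind train_and_score."""
    base = train_and_score(first_data, second_data)                      # S10: first basic performance degree
    countermeasure = make_countermeasure(first_data)                     # S20: countermeasure data
    countermeasure_perf = train_and_score(countermeasure, second_data)   # S30: first countermeasure performance
    return base - countermeasure_perf                                    # S40: absolute contribution degree


# Toy stand-ins for illustration only: a record is "correct" when label == feature % 2,
# and "performance" is the share of correct records in the pooled data.
def toy_train_and_score(first: List[Record], second: List[Record]) -> float:
    pooled = first + second
    return sum(1 for x, y in pooled if y == x % 2) / len(pooled)


def flip_binary_labels(data: List[Record]) -> List[Record]:
    return [(x, 1 - y) for x, y in data]


first = [(1, 1), (2, 0)]
second = [(3, 1), (4, 0)]
print(evaluate_contribution(first, second, flip_binary_labels, toy_train_and_score))  # 0.5
```

Keeping training behind a callable lets the same driver run with a coordinator-based, participant-led, or blockchain-based federated procedure, as the embodiment allows.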
The steps are described in detail as follows:
Step S10, determining a first target federated learning model by combining first data to be evaluated of the first participant and second data of other second participants, to obtain a corresponding first basic model performance degree;
In this embodiment, the method may be applied to a data contribution evaluation system; evaluating contributions through such a system helps ensure that the evaluation is fair and secure. The data contribution evaluation system belongs to the data contribution evaluation device and includes a first participant and a second participant that communicate through federated learning. The first participant is the party that owns the first data to be evaluated, and the second participant is a party that owns the second data and is federated with the first participant. The system may further include a coordinator. Specifically, in this embodiment, the coordinator initiates a federated learning task (for example, a vertical federated learning task) and sends an initial model of the task (the initial model of the first target federated learning model) to each participant. Each participant trains the initial model on its own local training data: the first participant on the first data, and the second participant on the second data. After training, the coordinator acts as an intermediary and executes a preset federated procedure: it combines the trained model parameters of the participants into joint model parameters, and each participant continues training from the joint parameters until a preset end condition is reached. This yields the first target federated learning model (which may be held by the coordinator), from which the corresponding first basic model performance degree is obtained. The first basic model performance degree includes, for example, prediction accuracy or prediction precision.
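The coordinator loop described above can be illustrated with a toy model. The text does not say how model parameters are combined, so this sketch assumes a sample-size-weighted average (FedAvg-style) and a fixed number of rounds as the "preset end condition". It also uses a horizontal setting (every participant holds the same feature for different samples) and a one-parameter linear model, purely for simplicity; the vertical case mentioned in the text would partition features instead. All function names are hypothetical.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (feature, label) pairs held by one participant


def local_train(w: float, data: Dataset, lr: float = 0.1, epochs: int = 5) -> float:
    """Participant side: a few gradient-descent steps on mean squared error for y ~ w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_train(parties: List[Dataset], rounds: int = 20, w0: float = 0.0) -> float:
    """Coordinator side: broadcast w, collect local results, combine them by a
    sample-size-weighted average. A fixed round count stands in for the end condition."""
    w = w0
    total = sum(len(d) for d in parties)
    for _ in range(rounds):
        local_ws = [local_train(w, d) for d in parties]
        w = sum(lw * len(d) for lw, d in zip(local_ws, parties)) / total
    return w


def mse(w: float, data: Dataset) -> float:
    """One possible performance degree (lower is better); a classifier would use accuracy."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


first_data = [(1.0, 2.0), (2.0, 4.0)]   # first participant's data, labels follow y = 2x
second_data = [(3.0, 6.0), (0.5, 1.0)]  # second participant's data, same rule
w = federated_train([first_data, second_data])
print(round(w, 6), mse(w, first_data + second_data))
```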
Alternatively, in this embodiment, the first participant itself sends the initial model of the federated learning task to each second participant. Each participant trains the initial model on its own local training data (the first participant on the first data, and the second participant on the second data) and, after training, executes the preset federated procedure: the first participant combines the model parameters of all participants into joint model parameters, and each participant continues training from the joint parameters until a preset end condition is reached (for example, convergence of the loss function, or a preset number of training iterations). This yields the first target federated learning model (held by the first participant), after which the contribution evaluation system obtains the corresponding first basic model performance degree from it, ensuring the security of the evaluation.
It should be noted that the first target federated learning model may also be determined jointly over a blockchain, combining the first data to be evaluated of the first participant with the second data of the other second participants, to obtain the corresponding first basic model performance degree. Specifically, the data contribution evaluation system publishes the initial model to the blockchain; each participant obtains the initial model from the blockchain and trains it on its own data (the first participant on the first data, and the second participant on the second data); after training, the preset federated procedure is executed through the blockchain to obtain the first target federated learning model and the corresponding first basic model performance degree.
Step S20, generating countermeasure data of the first data;
It should be noted that data contribution may be calculated inaccurately because of unbalanced data volumes, unbalanced data distributions, or accidental factors such as malicious data. In this embodiment, the data contribution is evaluated on the basis of both the first data and its countermeasure data; that is, a comprehensive evaluation is obtained from the positive and negative directions, which reduces the inaccuracy that accidental factors introduce into contribution evaluation.
Compared with determining the contribution degree only by removing the first data, generating countermeasure data for the first data and then determining the contribution degree determines the contribution of the participant's data more accurately, because the countermeasure data add an evaluation in the opposite direction and therefore suppress the effect of accidental factors on the contribution evaluation more thoroughly.
Referring to FIG. 2, the step of generating countermeasure data for the first data includes:
Step S21, obtaining the first label content of the first data, and replacing the first label content with other label content;
In this embodiment, the countermeasure data of the first data are obtained by acquiring the first label content of the first data and replacing it with other label content. Because the first data include multiple records, the first label content of each record is replaced with other label content, which may be either deterministic or random. The replacement may be performed automatically, for example when the number of records in the first data exceeds a preset value. To do so, a first preset program segment is configured in the processor in advance. This program segment represents the processing logic for the event of acquiring the first label content of the first data: once that event is detected, the logic responds to it by replacing the first label content with other label content.
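Random per-record label replacement, one of the two options mentioned above, might look like the sketch below. The function name and the explicit `label_space` argument are assumptions; a deterministic variant would simply use a fixed mapping instead of `rng.choice`. Passing a seeded `random.Random` makes the countermeasure data reproducible.

```python
import random
from typing import Hashable, List, Optional, Sequence


def replace_labels(
    labels: Sequence[Hashable],
    label_space: Sequence[Hashable],
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    """Replace every record's label with a randomly chosen label that differs from it."""
    rng = rng or random.Random()
    replaced = []
    for label in labels:
        candidates = [c for c in label_space if c != label]  # any label except the original one
        if not candidates:
            raise ValueError(f"no alternative label available for {label!r}")
        replaced.append(rng.choice(candidates))
    return replaced


original = ["cat", "dog", "cat"]
print(replace_labels(original, ["cat", "dog", "sheep"], random.Random(0)))
```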
Step S22, combining the other label content with the data of the first data other than the first label content, to generate countermeasure data.
To combine the other label content with the data of the first data other than the first label content automatically, a second preset program segment is configured in the processor in advance. This program segment represents the processing logic for the event of replacing the first label content with other label content: once that event is detected, the logic responds to it by combining the other label content with the remaining data of the first data to generate the countermeasure data.
Specifically, the countermeasure data may be generated by automatic field splicing, that is, by joining the other label content to the fields of the first data other than the first label content.
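Field splicing can be sketched with records represented as dictionaries. This is an assumption made for illustration: the field name `label` and the feature names `f1` and `f2` are hypothetical, and a real system might splice table columns instead. The original records are left unchanged, so the first data remain available for the basic model.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def splice_countermeasure(
    records: Sequence[Record],
    new_labels: Sequence[Any],
    label_field: str = "label",  # hypothetical name of the label field
) -> List[Record]:
    """Join each replacement label to the non-label fields of the original record."""
    if len(records) != len(new_labels):
        raise ValueError("exactly one replacement label is required per record")
    spliced = []
    for record, new_label in zip(records, new_labels):
        row = {k: v for k, v in record.items() if k != label_field}  # keep every other field
        row[label_field] = new_label
        spliced.append(row)
    return spliced


first_data = [{"f1": 0.2, "f2": 1.5, "label": "cat"}, {"f1": 0.7, "f2": 0.3, "label": "dog"}]
print(splice_countermeasure(first_data, ["dog", "sheep"]))
```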
The first label content includes classification label content, numerical output label content and picture output label content, and the step of combining the other label content with the data of the first data other than the first label content to generate countermeasure data includes:
Step S221, determining the task type of the first target federated learning model;
In this embodiment, because countermeasure data are generated differently for different types of target federated learning model, the task type of the first target federated learning model is determined first. Task types include, but are not limited to, classification tasks, regression tasks, modeling tasks, and the like.
Step S222, if the task type is a classification/recognition task, combining the other label content with the data of the first data other than its classification label content, to generate countermeasure data;
If the task type is a classification/recognition task, the other label content is combined with the data of the first data other than its classification label content to generate countermeasure data. For example, if the label content is "cat", the other label content may be "dog", "cow", "sheep", and so on; that is, in this embodiment the other label content may be random.
Step S223, or, if the task type is a regression task, combining the other label content with the data of the first data other than its numerical output content, to generate countermeasure data;
If the task type is a regression task, the other label content is combined with the data of the first data other than its numerical output content to generate countermeasure data. For example, if the numerical output is 1, the other label content may be -1; if the numerical output is 2, the other label content may be 3. That is, in this embodiment the other label content may be random or reversed.
Step S224, or, if the task type is a picture output task, combining the other label content with the data of the first data other than its picture output label content, to generate countermeasure data.
If the task type is a picture output task, the other label content is combined with the data of the first data other than its picture output label content to generate countermeasure data. The other label content may be, for example, a picture with a different resolution; in this embodiment, it may be random.
In other words, whether random or reversed label content is generated depends on whether the label content of the first data is quantifiable.
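The task-type dispatch of steps S221 to S224 can be sketched as below. The concrete choices are assumptions for illustration: the task type strings are hypothetical, regression labels are reversed by negation (matching the 1 to -1 example; the text also allows random values such as 2 to 3), and a picture label is replaced by a lower-resolution copy made by simple stride-based downsampling of a picture stored as rows of pixel values.

```python
import random
from typing import Any, List, Optional, Sequence

Picture = List[List[int]]  # hypothetical: a picture as rows of pixel values


def downsample(picture: Picture, stride: int = 2) -> Picture:
    """Produce a lower-resolution picture by keeping every `stride`-th row and column."""
    return [row[::stride] for row in picture[::stride]]


def countermeasure_label(
    task_type: str,
    label: Any,
    classes: Sequence[Any] = (),
    rng: Optional[random.Random] = None,
) -> Any:
    """Pick replacement label content according to the task type."""
    rng = rng or random.Random()
    if task_type == "classification":
        # Class labels are not quantifiable, so pick a random different class.
        candidates = [c for c in classes if c != label]
        if not candidates:
            raise ValueError("classification needs at least one alternative class")
        return rng.choice(candidates)
    if task_type == "regression":
        # Numerical labels are quantifiable, so reverse the value (1 -> -1).
        return -label
    if task_type == "picture":
        # Picture labels: substitute a picture with a different resolution.
        return downsample(label)
    raise ValueError(f"unsupported task type: {task_type!r}")


print(countermeasure_label("regression", 1.0))  # -1.0
```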
Step S30, determining a second target federal learning model by combining the countermeasure data and the second data to obtain a corresponding first countermeasure model performance degree;
in this embodiment, the countermeasure data and the second data are combined to determine a second target federal learning model, so as to obtain a corresponding first countermeasure model performance degree, specifically, a first participant (or a coordinator) sends an initial model (the same as above) of a first learning task to each second participant, each participant trains the initial model based on the initial model and training data of each participant (the first participant is based on the countermeasure data, and the second participant is based on the second data), after training, a preset federal flow is executed by the first participant (or the coordinator), specifically, after each participant combines joint model parameters obtained by model parameters of each participant based on the first participant (or the coordinator), training is continued until reaching a preset end condition, so as to obtain a second target federal learning model (located in the first participant), and the first participant (or the coordinator) obtains the corresponding first countermeasure model performance degree based on the second target federal learning model.
It should be noted that if the first basic model performance degree is obtained through a blockchain, the first countermeasure model performance degree is also obtained through the blockchain.
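The federal flow above can be illustrated with a toy example. This sketch rests on assumptions not stated in the text: the model is a logistic regression trained with plain stochastic gradient descent, the joint model parameters are an average of the local parameters weighted by sample count (federated averaging), a fixed number of rounds is the preset end condition, and the performance degree is accuracy on a validation set. The names `local_update`, `federated_train` and `accuracy` are hypothetical.

```python
import math
from typing import List, Tuple

Dataset = List[Tuple[List[float], int]]  # (features, 0/1 label)


def local_update(weights: List[float], data: Dataset, lr: float = 0.5, epochs: int = 5) -> List[float]:
    """One participant trains the received model on its own data (logistic regression, SGD)."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def federated_train(party_data: List[Dataset], dim: int, rounds: int = 20) -> List[float]:
    """Coordinator loop: distribute the model, collect local parameters, average them."""
    w = [0.0] * dim  # initial model sent to every participant
    total = sum(len(d) for d in party_data)
    for _ in range(rounds):  # assumption: a fixed round count is the end condition
        local = [local_update(w, d) for d in party_data]
        # Joint model parameters: average weighted by each participant's sample count.
        w = [sum(len(d) * lw[j] for d, lw in zip(party_data, local)) / total for j in range(dim)]
    return w


def accuracy(w: List[float], data: Dataset) -> float:
    """Performance degree, assumed here to be validation accuracy."""
    hits = sum(int((sum(wi * xi for wi, xi in zip(w, x)) >= 0) == (y == 1)) for x, y in data)
    return hits / len(data)
```

A possible usage, where the first element of each feature vector is a bias term: train once with the first participant's real data and once with its label-flipped countermeasure data, keeping the second participant's data fixed.

```python
first = [([1.0, v], int(v > 0)) for v in (-2.0, -1.0, 1.0, 2.0)] * 3
second = [([1.0, v], int(v > 0)) for v in (-2.0, -1.0, 1.0, 2.0)]
valid = [([1.0, -3.0], 0), ([1.0, 3.0], 1)]
flipped = [(x, 1 - y) for x, y in first]  # countermeasure data of the first data
base = accuracy(federated_train([first, second], dim=2), valid)      # basic model performance degree
adv = accuracy(federated_train([flipped, second], dim=2), valid)     # countermeasure model performance degree
```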
Step S40: evaluate the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree.
The contribution degree of the first data is evaluated from the first basic model performance degree and the first countermeasure model performance degree, and may be either an absolute contribution degree or a relative contribution degree.
For an absolute contribution degree, the difference between the first basic model performance degree and the first countermeasure model performance degree is taken directly as the contribution degree of the first data.
For a relative contribution degree, the difference between the first basic model performance degree and the first countermeasure model performance degree is computed; the corresponding difference between the second basic model performance degree and the second countermeasure model performance degree is computed for each of the other second participants; and the contribution degree of the first data is then evaluated from the relative relationship among these differences.
In summary, the first basic model performance degree of the first target federal learning model is determined by combining the first data to be detected of the first participant with the second data of the other second participants, and countermeasure data of the first data is then generated. The first countermeasure model performance degree of the second target federal learning model is determined by combining the countermeasure data with the second data, and the contribution degree of the first data is evaluated from both the positive and the negative evaluation angles, that is, from the first basic model performance degree and the first countermeasure model performance degree. This improves the accuracy of the contribution degree calculation, reduces the influence of accidental factors, and addresses the prior-art problem that, when participants are used to improve model performance, the contribution of each participant to that improvement cannot be calculated accurately.
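One way to write the absolute and relative contribution degrees as formulas is shown below. The symbols are introduced here for illustration only and are not used elsewhere in this application: P_k^base and P_k^adv denote the basic and countermeasure model performance degrees of participant k (k = 1 for the first participant, k = 2, ..., K for the second participants), P_{k,m}^adv is the target model performance degree obtained with the m-th of M sets of countermeasure data (M = 1 when a single set is used), and Δ_k is the average model performance difference of participant k. The relative form follows the division described in Step C3 below.

```latex
\begin{aligned}
P_k^{\mathrm{adv}} &= \frac{1}{M}\sum_{m=1}^{M} P_{k,m}^{\mathrm{adv}}, \\
\Delta_k &= P_k^{\mathrm{base}} - P_k^{\mathrm{adv}}, \qquad k = 1, \dots, K, \\
C_1^{\mathrm{abs}} &= \Delta_1, \qquad
C_1^{\mathrm{rel}} = \frac{\Delta_1}{\sum_{k=1}^{K} \Delta_k}.
\end{aligned}
```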
Further, based on the first embodiment of the present application, the countermeasure data includes a plurality of sets;
the step of determining a second target federal learning model by combining the countermeasure data and the second data to obtain a corresponding first countermeasure model performance degree includes:
Step A1: combine one of the plurality of sets of countermeasure data with the second data at a time to determine a plurality of second target federal learning models, so as to obtain a plurality of corresponding target model performance degrees;
To evaluate the contribution degree of the first data accurately, a plurality of sets of countermeasure data are generated, each with different other tag content. When the classification tag content is quantifiable, the sets include at least one set built from random other tag content and one set built from opposite tag content.
Each set of countermeasure data is combined with the second data to determine one second target federal learning model: for example, set a together with the second data yields one second target federal learning model, set b together with the second data yields another, and so on. Once all of the second target federal learning models have been determined, the corresponding target model performance degrees are obtained.
Step A2: average the plurality of target model performance degrees to obtain a mean model performance degree;
Step A3: set the mean model performance degree as the first countermeasure model performance degree.
In this embodiment, the mean of the plurality of target model performance degrees is set as the first countermeasure model performance degree. Because several sets of countermeasure data are generated and the average model performance difference is computed over them, the influence of accidental factors is reduced.
In this embodiment, a plurality of second target federal learning models are determined by combining one set of countermeasure data with the second data at a time, yielding a plurality of target model performance degrees; these are averaged into a mean model performance degree, which is set as the first countermeasure model performance degree. This improves the accuracy of the first countermeasure model performance degree.
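Steps A1 to A3 amount to "train once per countermeasure set, then average". The sketch below assumes a caller-supplied function `train_and_evaluate` (hypothetical) that runs the federal training on one countermeasure set plus the second data and returns the resulting target model performance degree; the function name `first_countermeasure_performance` is also hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence, TypeVar

D = TypeVar("D")


def first_countermeasure_performance(
    countermeasure_sets: Sequence[D],
    second_data: D,
    train_and_evaluate: Callable[[D, D], float],
) -> float:
    """Steps A1-A3: one federal model per countermeasure set, then the mean score."""
    # A1: combine each set with the second data and record its target model performance degree.
    scores = [train_and_evaluate(adv_set, second_data) for adv_set in countermeasure_sets]
    # A2 + A3: the mean model performance degree becomes the first countermeasure model performance degree.
    return mean(scores)
```

With stand-in scores of 0.60, 0.64 and 0.62 for three sets, the function returns approximately 0.62. Keeping the training step behind a callable makes it easy to plug in whatever federal training procedure is actually used.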
Further, based on the first and second embodiments of the present application, the step of evaluating the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree includes:
step B1, obtaining a second average model performance difference value, wherein the second average model performance difference value is determined by the second basic model performance degrees and the second countermeasure model performance degrees of other second participants;
In this embodiment, the second basic model performance degree and the second countermeasure model performance degree of each of the other second participants are obtained in the same way as the first basic model performance degree and the first countermeasure model performance degree. From them, the second average model performance difference of each second participant is obtained in the same way as the first average model performance difference.
Step B2: evaluate the contribution degree of the first data according to the first basic model performance degree, the first countermeasure model performance degree and the second average model performance difference.
A first average model performance difference is determined from the first basic model performance degree and the first countermeasure model performance degree, and the contribution degree of the first data is evaluated from the first average model performance difference and the second average model performance difference.
Specifically, the step of evaluating the contribution degree of the first data according to the first basic model performance degree, the first countermeasure model performance degree and the second average model performance difference includes:
Step C1: determine a first average model performance difference according to the first basic model performance degree and the first countermeasure model performance degree;
Step C2: sum the first average model performance difference and the plurality of second average model performance differences to obtain a summed average model performance difference;
In this embodiment, there may be a plurality of second participants, and therefore a plurality of second average model performance differences. After the first average model performance difference is obtained, it is summed with the second average model performance differences to obtain the summed average model performance difference.
Step C3: determine a relative model performance difference of the first participant based on the summed average model performance difference and the first average model performance difference;
Step C4: set the relative model performance difference as the contribution degree of the first data.
Specifically, the relative model performance difference of the first participant is obtained by dividing the first average model performance difference by the summed average model performance difference, and this relative model performance difference is set as the contribution degree of the first data.
In this embodiment, the second average model performance difference, determined from the second basic model performance degrees and second countermeasure model performance degrees of the other second participants, is obtained, and the contribution degree of the first data is evaluated accurately from the first basic model performance degree, the first countermeasure model performance degree and the second average model performance difference.
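Steps C1 to C4 reduce to a subtraction and a ratio. The following is a minimal sketch; the function names are hypothetical, and the performance degrees are assumed to be plain floating-point scores where higher is better.

```python
from typing import Sequence


def average_performance_difference(base: float, countermeasure: float) -> float:
    """Step C1: basic model performance degree minus countermeasure model performance degree."""
    return base - countermeasure


def relative_contribution(first_diff: float, second_diffs: Sequence[float]) -> float:
    """Steps C2-C4: the first participant's share of the summed performance differences."""
    total = first_diff + sum(second_diffs)  # C2: summed average model performance difference
    if total == 0:
        raise ValueError("summed performance difference is zero; relative contribution undefined")
    # C3: relative model performance difference, used as the contribution degree in C4.
    return first_diff / total
```

For example, with a first average model performance difference of 0.90 - 0.62 = 0.28 and second differences of 0.10 and 0.02, the relative contribution is about 0.28 / 0.40 = 0.7. The text does not say how to treat negative differences (a countermeasure model that happens to score higher); in that case this ratio can fall outside [0, 1].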
In another embodiment, the data contribution degree evaluation method is applied to a third party that is communicatively connected to the first participant and to each of the other second participants. The method includes:
step D1, acquiring second basic model performance degrees and second countermeasure model performance degrees of other second participants, and receiving a first basic model performance degree and a first countermeasure model performance degree of a first participant;
step D2, determining a relative model performance difference of a first participant according to the first basic model performance degree, the first countermeasure model performance degree, the second basic model performance degree and the second countermeasure model performance degree so as to evaluate the contribution degree of the first data;
Step D3: send the contribution degree of the first data to the first participant.
In this embodiment, the third party is communicatively connected to the first participant and to the other second participants. It receives the second basic model performance degrees and second countermeasure model performance degrees of the other second participants, as well as the first basic model performance degree and first countermeasure model performance degree of the first participant, and determines the relative model performance difference of the first participant from these four quantities in order to evaluate the contribution degree of the first data. Because the evaluation is performed by a third party, its objectivity is ensured.
In this embodiment, the third party acquires the second basic model performance degrees and second countermeasure model performance degrees of the other second participants and receives the first basic model performance degree and first countermeasure model performance degree of the first participant; determines the relative model performance difference of the first participant from these values to evaluate the contribution degree of the first data; and sends that contribution degree to the first participant. The contribution degree of the first data is thereby evaluated accurately.
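From the third party's side, Steps D1 to D3 can be sketched as below. Assumptions not stated in the text: every participant reports a (basic, countermeasure) pair of floating-point performance degrees, the third party applies the same division as Step C3 to every participant, and "sending" is represented simply by returning one value per participant (transport is not shown). The function name `third_party_evaluate` is hypothetical.

```python
from typing import Dict, Tuple


def third_party_evaluate(reports: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Steps D1-D3 as seen by the third party.

    reports maps a participant name to its (basic, countermeasure) performance degrees.
    Returns each participant's relative model performance difference, i.e. the value the
    third party would send back to that participant.
    """
    # D1/D2: per-participant performance difference, then each one's share of the total.
    diffs = {name: base - adv for name, (base, adv) in reports.items()}
    total = sum(diffs.values())
    if total == 0:
        raise ValueError("summed performance difference is zero")
    return {name: diff / total for name, diff in diffs.items()}
```

With reports `{"first": (0.90, 0.62), "second_a": (0.88, 0.78), "second_b": (0.86, 0.84)}`, the relative values are about 0.7, 0.25 and 0.05, and the value for `"first"` is what Step D3 would send to the first participant. Because only performance degrees reach the third party, no participant's raw data has to leave its owner.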
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of a device in a hardware operating environment according to an embodiment of the present application.
As shown in fig. 3, the data contribution degree evaluation device may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the data contribution degree evaluation device may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The rectangular user interface may comprise a display screen (Display) and an input sub-module such as a keyboard (Keyboard), and may optionally also comprise a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface).
Those skilled in the art will appreciate that the structure of the data contribution degree evaluation device shown in FIG. 3 does not limit the data contribution degree evaluation device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
As shown in fig. 3, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, and a data contribution degree evaluation program. The operating system is a program that manages and controls the hardware and software resources of the data contribution degree evaluation device, and supports the execution of the data contribution degree evaluation program as well as other software and/or programs. The network communication module is used for communication among the components in the memory 1005 and with other hardware and software in the data contribution evaluation system.
In the data contribution degree evaluation apparatus shown in fig. 3, the processor 1001 is configured to execute a data contribution degree evaluation program stored in the memory 1005, and implement the steps of the data contribution degree evaluation method described in any one of the above.
The specific implementation of the data contribution evaluation device of the present application is substantially the same as the embodiments of the data contribution evaluation method, and is not described herein again.
The present application further provides a data contribution degree evaluation device for a first participant and second participants that are connected in a federal manner. The data contribution degree evaluation device includes:
the first determining module is used for determining a first target federal learning model by combining first data to be detected of a first participant and second data of other second participants, so as to obtain a corresponding first basic model performance degree;
a generation module for generating countermeasure data of the first data;
the second determining module is used for determining a second target federal learning model by combining the countermeasure data and the second data so as to obtain the corresponding first countermeasure model performance degree;
and the evaluation module is used for evaluating the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree.
Optionally, the generating module includes:
a first obtaining unit, configured to obtain a first tag content of the first data, and replace the first tag content with another tag content;
a generating unit, configured to combine the other tag content with other data other than the first tag content of the first data to generate countermeasure data.
Optionally, the countermeasure data comprises a plurality of sets;
the second determining module includes:
the determining unit is used for combining one group of countermeasure data in the multiple groups and the second data at a time to determine a plurality of second target federal learning models so as to obtain corresponding performance degrees of the plurality of target models;
the mean value processing unit is used for carrying out mean value processing on the performance degrees of the plurality of target models to obtain the performance degree of the mean value model;
and the setting unit is used for setting the mean value model performance degree as the first countermeasure model performance degree.
Optionally, the evaluation module comprises:
the second obtaining unit is used for obtaining a second average model performance difference value, and the second average model performance difference value is determined by the second basic model performance degrees and the second countermeasure model performance degrees of other second participants;
and the evaluation unit is used for evaluating the contribution degree of the first data according to the first basic model performance degree, the first countermeasure model performance degree and the second average model performance difference value.
Optionally, the evaluation unit comprises:
the first determining subunit is configured to determine a first average model performance difference according to the first basic model performance degree and the first countermeasure model performance degree;
the summing subunit is used for summing the first average model performance difference and the plurality of second average model performance differences to obtain a summed average model performance difference;
a second determining subunit, configured to determine a relative model performance difference of the first participant based on the summed average model performance difference and the first average model performance difference;
and the setting subunit is used for setting the relative model performance difference as the contribution degree of the first data.
Optionally, the generating unit includes:
a third determining subunit, configured to determine a task type of the first target federated learning model;
the generation subunit is used for combining the other label contents with other data except the classification label contents of the first data to generate countermeasure data if the task type is a classification identification task;
or if the task type is a regression task, combining the other label contents with other data except the numerical output tag content of the first data to generate countermeasure data;
or if the task type is a picture output task, combining the other tag contents with other data except the picture output tag contents of the first data to generate countermeasure data.
In order to achieve the above object, the present application provides a data contribution degree evaluation device, which is applied to a third party, where the third party is in communication connection with a first participant and other second participants, respectively, and the data contribution degree evaluation device includes:
the receiving module is used for receiving the second basic model performance degrees and the second countermeasure model performance degrees of other second participants and receiving the first basic model performance degree and the first countermeasure model performance degree of the first participant;
a third determining module, configured to determine a relative model performance difference of a first participant according to the first basic model performance degree, the first countermeasure model performance degree, the second basic model performance degree, and the second countermeasure model performance degree, so as to evaluate a contribution degree of the first data;
a sending module, configured to send the contribution degree of the first data to the first party.
The specific implementation of the data contribution evaluation apparatus of the present application is substantially the same as the embodiments of the data contribution evaluation method, and is not described herein again.
The present application provides a storage medium storing one or more programs, which can be executed by one or more processors to implement the steps of the data contribution degree evaluation method described in any one of the above.
The specific implementation of the storage medium of the present application is substantially the same as the embodiments of the data contribution evaluation method, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A data contribution degree evaluation method, characterized in that a first participant is in federal connection with other second participants, and the data contribution degree evaluation method comprises the following steps:
determining a first target federal learning model by combining first data to be detected of a first participant and second data of other second participants to obtain a corresponding first basic model performance degree;
generating countermeasure data for the first data;
determining a second target federal learning model by combining the countermeasure data and the second data to obtain a corresponding first countermeasure model performance degree;
and evaluating the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree.
2. The data contribution evaluation method of claim 1, wherein the step of generating countermeasure data for the first data comprises:
acquiring first label content of the first data, and replacing the first label content with other label content;
combining the other tag content with other data than the first tag content of the first data to generate countermeasure data.
3. The data contribution evaluation method of claim 2, wherein the countermeasure data includes a plurality of sets;
the step of determining a second target federal learning model by combining the countermeasure data and the second data to obtain a corresponding first countermeasure model performance degree includes:
combining one group of countermeasure data in the multiple groups and the second data each time to determine multiple second target federal learning models so as to obtain corresponding multiple target model performance degrees;
carrying out average processing on the performance degrees of the plurality of target models to obtain an average model performance degree;
and setting the average value model performance degree as the first countermeasure model performance degree.
4. The data contribution evaluation method according to claim 1, wherein the step of evaluating the contribution degree of the first data based on the first basic model performance degree and the first countermeasure model performance degree comprises:
obtaining a second average model performance difference value, wherein the second average model performance difference value is determined by the second basic model performance degrees and the second countermeasure model performance degrees of other second participants;
and evaluating the contribution degree of the first data according to the first basic model performance degree, the first countermeasure model performance degree and the second average model performance difference value.
5. The data contribution evaluation method of claim 4, wherein there are a plurality of second average model performance difference values, and the step of evaluating the contribution degree of the first data based on the first basic model performance degree, the first countermeasure model performance degree, and the second average model performance difference value comprises:
determining a first average model performance difference value according to the first basic model performance degree and the first countermeasure model performance degree;
summing the first average model performance difference value and the plurality of second average model performance difference values to obtain a summed average model performance difference value;
determining a relative model performance difference for the first participant based on the summed average model performance difference and the first average model performance difference;
and setting the relative model performance difference as the contribution degree of the first data.
6. The data contribution evaluation method of claim 2, wherein the first tag content comprises a classification tag content, a numerical output tag content, and a picture output tag content, and the step of combining the other tag content with other data than the first tag content of the first data to generate countermeasure data comprises:
determining a task type of the first target federated learning model;
if the task type is a classification identification task, combining the other label contents with other data except the classification label contents of the first data to generate countermeasure data;
or if the task type is a regression task, combining the other label contents with other data except the numerical output tag content of the first data to generate countermeasure data;
or if the task type is a picture output task, combining the other tag contents with other data except the picture output tag contents of the first data to generate countermeasure data.
7. A data contribution degree evaluation method, applied to a third party that is in communication connection with a first participant and with other second participants respectively, the method comprising the following steps:
acquiring second basic model performance degrees and second countermeasure model performance degrees of other second participants, and receiving a first basic model performance degree and a first countermeasure model performance degree of a first participant;
determining a relative model performance difference value of a first participant according to the first basic model performance degree, the first countermeasure model performance degree, the second basic model performance degree and the second countermeasure model performance degree so as to evaluate the contribution degree of the first data;
and sending the contribution degree of the first data to the first participant.
8. A data contribution evaluation device, wherein a first party and a second party are federally connected, the data contribution evaluation device comprising:
the first combination module is used for combining first data to be detected of a first participant and second data of other second participants to determine a first target federal learning model so as to obtain a corresponding first basic model performance degree;
a generation module for generating countermeasure data of the first data;
the second combination module is used for combining the countermeasure data and the second data to determine a second target federal learning model so as to obtain a corresponding first countermeasure model performance degree;
and the evaluation module is used for evaluating the contribution degree of the first data according to the first basic model performance degree and the first countermeasure model performance degree.
9. A data contribution degree evaluation device characterized by comprising: a memory, a processor, and a program stored on the memory for implementing the data contribution degree evaluation method,
the memory is used for storing a program for realizing the data contribution degree evaluation method;
the processor is configured to execute a program implementing the data contribution degree evaluation method to implement the steps of the data contribution degree evaluation method according to any one of claims 1 to 7.
10. A storage medium having stored thereon a program for implementing a data contribution degree evaluation method, the program being executed by a processor to implement the steps of the data contribution degree evaluation method according to any one of claims 1 to 7.
CN202010504333.3A 2020-06-04 2020-06-04 Data contribution degree evaluation method, device, equipment and storage medium Pending CN111652383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010504333.3A CN111652383A (en) 2020-06-04 2020-06-04 Data contribution degree evaluation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111652383A true CN111652383A (en) 2020-09-11

Family

ID=72348990

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN112132198A (en) * 2020-09-16 2020-12-25 建信金融科技有限责任公司 Data processing method, device and system and server
CN112132198B (en) * 2020-09-16 2021-06-04 建信金融科技有限责任公司 Data processing method, device and system and server
WO2022057108A1 (en) * 2020-09-17 2022-03-24 南京博雅区块链研究院有限公司 Federated-learning-based personal qualification evaluation method, apparatus and system, and storage medium
CN112651170A (en) * 2020-12-14 2021-04-13 德清阿尔法创新研究院 Efficient feature contribution evaluation method in longitudinal federated learning scene
CN112651170B (en) * 2020-12-14 2024-02-27 德清阿尔法创新研究院 Efficient characteristic contribution assessment method in longitudinal federal learning scene
CN113902134A (en) * 2021-09-29 2022-01-07 光大科技有限公司 Contribution evaluation processing method and device

Similar Documents

Publication Publication Date Title
CN111652383A (en) Data contribution degree evaluation method, device, equipment and storage medium
EP3893154A1 (en) Recommendation model training method and related apparatus
CN108762907B (en) Task processing method and system based on multiple clients
CN105095919A (en) Image recognition method and image recognition device
KR20200135892A (en) Method, apparatus and computer program for providing personalized educational curriculum and contents through user learning ability
TW201928709A (en) Method and apparatus for merging model prediction values, and device
US20150178134A1 (en) Hybrid Crowdsourcing Platform
CN110740356B (en) Live broadcast data monitoring method and system based on block chain
CN111222647A (en) Federal learning system optimization method, device, equipment and storage medium
CN107807841B (en) Server simulation method, device, equipment and readable storage medium
CN115065652B (en) Message reply method and device, storage medium and computer equipment
CN111160624B (en) User intention prediction method, user intention prediction device and terminal equipment
CN111738463A (en) Operation and maintenance method, device, system, electronic equipment and storage medium
CN111104988A (en) Image recognition method and related device
CN111523679A (en) Feature binning method, device and readable storage medium
CN108805332B (en) Feature evaluation method and device
CN116187754A (en) Production line fault positioning method, equipment and readable storage medium
CN112581001B (en) Evaluation method and device of equipment, electronic equipment and readable storage medium
CN111427900B (en) Label library updating method, device, equipment and readable storage medium
US11315238B2 (en) Method for manufacturing a product
CN111967788A (en) Target enterprise determination method and device, first electronic equipment and storage medium
CN113407521B (en) User behavior tag preference sorting method, device, equipment and storage medium
CN116109080B (en) Building integrated management platform based on BIM and AR
CN114637868B (en) Product data processing method and system applied to fast-moving industry
CN113706040B (en) Risk identification method, apparatus, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination