CN110162995B - Method and device for evaluating data contribution degree - Google Patents

Method and device for evaluating data contribution degree

Info

Publication number
CN110162995B
Authority
CN
China
Prior art keywords
party
model
data
training
evaluation result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910323738.4A
Other languages
Chinese (zh)
Other versions
CN110162995A (en
Inventor
陈超超
周俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd
Priority to CN201910323738.4A
Publication of CN110162995A
Application granted
Publication of CN110162995B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The application relates to the field of data sharing and discloses a method and a device for evaluating the degree of data contribution. The method is performed by a first party and comprises: performing model training using the first party's own training data to obtain a first model; performing model training jointly with a second party by means of secure multi-party computation, using the first party's own training data, to obtain a second model, wherein the second party provides its own data during the joint training; obtaining an evaluation result of the first model and an evaluation result of the second model using the first party's own test data; and evaluating the degree of contribution of the second party's data according to the degree of improvement of the evaluation result of the second model relative to the evaluation result of the first model.

Description

Method and device for evaluating data contribution degree
Technical Field
The present application relates to the field of data sharing.
Background
Data sharing is emerging as a difficult problem in both research and practice. It refers to multiple data parties jointly performing data mining or machine learning while protecting the privacy of their respective data, so as to extract greater value from the data. Fig. 1 is a schematic diagram of the data sharing principle.
For example, three banks each hold credit data on their users and want to jointly train a better credit model for granting credit to users. A practical question every party will ask at this point is whether the other parties will deceive it with false or low-quality data. That is, when data is shared, the degree of contribution of each party's data needs to be evaluated.
In the prior art, evaluating the degree of contribution of each party's data in data sharing has the following disadvantages:
(1) The degree of contribution of each party's data can be judged only by pooling the plaintext data of all parties;
(2) The privacy of each party's data cannot be protected.
Disclosure of Invention
The present specification provides a method and an apparatus for evaluating the degree of data contribution, which can evaluate the contribution of each party's data to the final service while protecting the privacy of each party's data.
To solve the above technical problem, an embodiment of the present specification discloses a method for evaluating the degree of data contribution, the method being performed by a first party and including:
performing model training using the first party's own training data to obtain a first model;
performing model training jointly with a second party by means of secure multi-party computation, using the first party's own training data, to obtain a second model, wherein the second party provides its own data during the joint training with the first party;
obtaining an evaluation result of the first model and an evaluation result of the second model using the first party's own test data;
and evaluating the degree of contribution of the second party's data according to the degree of improvement of the evaluation result of the second model relative to the evaluation result of the first model.
Embodiments of the present specification also disclose an apparatus for evaluating the degree of data contribution, the apparatus being used by a first party and comprising:
a first training module, configured to perform model training using the first party's own training data to obtain a first model;
a second training module, configured to perform model training jointly with a second party by means of secure multi-party computation, using the first party's own training data, to obtain a second model, wherein the second party provides its own data during the joint training with the first party;
a first testing module, configured to obtain an evaluation result of the first model and an evaluation result of the second model using the first party's own test data;
and a first evaluation module, configured to evaluate the degree of contribution of the second party's data according to the degree of improvement of the evaluation result of the second model relative to the evaluation result of the first model.
The embodiment of the present specification also discloses an apparatus for evaluating a data contribution degree, including:
a memory for storing computer-executable instructions; and
a processor for implementing the steps of the above method when executing the computer executable instructions.
Embodiments of the present specification also disclose a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement the steps of the above-described method.
In the embodiment of the specification, plaintext data of each party does not need to be mixed, and the contribution degree of each party of data to the final service can be evaluated on the premise of protecting the data privacy of each party.
A large number of technical features are described in the specification of the present application and are distributed across various technical solutions; listing every possible combination of these features (that is, every technical solution) would make the specification excessively long. To avoid this, the technical features disclosed in the above summary, the technical features disclosed in the following embodiments and examples, and the technical features disclosed in the drawings may be freely combined with one another to form new technical solutions (all of which are considered to have been described in this specification), unless such a combination is technically infeasible. For example, suppose one example discloses features A + B + C and another example discloses features A + B + D + E, where features C and D are equivalent technical means serving the same purpose, so that only one of them can be used at a time, and feature E can technically be combined with feature C. Then the solution A + B + C + D should not be considered as described, because it is technically infeasible, whereas the solution A + B + C + E should be considered as described.
Drawings
FIG. 1 is a schematic diagram of the data sharing principle;
FIG. 2 is a flow chart of a method for evaluating the degree of data contribution according to a first embodiment of the present application;
FIG. 3 is a schematic structural diagram of an apparatus for evaluating the degree of data contribution according to a third embodiment of the present application.
Detailed Description
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application may be implemented without these technical details and with various changes and modifications based on the following embodiments.
Description of partial concepts:
Secure data sharing: multiple data parties jointly performing data mining or machine learning while protecting the privacy of their respective data.
To make the objects, technical solutions and advantages of the present specification clearer, embodiments of the present specification will be described in further detail below with reference to the accompanying drawings.
A first embodiment of the present specification relates to a method for evaluating a data contribution degree, and a flow chart thereof is schematically shown in fig. 2.
First, it should be noted that the method is executed by the first party, that is, the method is a method for the first party to evaluate the data contribution degree of other parties.
As shown in fig. 2, the method for evaluating the degree of data contribution includes the following steps:
In step 201, model training is performed using the first party's own training data to obtain a first model.
That is, in step 201, the first party trains a model on its own training data alone to obtain the first model.
The flow then proceeds to step 203, in which model training is performed jointly with the second party by means of secure multi-party computation, using the first party's own training data, to obtain a second model, wherein the second party provides its own data during the joint training with the first party.
That is, in step 203, the first party contributes its own training data and the second party contributes its own data, and the second model is obtained using a data-sharing modeling method (secure multi-party computation).
In this embodiment, the model is preferably a logistic regression model. A neural network model, a tree model, or the like may also be used.
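As an illustration of step 201 only, the following is a minimal sketch of training a logistic regression model on one party's own data with plain batch gradient descent. The function names, hyperparameters, and the toy data are assumptions made for the example; the specification does not prescribe a training algorithm.

```python
import math
import random

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy, made-up first-party training data: one feature, label 1 when x > 0.
random.seed(0)
X_train = [[random.uniform(-1, 1)] for _ in range(200)]
y_train = [1 if x[0] > 0 else 0 for x in X_train]
first_model = train_logistic_regression(X_train, y_train)
```

In step 203 the same kind of model would be fitted, but the gradient computation would run inside a secure multi-party protocol rather than on plaintext features.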
Secure multi-party computation addresses the problem of privacy-preserving collaborative computation among a group of mutually distrusting participants, such as jointly training a logistic regression model. It must guarantee the independence of the inputs and the correctness of the computation, while ensuring that no input value is leaked to the other participants. After the computation is completed, the result is given to each participant.
Approaches to secure multi-party computation fall mainly into three categories:
1. garbled circuits;
2. homomorphic encryption;
3. secret sharing.
For example, the common logistic regression model can be implemented with any of the three approaches, each of which has advantages and disadvantages. That is, in this embodiment, the secure multi-party computation may use any of the above three approaches.
Secret sharing is a cryptographic technique that stores a secret in split form: the secret is divided into several secret shares in a suitable manner, each share is held and managed by one of several participants, no single participant can recover the complete secret, and the complete secret can be recovered only when several participants cooperate. Secret sharing aims to prevent the secret from being overly concentrated, thereby spreading risk and tolerating intrusion.
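The idea of splitting a secret can be sketched with additive secret sharing, which is one common scheme and is assumed here purely for illustration (the specification does not name a particular scheme). The modulus and the function names are likewise assumptions.

```python
import secrets

MODULUS = 2**61 - 1  # assumed share modulus, chosen only for illustration

def split_secret(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares: list[int]) -> int:
    """Recover the secret; all shares are required."""
    return sum(shares) % MODULUS

shares = split_secret(123456, n_parties=3)
recovered = reconstruct(shares)
```

In this scheme every share on its own is a uniformly random value, so a single participant learns nothing about the secret; only the sum of all shares reveals it.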
Secret sharing can be roughly divided into two categories: secret sharing with a trusted initializer and secret sharing without a trusted initializer. In secret sharing with a trusted initializer, the trusted initializer performs parameter initialization for each participant in the secure multi-party computation (typically by generating random numbers that satisfy certain conditions). After initialization, the trusted initializer destroys its data and goes offline; it is not needed in the subsequent computation.
Secret-sharing matrix multiplication with a trusted initializer applies to the following case: the complete secret data is the product of a first set of secret shares and a second set of secret shares, and each participant holds one share from the first set and one share from the second set. Through secret-sharing matrix multiplication with a trusted initializer, each participant obtains a partial result, and the sum of the partial results obtained by all participants is the complete secret data. Each participant then discloses its partial result to the other participants, so that every participant obtains the complete secret data without disclosing the secret shares it holds, which keeps each participant's data secure.
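A minimal sketch of this kind of multiplication follows, using the well-known multiplication-triple technique on scalars. It is an assumption that this matches the protocol the specification has in mind; the matrix case relies on the same identity with matrix products in place of scalar products. All names and the modulus are hypothetical.

```python
import secrets

P = 2**61 - 1  # assumed prime modulus, chosen only for illustration

def share(value: int, n: int) -> list[int]:
    """Additively share `value` among n parties modulo P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(value - sum(parts)) % P]

def reveal(shares: list[int]) -> int:
    return sum(shares) % P

def trusted_initializer(n: int):
    """Hand out shares of a random triple (u, v, w) with w = u * v, then go offline."""
    u, v = secrets.randbelow(P), secrets.randbelow(P)
    return share(u, n), share(v, n), share(u * v % P, n)

def multiply_shared(x_sh: list[int], y_sh: list[int]) -> list[int]:
    """Return shares of x * y; only the masked values e and f are ever opened."""
    n = len(x_sh)
    u_sh, v_sh, w_sh = trusted_initializer(n)
    e = reveal([(x - u) % P for x, u in zip(x_sh, u_sh)])  # e = x - u
    f = reveal([(y - v) % P for y, v in zip(y_sh, v_sh)])  # f = y - v
    z_sh = []
    for i in range(n):
        # x*y = e*f + e*v + f*u + u*v, computed share by share.
        z_i = (e * v_sh[i] + f * u_sh[i] + w_sh[i]) % P
        if i == 0:
            z_i = (z_i + e * f) % P  # public term, added by one party only
        z_sh.append(z_i)
    return z_sh

x_shares = share(6, 2)
y_shares = share(7, 2)
product_shares = multiply_shared(x_shares, y_shares)
product = reveal(product_shares)
```

Each entry of `product_shares` plays the role of one participant's partial result: individually it is random-looking, and only the sum of all entries equals the product.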
In addition, model training based on secure multi-party computation may use a trusted zone in the service device as an execution environment isolated from the outside. Encrypted data is decrypted inside the trusted zone to obtain the user data, and the model is trained on the user data inside the trusted zone, so that the user data is never exposed outside the trusted zone during the whole training process and user privacy is protected.
Of course, the above describes only two implementations of secure multi-party computation. Those skilled in the art will appreciate that secure multi-party computation is well established in the art, and it is not described in further detail here.
It should be noted that steps 201 and 203 need not be executed in any particular order: step 201 may be executed before step 203, step 203 may be executed before step 201, or the two steps may be executed simultaneously.
Then, in step 205, the evaluation result of the first model and the evaluation result of the second model are obtained using the first party's own test data.
That is, in step 205, the first model and the second model are each evaluated on the first party's test data.
How the evaluation results are obtained depends on the business scenario, since different scenarios use different evaluation criteria:
for example, an advertisement click-through-rate model is typically evaluated by the AUC metric; credit risk control is typically evaluated by the KS metric; e-commerce is typically evaluated by the GMV metric; and so on.
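For readers unfamiliar with the first two metrics, here is a small sketch under the usual definitions: AUC as the probability that a random positive example is scored above a random negative one, and KS as the largest gap between the cumulative score distributions of the two classes. The function names and the four-sample data set are made up for illustration.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(labels, scores):
    """Maximum gap between the cumulative score distributions of the two classes."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for lab, s in zip(labels, scores) if lab == 1 and s <= t) / n_pos
        cdf_neg = sum(1 for lab, s in zip(labels, scores) if lab == 0 and s <= t) / n_neg
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

# Made-up test labels and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(labels, scores)
example_ks = ks_statistic(labels, scores)
```

Both functions take only labels and scores, so the first party can apply the same metric to the first model and the second model on its own test data.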
Then, in step 207, the degree of contribution of the second party's data is evaluated according to the degree of improvement of the evaluation result of the second model relative to the evaluation result of the first model.
That is, the degree to which the second model's performance on the first party's test data improves over the first model's performance on the same test data is the degree of contribution of the second party's data.
For example, if testing shows that the accuracy of the first model is 90% and the accuracy of the second model is 91%, the second model's accuracy is 1 percentage point higher than the first model's, and this 1-point improvement reflects the degree of contribution of the second party's data.
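Steps 205 and 207 can be sketched as follows, assuming accuracy as the metric and treating each model as a plain prediction function. The two stand-in models and the five test points are hypothetical and unrelated to the 90% and 91% figures in the example above.

```python
def accuracy(predict, X_test, y_test):
    """Fraction of first-party test samples that the model classifies correctly."""
    hits = sum(1 for x, y in zip(X_test, y_test) if predict(x) == y)
    return hits / len(y_test)

def contribution(baseline_metric: float, joint_metric: float) -> float:
    """Improvement of the jointly trained model over the first party's own model."""
    return joint_metric - baseline_metric

# Made-up test set: the true label depends on both features.
X_test = [(1, 0.5), (-1, 2), (-1, -1), (2, -1), (0.5, -2)]
y_test = [1, 1, 0, 1, 0]

# Stand-ins: the first model sees only feature 0, the second model sees both.
first_model = lambda x: int(x[0] > 0)
second_model = lambda x: int(x[0] + x[1] > 0)

first_acc = accuracy(first_model, X_test, y_test)
second_acc = accuracy(second_model, X_test, y_test)
second_party_contribution = contribution(first_acc, second_acc)
```

Here the first model scores 0.6 and the second 1.0, so the second party's data would be credited with an improvement of 0.4 on this toy test set.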
This flow ends thereafter.
In summary, in the above embodiment of the present specification, two models are trained on different data and their evaluation results are compared, so that the contribution of each party's data to the final service can be evaluated while protecting the privacy of each party's data.
A second embodiment of the present specification relates to a method for evaluating the degree of data contribution. The second embodiment is substantially the same as the first embodiment, except that in the first embodiment data sharing is performed by the first party and the second party, whereas in the second embodiment three or more parties participate in data sharing.
In the case of multi-party data sharing, the parties' data can be added one party at a time; that is, each time one more party's data is added, the degree of contribution of that party's data is evaluated according to the method of the first embodiment.
An example is given below:
If a third party also participates in data sharing, that is, if the method further includes evaluating the degree of contribution of the third party's data, then according to the method of the first embodiment a model is first built on the first party's and second party's data, a further model is then built with the third party's data added, and the two are compared.
Specifically, when the method further includes evaluating the degree of contribution of the third party's data, the method for evaluating the degree of data contribution includes the following steps:
performing model training using the first party's own training data to obtain a first model;
performing model training jointly with a second party by means of secure multi-party computation, using the first party's own training data, to obtain a second model, wherein the second party provides its own data during the joint training with the first party;
obtaining an evaluation result of the first model and an evaluation result of the second model using the first party's own test data;
evaluating the degree of contribution of the second party's data according to the degree of improvement of the evaluation result of the second model relative to the evaluation result of the first model;
performing model training jointly with the second party and a third party by means of secure multi-party computation, using the first party's own training data, to obtain a third model, wherein the second party and the third party each provide their own data during the joint training with the first party;
obtaining an evaluation result of the third model using the first party's own test data;
and evaluating the degree of contribution of the third party's data according to the degree of improvement of the evaluation result of the third model relative to the evaluation result of the second model.
For example, if testing shows that the accuracy of the first model is 90%, the accuracy of the second model is 91%, and the accuracy of the third model is 93%, then the second model's accuracy is 1 percentage point higher than the first model's and the third model's accuracy is 2 percentage points higher than the second model's. The 1-point improvement of the second model reflects the degree of contribution of the second party's data, and the 2-point improvement of the third model reflects the degree of contribution of the third party's data.
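The incremental comparison generalizes naturally to any number of parties. The sketch below assumes that the metrics are listed in the order in which parties were added, starting with the first party's own model; the function name is hypothetical, and the input reuses the 90%, 91%, and 93% figures from the example.

```python
def marginal_contributions(metrics_by_stage):
    """Given metrics for models trained on parties {1}, {1,2}, {1,2,3}, ...,
    return each added party's improvement over the previous stage."""
    return {
        party: round(after - before, 10)
        for party, (before, after) in enumerate(
            zip(metrics_by_stage, metrics_by_stage[1:]), start=2
        )
    }

stage_accuracy = [0.90, 0.91, 0.93]
contribs = marginal_contributions(stage_accuracy)
```

The result maps party 2 to 0.01 and party 3 to 0.02. Because each party is credited with its improvement over the parties added before it, the figures can depend on the order in which parties are added.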
When there are four parties, models are built using the first party's training data together with the second party's, third party's, and fourth party's data; the first party's test data is then used to obtain the evaluation result of each model; and finally the evaluation results of the models are compared in turn, so as to evaluate the degree of contribution of each party's data.
By analogy, the method for evaluating the degree of data contribution can be applied to data sharing among five parties, six parties, seven parties, and so on, and the contribution of each party's data to the final service can be evaluated while protecting the privacy of each party's data.
The first embodiment is a method embodiment corresponding to the present embodiment, and the technical details in the first embodiment may be applied to the present embodiment, and the technical details in the present embodiment may also be applied to the first embodiment.
A third embodiment of the present application relates to an apparatus for evaluating the degree of data contribution, a schematic structural diagram of which is shown in fig. 3.
First, it should be noted that the apparatus is used by the first party; that is, it is an apparatus with which the first party evaluates the degree of contribution of other parties' data.
As shown in fig. 3, the apparatus for evaluating the degree of data contribution includes:
A first training module, configured to perform model training using the first party's own training data to obtain a first model.
In this embodiment, the model is preferably a logistic regression model. A neural network model, a tree model, or the like may also be used.
A second training module, configured to perform model training jointly with a second party by means of secure multi-party computation, using the first party's own training data, to obtain a second model, wherein the second party provides its own data during the joint training with the first party.
That is, the first party contributes its own training data and the second party contributes its own data, and the second model is obtained using a data-sharing modeling method (secure multi-party computation).
Secure multi-party computation addresses the problem of privacy-preserving collaborative computation among a group of mutually distrusting participants, for example jointly training a logistic regression model. It must guarantee the independence of the inputs and the correctness of the computation, while ensuring that no input value is leaked to the other participants. After the computation is completed, the result is given to each participant.
Approaches to secure multi-party computation fall mainly into three categories:
1. garbled circuits;
2. homomorphic encryption;
3. secret sharing.
For example, the common logistic regression model can be implemented with any of the three approaches, each of which has advantages and disadvantages. That is, in this embodiment, the secure multi-party computation may use any of the above three approaches.
Secret sharing is a cryptographic technique that stores a secret in split form: the secret is divided into several secret shares in a suitable manner, each share is held and managed by one of several participants, no single participant can recover the complete secret, and the complete secret can be recovered only when several participants cooperate. Secret sharing aims to prevent the secret from being overly concentrated, thereby spreading risk and tolerating intrusion.
Secret sharing can be broadly divided into two categories: secret sharing with a trusted initializer and secret sharing without a trusted initializer. In secret sharing with a trusted initializer, the trusted initializer performs parameter initialization for each participant in the secure multi-party computation (typically by generating random numbers that satisfy certain conditions). After initialization, the trusted initializer destroys its data and goes offline; it is not needed in the subsequent computation.
Secret-sharing matrix multiplication with a trusted initializer applies to the following case: the complete secret data is the product of a first set of secret shares and a second set of secret shares, and each participant holds one share from the first set and one share from the second set. Through secret-sharing matrix multiplication with a trusted initializer, each participant obtains a partial result, and the sum of the partial results obtained by all participants is the complete secret data. Each participant then discloses its partial result to the other participants, so that every participant obtains the complete secret data without disclosing the secret shares it holds, which keeps each participant's data secure.
In addition, model training based on secure multi-party computation may use a trusted zone in the service device as an execution environment isolated from the outside. Encrypted data is decrypted inside the trusted zone to obtain the user data, and the model is trained on the user data inside the trusted zone, so that the user data is never exposed outside the trusted zone during the whole training process and user privacy is protected.
Of course, the above describes only two implementations of secure multi-party computation. Those skilled in the art will appreciate that secure multi-party computation is well known in the art, and it is not described in further detail here.
A first testing module, configured to obtain the evaluation result of the first model and the evaluation result of the second model using the first party's own test data.
That is, the first model and the second model are each evaluated on the first party's test data.
How the evaluation results are obtained depends on the business scenario, since different scenarios use different evaluation criteria:
for example, an advertisement click-through-rate model is typically evaluated by the AUC metric; credit risk control is typically evaluated by the KS metric; e-commerce is typically evaluated by the GMV metric; and so on.
A first evaluation module, configured to evaluate the degree of contribution of the second party's data according to the degree of improvement of the evaluation result of the second model relative to the evaluation result of the first model.
That is, the degree to which the second model's performance on the first party's test data improves over the first model's performance on the same test data is the degree of contribution of the second party's data.
For example, if testing shows that the accuracy of the first model is 90% and the accuracy of the second model is 91%, the second model's accuracy is 1 percentage point higher than the first model's, and this 1-point improvement reflects the degree of contribution of the second party's data.
In summary, in the above embodiment of the present specification, two models are trained on different data and their evaluation results are compared, so that the contribution of each party's data to the final service can be evaluated while protecting the privacy of each party's data.
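One way to picture the module structure of the apparatus in software is as a small class whose fields correspond to the modules. This is only a sketch: the class name, field names, and the stubbed callables are assumptions, and a real second training module would run a secure multi-party protocol with the second party.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ContributionEvaluator:
    train_local: Callable[[], Any]     # first training module
    train_joint: Callable[[], Any]     # second training module (would run the MPC protocol)
    evaluate: Callable[[Any], float]   # first testing module, on first-party test data

    def run(self) -> float:
        """First evaluation module: improvement of the second model over the first."""
        first_model = self.train_local()
        second_model = self.train_joint()
        return self.evaluate(second_model) - self.evaluate(first_model)

# Stubbed usage: models are placeholder strings, metrics come from a lookup table.
stub_metrics = {"model_1": 0.90, "model_2": 0.91}
evaluator = ContributionEvaluator(
    train_local=lambda: "model_1",
    train_joint=lambda: "model_2",
    evaluate=stub_metrics.get,
)
improvement = evaluator.run()
```

Injecting the modules as callables keeps the evaluation logic independent of how the joint training is actually carried out.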
The first embodiment is a method embodiment corresponding to the present embodiment, and the technical details in the first embodiment may be applied to the present embodiment, and the technical details in the present embodiment may also be applied to the first embodiment.
A fourth embodiment of the present specification relates to an apparatus for evaluating a degree of contribution of data. The fourth embodiment is substantially the same as the third embodiment except that: in the first embodiment, the sharing of the added data is performed by the first party and the second party; and the second embodiment is concerned with data sharing by more than three parties (including three parties).
In the case of multi-party data, the multi-party data can be added in one party, i.e. each time one more party data is added, the device in the third embodiment is used to evaluate the contribution degree of each party data.
The following description will take three parties participating in data sharing as an example:
that is, the apparatus is further configured to evaluate the degree of contribution of the third-party data, in which case the apparatus for evaluating the degree of contribution of the data further includes:
and the third training module is used for using the training data of the first party and carrying out model training together with the second party and the third party based on a multi-party safety calculation mode to obtain a third model, wherein the second party and the third party provide the data of the second party and the third party in the process of carrying out model training based on the multi-party safety calculation mode and the first party.
And the second testing module is used for obtaining the evaluation result of the third model by using the testing data of the first party.
And the second evaluation module is used for evaluating the contribution degree of the third-party data according to the promotion degree of the evaluation result of the third model relative to the evaluation result of the second model.
For example, suppose in one case that the accuracy of the first model is 90%, the accuracy of the second model is 91%, and the accuracy of the third model is 93% through testing, the accuracy of the second model is improved by 1% relative to the accuracy of the first model, the accuracy of the third model is improved by 2% relative to the accuracy of the second model, the improved 1% accuracy of the second model reflects the contribution degree of the second-party data, and the improved 2% accuracy of the third model reflects the contribution degree of the third-party data.
When the multiple parties are four parties, models are built using the first party's training data together with the second-party, third-party, and fourth-party data; the first party's test data is then used to obtain the evaluation result of each model; and finally the evaluation results are compared with one another to evaluate the contribution degree of each party's data.
By analogy, the method for evaluating the data contribution degree can be used for data sharing among five, six, seven, ... parties, and the contribution of each party's data to the final business can be evaluated while protecting the data privacy of each party.
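Secret sharing is one of the secure multi-party computation modes named in the claims. As a toy illustration only (the specification does not prescribe a particular protocol), the following sketch shows additive secret sharing, in which several parties jointly compute a sum without any of them seeing another party's input. The modulus `PRIME`, the helper names `share` and `reconstruct`, and the sample inputs are assumptions chosen for the example.

```python
import secrets

PRIME = 2**61 - 1  # hypothetical modulus; any sufficiently large prime would do


def share(value, n_parties):
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME


# Each party splits its private input and hands one share to every party.
private_inputs = [12, 30, 7]
all_shares = [share(x, len(private_inputs)) for x in private_inputs]

# Party j adds up the j-th share of every input; each share on its own looks random.
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]

# Combining the partial sums reveals only the total, not the individual inputs.
total = reconstruct(partial_sums)
print(total)  # 49
```

A real joint training protocol would apply such primitives to model parameters or gradients, which is beyond the scope of this sketch.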
The second embodiment is a method embodiment corresponding to the present embodiment, and the technical details in the second embodiment may be applied to the present embodiment, and the technical details in the present embodiment may also be applied to the second embodiment.
It should be noted that, as those skilled in the art will understand, the functions of the modules shown in the above embodiment of the apparatus for evaluating the data contribution degree can be understood with reference to the related description of the method for evaluating the data contribution degree. The functions of these modules may be implemented by a program (executable instructions) running on a processor, or by specific logic circuits. If the apparatus for evaluating the data contribution degree in the embodiments of this specification is implemented in the form of a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this specification may be embodied, in essence or in part, in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of this specification. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk. Thus, the embodiments of this specification are not limited to any specific combination of hardware and software.
Accordingly, the embodiments of this specification also provide a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the method embodiments of this specification. Computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer-readable storage medium does not include transitory computer-readable media such as modulated data signals and carrier waves.
In addition, this specification also provides an apparatus for evaluating a data contribution degree, comprising a memory for storing computer-executable instructions, and a processor; the processor is configured to implement the steps of the method embodiments described above when executing the computer-executable instructions in the memory. The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The aforementioned memory may be a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk, or a solid-state disk. The steps of the method disclosed in the embodiments of the present invention may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor.
It is noted that, in the present patent application, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element. In the present patent application, if it is mentioned that an action is executed according to a certain element, this means that the action is executed according to at least that element, which covers two cases: performing the action based only on the element, and performing the action based on the element and other elements. Expressions such as "multiple", "multiple times", and "multiple kinds" include 2, 2 times, and 2 kinds, as well as more than 2, more than 2 times, and more than 2 kinds.
All documents mentioned in this specification are to be considered as being incorporated in their entirety into the disclosure of this specification so as to be subject to modification as necessary. It should be understood that the above description is only for the preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Claims (10)

1. A method of evaluating a degree of data contribution, the method performed by a first party, comprising:
performing model training by using training data of a first party to obtain a first model;
performing model training together with a second party based on secure multi-party computation, using the first party's own training data, to obtain a second model, wherein the second party provides its own data in the course of the model training performed with the first party based on secure multi-party computation;
obtaining an evaluation result of the first model and an evaluation result of the second model, respectively, by using the first party's own test data;
and evaluating the contribution degree of the second party's data according to the degree of improvement of the evaluation result of the second model relative to the evaluation result of the first model.
2. The method of claim 1, wherein the method further comprises evaluating a degree of contribution of third party data, the method further comprising:
performing model training together with the second party and a third party based on secure multi-party computation, using the first party's own training data, to obtain a third model, wherein the second party and the third party provide their own data in the course of the model training performed with the first party based on secure multi-party computation;
obtaining an evaluation result of the third model by using the first party's own test data;
and evaluating the contribution degree of the third party's data according to the degree of improvement of the evaluation result of the third model relative to the evaluation result of the second model.
3. The method of claim 1, wherein the model comprises: a logistic regression model, a neural network model, or a tree model.
4. The method of claim 1 or 2, wherein the secure multi-party computation comprises: garbled circuits, homomorphic encryption, or secret sharing.
5. An apparatus for evaluating a degree of data contribution, the apparatus for use with a first party, comprising:
a first training module, configured to perform model training by using the first party's own training data to obtain a first model;
a second training module, configured to use the first party's own training data and perform model training together with a second party based on secure multi-party computation to obtain a second model, wherein the second party provides its own data in the course of the model training performed with the first party based on secure multi-party computation;
a first testing module, configured to obtain an evaluation result of the first model and an evaluation result of the second model, respectively, by using the first party's own test data;
and a first evaluation module, configured to evaluate the contribution degree of the second party's data according to the degree of improvement of the evaluation result of the second model relative to the evaluation result of the first model.
6. The apparatus of claim 5, wherein the apparatus is further configured to evaluate a degree of contribution of third party data, the apparatus further comprising:
a third training module, configured to use the first party's own training data and perform model training together with the second party and a third party based on secure multi-party computation to obtain a third model, wherein the second party and the third party provide their own data in the course of the model training performed with the first party based on secure multi-party computation;
a second testing module, configured to obtain an evaluation result of the third model by using the first party's own test data;
and a second evaluation module, configured to evaluate the contribution degree of the third party's data according to the degree of improvement of the evaluation result of the third model relative to the evaluation result of the second model.
7. The apparatus of claim 5, wherein the model comprises: a logistic regression model, a neural network model, or a tree model.
8. The apparatus of claim 6 or 7, wherein the secure multi-party computation comprises: garbled circuits, homomorphic encryption, or secret sharing.
9. An apparatus for evaluating a degree of data contribution, comprising:
a memory for storing computer-executable instructions; and
a processor for implementing the steps in the method of any one of claims 1-4 when executing the computer-executable instructions.
10. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor implement the steps in the method of any one of claims 1-4.
CN201910323738.4A 2019-04-22 2019-04-22 Method and device for evaluating data contribution degree Active CN110162995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910323738.4A CN110162995B (en) 2019-04-22 2019-04-22 Method and device for evaluating data contribution degree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910323738.4A CN110162995B (en) 2019-04-22 2019-04-22 Method and device for evaluating data contribution degree

Publications (2)

Publication Number Publication Date
CN110162995A CN110162995A (en) 2019-08-23
CN110162995B true CN110162995B (en) 2023-01-10

Family

ID=67639822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910323738.4A Active CN110162995B (en) 2019-04-22 2019-04-22 Method and device for evaluating data contribution degree

Country Status (1)

Country Link
CN (1) CN110162995B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851482B (en) * 2019-11-07 2022-02-18 支付宝(杭州)信息技术有限公司 Method and device for providing data model for multiple data parties
CN111061963B (en) * 2019-11-28 2021-05-11 支付宝(杭州)信息技术有限公司 Machine learning model training and predicting method and device based on multi-party safety calculation
CN112990260B (en) * 2021-02-05 2022-04-26 支付宝(杭州)信息技术有限公司 Model evaluation method and system based on multi-party security calculation

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633265A (en) * 2017-09-04 2018-01-26 深圳市华傲数据技术有限公司 For optimizing the data processing method and device of credit evaluation model
CN107704930A (en) * 2017-09-25 2018-02-16 阿里巴巴集团控股有限公司 Modeling method, device, system and electronic equipment based on shared data
CN108038471A (en) * 2017-12-27 2018-05-15 哈尔滨工程大学 A kind of underwater sound communication signal type Identification method based on depth learning technology
CN108229555A (en) * 2017-12-29 2018-06-29 深圳云天励飞技术有限公司 Sample weights distribution method, model training method, electronic equipment and storage medium
CN108256693A (en) * 2018-02-11 2018-07-06 阳光电源股份有限公司 A kind of photovoltaic power generation power prediction method, apparatus and system
CN108734296A (en) * 2017-04-21 2018-11-02 北京京东尚科信息技术有限公司 Optimize method, apparatus, electronic equipment and the medium of the training data of supervised learning
CN109189921A (en) * 2018-08-07 2019-01-11 阿里巴巴集团控股有限公司 Comment on the training method and device of assessment models
CN109308418A (en) * 2017-07-28 2019-02-05 阿里巴巴集团控股有限公司 A kind of model training method and device based on shared data
CN109325584A (en) * 2018-08-10 2019-02-12 深圳前海微众银行股份有限公司 Federation's modeling method, equipment and readable storage medium storing program for executing neural network based
CN109522919A (en) * 2018-09-17 2019-03-26 深圳市佰仟金融服务有限公司 A kind of data assessment method and device
CN109559214A (en) * 2017-09-27 2019-04-02 阿里巴巴集团控股有限公司 Virtual resource allocation, model foundation, data predication method and device
CN109615020A (en) * 2018-12-25 2019-04-12 深圳前海微众银行股份有限公司 Characteristic analysis method, device, equipment and medium based on machine learning model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9020871B2 (en) * 2010-06-18 2015-04-28 Microsoft Technology Licensing, Llc Automated classification pipeline tuning under mobile device resource constraints
US20150324690A1 (en) * 2014-05-08 2015-11-12 Microsoft Corporation Deep Learning Training System
EP3474201A1 (en) * 2017-10-17 2019-04-24 Tata Consultancy Services Limited System and method for quality evaluation of collaborative text inputs
US11556730B2 (en) * 2018-03-30 2023-01-17 Intel Corporation Methods and apparatus for distributed use of a machine learning model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Differentially Private Federated Learning: A Client Level Perspective;Geyer R C et al.;《arXiv preprint》;20180301;全文 *
Federated Learning with Non-iid Data;Zhao Y et al.;《arXiv preprint》;20180531;全文 *
Federated Optimization: Distributed Machine Learning for On-device Intelligence;Konecny J et al.;《arXiv preprint》;20161011;全文 *

Also Published As

Publication number Publication date
CN110162995A (en) 2019-08-23

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant