CN111353167A - Data discrimination method, device, equipment and storage medium based on multiple providers - Google Patents


Info

Publication number
CN111353167A
Authority
CN
China
Prior art keywords
gradient
provider
local
party
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010122351.5A
Other languages
Chinese (zh)
Inventor
谭明超
范涛
杨恺
马国强
郑会钿
吴玙
魏文斌
陈天健
杨强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202010122351.5A
Publication of CN111353167A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of financial technology (Fintech) and discloses a multi-provider-based data discrimination method, device, equipment and storage medium. In response to a model discrimination request, the method determines the local encryption gradients of the application party and of each provider that are required for model discrimination, so that the target data is computed without leaking any party's feature data at any point during logistic regression model training. Target local gradients are generated from the decrypted local encryption gradients, and each target local gradient is checked against a preset threshold; this overcomes the limitation of the conventional logistic regression convergence test and provides a simple, feasible convergence test for vertical logistic regression in scenarios with multiple data providers. When every target local gradient is smaller than the preset threshold, the target local gradients are merged to serve as a joint model difference value, so that multiple data providers and the data application party jointly build a logistic regression model.

Description

Data discrimination method, device, equipment and storage medium based on multiple providers
Technical Field
The invention relates to the technical field of financial technology (Fintech), in particular to a data discrimination method, a device, equipment and a storage medium based on multiple providers.
Background
With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). The security and real-time requirements of the financial industry, however, place higher demands on these technologies. Logistic regression, also known as logistic regression analysis, is a generalized linear regression model that is widely used in data mining, economic forecasting and other fields; it is essentially a binary classification method. In a vertical federated learning scenario that includes a collaborator, a data application party and a data provider, the data application party and the data provider need to model jointly without revealing their respective feature information.
However, owing to the limitations of existing calculation methods, existing vertical logistic regression algorithms can only handle joint modeling between one data application party and one data provider; they cannot support multiple data providers and a data application party jointly building a vertical logistic regression model.
Disclosure of Invention
The main object of the invention is to provide a multi-provider-based data discrimination method, device, equipment and computer-readable storage medium, aiming to solve the technical problem of how to support multiple data providers and a data application party in jointly building a vertical logistic regression model.
In order to achieve the above object, the present invention provides a multi-provider-based data discrimination method applied to a collaborator, the multi-provider-based data discrimination method including:
receiving a discrimination request for data to be detected that is initiated by an application party, and determining, based on the request, the local encryption gradients required for data discrimination that are sent by the application party and each provider;
taking each decrypted local encryption gradient as a target local gradient, and determining whether each target local gradient is smaller than a preset first threshold;
and if every target local gradient is smaller than the preset first threshold, merging the target local gradients to serve as a target joint gradient, so as to discriminate the data to be detected with the joint model trained on the target joint gradient.
Optionally, before the step of determining whether each of the target local gradients is smaller than a preset first threshold, the method further includes:
combining the local encryption gradients, acquiring a matched preset private key, and decrypting the combined local encryption gradients to generate multi-party gradient information;
optimizing the multi-party gradient information according to a preset optimization algorithm to generate optimized multi-party gradient information;
and splitting the optimized multi-party gradient information to generate each target local gradient, and sending the target local gradient to the corresponding application party and each provider so that the application party and each provider can update a local model.
Optionally, after the step of optimizing the multi-party gradient information according to a preset optimization algorithm and generating optimized multi-party gradient information, the method further includes:
and combining the target local gradients, generating a two-norm of the combined target local gradients based on a two-norm solving formula, and taking the two-norm as a model weight difference value.
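The two-norm step above can be sketched as follows. This is a minimal illustration, assuming each target local gradient is a plain list of floats and that "combining" means concatenation; the function name `model_weight_difference` is hypothetical and not taken from the patent.

```python
import math


def model_weight_difference(target_local_gradients):
    """Concatenate every party's gradient vector and return its two-norm (L2 norm).

    target_local_gradients: list of per-party gradient vectors (lists of floats).
    Concatenation as the merge rule is an assumption of this sketch.
    """
    merged = [value for party in target_local_gradients for value in party]
    return math.sqrt(sum(value * value for value in merged))


# One application-party gradient and one provider gradient (made-up numbers).
grads = [[3.0], [0.0, 4.0]]
diff = model_weight_difference(grads)
```

The resulting scalar can then be compared against the preset second threshold mentioned in the iteration end conditions.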
Optionally, after the step of determining whether each of the target local gradients is smaller than a preset first threshold, the method further includes:
if any of the target local gradients is greater than or equal to the preset first threshold, updating the target local gradients based on the iteration count of the current round;
and repeating the step of determining whether each target local gradient is smaller than the preset first threshold until a preset iteration end condition is detected to be met, whereupon the current iteration process on each target local gradient is terminated.
Optionally, the step of terminating the current iteration process on each target local gradient once a preset iteration end condition is detected to be met includes:
terminating the current iteration process on each target local gradient when it is detected that every target local gradient is smaller than the preset first threshold, or that the model weight difference value generated from the target local gradients is smaller than a preset second threshold, or that the iteration count of the current round has reached a preset maximum number of iterations.
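The three end conditions can be expressed as a single predicate. The sketch below is illustrative only: it assumes each target local gradient has already been reduced to one non-negative magnitude (for example its two-norm), and all parameter names are hypothetical.

```python
def should_stop(gradient_norms, weight_difference, iteration,
                first_threshold, second_threshold, max_iterations):
    """Return True as soon as any one of the three end conditions holds."""
    # Condition 1: every party's gradient magnitude is below the first threshold.
    all_gradients_small = all(g < first_threshold for g in gradient_norms)
    # Condition 2: the model weight difference value is below the second threshold.
    weight_change_small = weight_difference < second_threshold
    # Condition 3: the iteration budget has been used up.
    budget_exhausted = iteration >= max_iterations
    return all_gradients_small or weight_change_small or budget_exhausted
```

Because the conditions are joined with "or", the iteration budget acts as a safety net even when neither threshold is ever reached.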
Optionally, before the step of receiving a to-be-detected data discrimination request initiated by an application party, determining, based on the to-be-detected data discrimination request, a local encryption gradient that is required for data discrimination and is sent by the application party and each provider, the method further includes:
coordinating the application party and each provider to initialize their respective local models and calculate their respective gradient intermediate parameters;
and coordinating the application party and each provider to encrypt the gradient intermediate parameters with a preset public key, so that the application party and each provider exchange data based on the encrypted gradient intermediate parameters and then calculate the local encryption gradients.
Optionally, before the step of coordinating the application party and each provider to initialize their respective local models and calculate their respective gradient intermediate parameters, the method further includes:
and sending the public key to the application party and each provider so that the application party and each provider can perform encryption interaction of related data in a logistic regression calculation process.
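The public-key distribution and encrypted interaction described above can be illustrated with a toy additively homomorphic scheme. The patent does not name the cryptosystem; the sketch below assumes Paillier, uses two small well-known primes purely for demonstration (not secure), and introduces hypothetical helper names (`generate_keys`, `encode`, `add_encrypted`, and so on). It requires Python 3.8+ for `pow` with a negative exponent.

```python
import math
import random

# Toy key material: two known Mersenne primes. Real systems use large random primes.
P = 2**31 - 1
Q = 2**61 - 1
SCALE = 10**6  # fixed-point scale for turning floats into integers (assumption)


def generate_keys(p=P, q=Q):
    """Collaborator side: return (public key n, private key (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Party side: encrypt integer m under public key n."""
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, c):
    """Collaborator side: recover the integer plaintext modulo n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def encode(x, n):
    return round(x * SCALE) % n


def decode(m, n):
    if m > n // 2:  # upper half of the range represents negative numbers
        m -= n
    return m / SCALE


n, private_key = generate_keys()  # collaborator keeps private_key, shares n
c = add_encrypted(n, encrypt(n, encode(0.25, n)), encrypt(n, encode(-0.75, n)))
total = decode(decrypt(n, private_key, c), n)
```

Only the holder of the private key (the collaborator in this document's setting) can read `total`; the parties that add ciphertexts never see the individual plaintext values.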
In order to achieve the above object, the present invention provides a multi-provider-based data discrimination apparatus including:
an encryption gradient acquisition module, configured to receive a discrimination request for data to be detected that is initiated by an application party, and to determine, based on the request, the local encryption gradients required for data discrimination that are sent by the application party and each provider;
a local gradient determination module, configured to take each decrypted local encryption gradient as a target local gradient and to determine whether each target local gradient is smaller than a preset first threshold;
and a target difference value determination module, configured to merge the target local gradients to serve as a target joint gradient if every target local gradient is smaller than the preset first threshold, so as to discriminate the data to be detected with the joint model trained on the target joint gradient.
Further, to achieve the above object, the present invention provides multi-provider-based data discrimination equipment including: a memory, a processor, and a multi-provider-based data discrimination program stored in the memory and executable on the processor, the program implementing the steps of the multi-provider-based data discrimination method described above when executed by the processor.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a multi-provider based data discrimination program which, when executed by a processor, realizes the steps of the multi-provider based data discrimination method as described above.
The invention provides a multi-provider-based data discrimination method, device, equipment and computer-readable storage medium. The method receives a discrimination request for data to be detected that is initiated by an application party and determines, based on the request, the local encryption gradients required for data discrimination that are sent by the application party and each provider; takes each decrypted local encryption gradient as a target local gradient and determines whether each target local gradient is smaller than a preset first threshold; and, if every target local gradient is smaller than the preset first threshold, merges the target local gradients to serve as a target joint gradient, so as to discriminate the data to be detected with the joint model trained on the target joint gradient. By obtaining the encryption gradient of each party, the invention completes the calculation of the target data without leaking any party's feature data at any point during logistic regression model training. By testing convergence on each target local gradient, it overcomes the limitation of existing logistic regression convergence tests and provides a simple, feasible convergence test for vertical logistic regression in scenarios with multiple data providers. Multiple data providers and the data application party can thereby jointly build a logistic regression model, which solves the technical problem of how to support multiple data providers and a data application party in jointly building a vertical logistic regression model.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a multi-provider-based data discrimination method according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The data discrimination device based on multiple providers in the embodiment of the invention can be a PC or a server device, and a Java virtual machine runs on the data discrimination device.
As shown in FIG. 1, the multi-provider-based data discrimination device may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Alternatively, the memory 1005 may be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the device structure shown in FIG. 1 does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a multi-provider-based data discrimination program.
In the device shown in FIG. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with it; the user interface 1003 is mainly used for connecting to a client (user side) and performing data communication with it; and the processor 1001 may be configured to call the multi-provider-based data discrimination program stored in the memory 1005 and perform the following operations of the multi-provider-based data discrimination method:
receiving a discrimination request for data to be detected that is initiated by an application party, and determining, based on the request, the local encryption gradients required for data discrimination that are sent by the application party and each provider;
taking each decrypted local encryption gradient as a target local gradient, and determining whether each target local gradient is smaller than a preset first threshold;
and if every target local gradient is smaller than the preset first threshold, merging the target local gradients to serve as a target joint gradient, so as to discriminate the data to be detected with the joint model trained on the target joint gradient.
Further, the processor 1001 may call the multi-provider based data discrimination program stored in the memory 1005, and also perform the following operations:
combining the local encryption gradients, acquiring a matched preset private key, and decrypting the combined local encryption gradients to generate multi-party gradient information;
optimizing the multi-party gradient information according to a preset optimization algorithm to generate optimized multi-party gradient information;
and splitting the optimized multi-party gradient information to generate each target local gradient, and sending the target local gradient to the corresponding application party and each provider so that the application party and each provider can update a local model.
Further, the processor 1001 may call the multi-provider based data discrimination program stored in the memory 1005, and also perform the following operations:
and combining the target local gradients, generating a two-norm of the combined target local gradients based on a two-norm solving formula, and taking the two-norm as a model weight difference value.
Further, the processor 1001 may call the multi-provider based data discrimination program stored in the memory 1005, and also perform the following operations:
terminating the current iteration process on each target local gradient when it is detected that every target local gradient is smaller than the preset first threshold, or that the model weight difference value generated from the target local gradients is smaller than a preset second threshold, or that the iteration count of the current round has reached a preset maximum number of iterations.
Further, the processor 1001 may call the multi-provider based data discrimination program stored in the memory 1005, and also perform the following operations:
if any of the target local gradients is greater than or equal to the preset first threshold, updating the target local gradients based on the iteration count of the current round;
and repeating the step of determining whether each target local gradient is smaller than the preset first threshold until a preset iteration end condition is detected to be met, whereupon the current iteration process on each target local gradient is terminated.
Further, the processor 1001 may call the multi-provider based data discrimination program stored in the memory 1005, and also perform the following operations:
coordinating the application party and each provider to initialize their respective local models and calculate their respective gradient intermediate parameters;
and coordinating the application party and each provider to encrypt the gradient intermediate parameters with a preset public key, so that the application party and each provider exchange data based on the encrypted gradient intermediate parameters and then calculate the local encryption gradients.
Further, the processor 1001 may call the multi-provider based data discrimination program stored in the memory 1005, and also perform the following operations:
and sending the public key to the application party and each provider so that the application party and each provider can perform encryption interaction of related data in a logistic regression calculation process.
Based on the hardware structure, the embodiment of the data discrimination method based on multiple providers is provided.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a multi-provider-based data discrimination method according to a first embodiment of the present invention. The method is applied to a vertical federated learning system that has multiple data providers at the same time, and specifically to the collaborator of that system. In the vertical federated learning system, the collaborator is communicatively connected to the data application party and to each data provider, and the data application party is communicatively connected to each data provider.
The essence of vertical federated learning is to combine the features of users shared across different lines of business, for example supermarket A and bank B. In the traditional machine learning modeling process, the two parts of data would have to be gathered in one data center, and the features of each user would then be joined into a single record for model training; modeling is therefore based on the joined result, and the classification label resides with one of the parties.
Logistic regression, also known as logistic regression analysis, is a generalized linear regression model that is widely used in data mining, economic forecasting and other fields; it is essentially a binary classification method. In a vertical federated learning scenario that includes a collaborator, a data application party and a data provider, the data application party and the data provider need to model jointly without revealing their respective feature information.
However, owing to the limitations of existing calculation methods, existing vertical logistic regression algorithms can only handle joint modeling between one data application party and one data provider; they cannot support multiple data providers and a data application party jointly building a vertical logistic regression model.
To solve these problems, the invention provides a multi-provider-based data discrimination method. By obtaining the encryption gradient of each party, the method completes the calculation of the target data without leaking any party's feature data at any point during logistic regression model training. By testing convergence on each target local gradient, it overcomes the limitation of existing logistic regression convergence tests and provides a simple, feasible convergence test for vertical logistic regression in scenarios with multiple data providers. Multiple data providers and the data application party can thereby jointly build a logistic regression model, which solves the technical problem of how to support multiple data providers and a data application party in jointly building a vertical logistic regression model. The method is applied to the collaborator in a vertical federated learning model.
Step S10, receiving a discrimination request for data to be detected that is initiated by an application party, and determining, based on the request, the local encryption gradients required for data discrimination that are sent by the application party and each provider;
The local encryption gradients are generated and sent by the data application party (the application party) and the multiple data providers (the providers). Each party obtains its encryption gradient by taking the logarithm of the logistic regression probability formula, differentiating, and then encrypting the result.
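As a worked illustration of the derivation just described (taking the logarithm of the logistic regression probability formula and differentiating), the standard textbook form is given below. The symbols are assumptions introduced here, not notation from the patent: party index $k$, weights $w_k$, the feature slice $x_{i,k}$ of sample $i$ held by party $k$, label $y_i \in \{0, 1\}$, sample count $n$, and $[\![\cdot]\!]$ for a ciphertext under an additively homomorphic scheme, which the patent does not name.

```latex
\begin{aligned}
z_i &= \sum_{k} w_k^{\top} x_{i,k}, \qquad
p_i = \sigma(z_i) = \frac{1}{1 + e^{-z_i}} \\
\ell(w) &= \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr] \\
\frac{\partial}{\partial w_k}\Bigl(-\tfrac{1}{n}\,\ell(w)\Bigr)
  &= \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)\, x_{i,k} \\
[\![ g_k ]\!] &= \frac{1}{n} \sum_{i=1}^{n} [\![ p_i - y_i ]\!] \; x_{i,k}
\end{aligned}
```

Under these assumptions, the gradient of party $k$ depends on the other parties only through the shared residual $p_i - y_i$, so a party that receives the residual in encrypted form can form its local encryption gradient from its own features alone. How the residual itself is evaluated under encryption is not detailed in this section.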
In this embodiment, in a vertical federated learning system formed by a collaborator, a data application party holding the classification labels, and multiple data providers holding feature data, when the collaborator detects that the data application party has issued a model discrimination request to obtain a discrimination result, it receives the request and determines, based on it, the multi-party feature data required for the current model discrimination task from among the feature data held by each data provider. Based on the multi-party feature data required for the task, the collaborator coordinates the data application party and each data provider to run logistic regression training on their respective local logistic regression models, so that the local model gradient of each local model is calculated. When the data application party initiates the model discrimination request to the collaborator, the parties encrypt their local model gradients and send them to the collaborator as the local encryption gradients.
Step S20, taking each decrypted local encryption gradient as a target local gradient, and determining whether each target local gradient is smaller than a preset first threshold;
The preset first threshold may be a default value or may be set by the user; this embodiment does not limit it.
Before step S20, the collaborator obtains the local encryption gradients sent by the data application party and the data providers and decrypts them; it may further optimize the decrypted local encryption gradients, that is, the target local gradients, so as to reduce the number of subsequent iterations. In this embodiment, in each iteration the collaborator checks one by one whether each target local gradient is smaller than the preset first threshold, so as to determine whether the iteration has converged. In step S20, in each iteration of the vertical logistic regression, the collaborator may split the decrypted, merged and optimized gradients and send each part to the corresponding data application party or data provider; the data application party and each data provider receive the data sent by the collaborator in each iteration and use it to update their local logistic regression models.
Step S30, if every target local gradient is smaller than the preset first threshold, merging the target local gradients to serve as a target joint gradient, so as to discriminate the data to be detected with the joint model trained on the target joint gradient.
In this embodiment, if the target local gradients of all parties in the current round are smaller than the preset first threshold, the collaborator determines that the target local gradients have converged, stops the iterative process, and ends model training. The collaborator can merge the current target local gradients of all parties to serve as the target joint gradient; that is, the current model gradient is the final joint model gradient. The collaborator splits the final joint model gradient and sends each part to the data application party or the corresponding data provider. The data application party and each data provider update their local models with the final gradient information and jointly take part in the calculation to obtain the final discrimination probability, which completes the data discrimination task of the joint model. If the target local gradient of any party in the current round is greater than or equal to the preset first threshold, the collaborator determines that the target local gradients have not converged and that iteration must continue. The collaborator subtracts the product of the weight of each party and the current iteration round number from the product of the current gradient of the corresponding party and the current iteration round number, and takes the obtained result as the weight of each party in the next iteration. Each party then calculates its local gradient for the next round from the updated weights and its own feature data.
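The per-party convergence check and merge of steps S20 and S30 can be sketched as below. The text does not say how a gradient vector is compared with a scalar threshold; this sketch assumes the two-norm of each party's vector is used, and that merging means concatenation in a fixed party order. The names `check_and_merge` and the party labels are hypothetical.

```python
import math


def l2_norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def check_and_merge(target_local_gradients, first_threshold):
    """Check every party's gradient one by one; merge only if all have converged.

    target_local_gradients: dict mapping a party name to its gradient vector.
    Returns (converged, joint_gradient), where joint_gradient is None if not converged.
    """
    converged = all(
        l2_norm(gradient) < first_threshold
        for gradient in target_local_gradients.values()
    )
    if not converged:
        return False, None
    # Concatenate in sorted party order so the layout is reproducible.
    joint = [
        value
        for party in sorted(target_local_gradients)
        for value in target_local_gradients[party]
    ]
    return True, joint


grads = {"application": [0.01, 0.02], "provider_1": [0.03]}
converged, joint = check_and_merge(grads, first_threshold=0.1)
```

Returning `None` for the non-converged case makes it explicit that no joint gradient exists until every party has passed the check.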
The invention provides a multi-provider-based data discrimination method. The method receives a discrimination request for data to be detected that is initiated by an application party and determines, based on the request, the local encryption gradients required for data discrimination that are sent by the application party and each provider; takes each decrypted local encryption gradient as a target local gradient and determines whether each target local gradient is smaller than a preset first threshold; and, if every target local gradient is smaller than the preset first threshold, merges the target local gradients to serve as a target joint gradient, so as to discriminate the data to be detected with the joint model trained on the target joint gradient. By obtaining the encryption gradient of each party, the invention completes the calculation of the target data without leaking any party's feature data at any point during logistic regression model training. By testing convergence on each target local gradient, it overcomes the limitation of existing logistic regression convergence tests and provides a simple, feasible convergence test for vertical logistic regression in scenarios with multiple data providers. Multiple data providers and the data application party can thereby jointly build a logistic regression model, which solves the technical problem of how to support multiple data providers and a data application party in jointly building a vertical logistic regression model.
Further, not shown, a second embodiment of the data discrimination method based on multiple providers of the present invention is proposed based on the first embodiment shown in fig. 2 described above. In this embodiment, before step S30, the method further includes:
Step a: merging the local encryption gradients, obtaining the matching preset private key, and decrypting the merged local encryption gradients to generate multi-party gradient information.
In this embodiment, after receiving the local encryption gradients computed in each round of logistic regression model training, the collaborator merges them, decrypts them with the unique private key corresponding to the public key previously sent to the data application party and each data provider, and takes the decrypted result as the multi-party gradient information.
Step b: optimizing the multi-party gradient information according to a preset optimization algorithm to generate optimized multi-party gradient information.
The preset optimization algorithm may be, for example, "sgd", "rmsprop", or "adam".
In this embodiment, the collaborator optimizes the multi-party gradient information, obtained by merging the local gradients of the data application party and the data providers, according to the preset optimization algorithm, so as to reduce the number of iterations subsequently required and accelerate convergence.
Step c: splitting the optimized multi-party gradient information to generate the target local gradients, and sending each target local gradient to the corresponding application party or provider so that the application party and each provider can update their local models.
In this embodiment, the collaborator splits the optimized multiparty gradient information, generates target local gradient information corresponding to the data application party and each data provider, and correspondingly sends the target local gradient information to the data application party and each data provider. And the data application party and each data provider receive the target local gradient sent by the collaborator in the iterative computation of the current round, and update respective local models based on the respective target local gradients.
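Steps a to c can be illustrated with a minimal sketch. It assumes the local gradients have already been decrypted, represents each party's gradient as a plain list of floats, and uses simple learning-rate scaling as a stand-in for the preset optimization algorithm; the function names, party names, and the learning rate are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple


def merge_gradients(local: Dict[str, List[float]]) -> Tuple[List[float], List[Tuple[str, int]]]:
    """Concatenate per-party gradients and record each party's slice length."""
    merged: List[float] = []
    layout: List[Tuple[str, int]] = []
    for party, grad in local.items():
        merged.extend(grad)
        layout.append((party, len(grad)))
    return merged, layout


def optimize_sgd(merged: List[float], learning_rate: float) -> List[float]:
    """Assumed stand-in for the preset optimizer: scale by a learning rate."""
    return [learning_rate * g for g in merged]


def split_gradients(optimized: List[float], layout: List[Tuple[str, int]]) -> Dict[str, List[float]]:
    """Cut the optimized vector back into one target local gradient per party."""
    targets: Dict[str, List[float]] = {}
    start = 0
    for party, size in layout:
        targets[party] = optimized[start:start + size]
        start += size
    return targets


# Hypothetical decrypted local gradients for one application party and two providers.
local_gradients = {
    "application_party": [0.4, -0.2],
    "provider_1": [0.1],
    "provider_2": [0.3, 0.5, -0.1],
}
merged, layout = merge_gradients(local_gradients)
target_local_gradients = split_gradients(optimize_sgd(merged, 0.5), layout)
```

Recording the slice lengths at merge time is what lets the collaborator return to each party exactly the portion of the optimized vector that corresponds to that party's own features.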
Further, after the step b, the method further comprises the following steps:
Step d: merging the target local gradients, computing the two-norm of the merged target local gradients using the two-norm formula, and taking the two-norm as the model weight difference value.
In this embodiment, the two-norm is the square root of the largest eigenvalue of the product of the conjugate transpose of the optimized multi-party gradient information and the optimized multi-party gradient information itself. The collaborator computes the two-norm of the optimized multi-party gradient using this formula and uses it as the model weight difference value for the current iteration round.
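For a real-valued gradient vector g, the product of its transpose with itself is a 1-by-1 matrix whose only eigenvalue is the sum of the squared entries, so the definition above reduces to the ordinary Euclidean norm. The following sketch computes the model weight difference value on that basis; the function names and the dictionary layout are illustrative assumptions.

```python
import math
from typing import Dict, List


def two_norm(vector: List[float]) -> float:
    """Euclidean (two-) norm of a real vector."""
    return math.sqrt(sum(x * x for x in vector))


def model_weight_difference(target_local_gradients: Dict[str, List[float]]) -> float:
    """Merge all parties' target local gradients and return the two-norm."""
    merged = [g for grad in target_local_gradients.values() for g in grad]
    return two_norm(merged)


# Hypothetical example with one application party and one provider.
difference = model_weight_difference({"application_party": [3.0], "provider_1": [4.0]})
```

Reducing the vector to a single scalar is what allows it to be compared directly against a preset threshold.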
Further, after step S20, the method further includes:
Step e: if any of the target local gradients is greater than or equal to the preset first threshold, updating the target local gradients based on the current iteration count.
In this embodiment, if the collaborator determines that the target local gradients obtained in the current round of iterative computation are not all smaller than the preset first threshold, it obtains the current iteration number, subtracts the product of each party's current-round weight and the current iteration number from the product of each party's current-round gradient and the current iteration number, and uses the resulting difference as each party's weight for the next iteration. Each party, namely the data application party and each data provider, then generates the local gradient required for the next round of iterative computation from its latest weight and its own feature data, encrypts the local gradient, and sends it to the collaborator. The collaborator generates a new round of target local gradients from the latest local encryption gradients and checks convergence again.
Step f: repeating the step of judging whether each target local gradient is smaller than the preset first threshold, and terminating the current iteration process for the target local gradients once a preset iteration end condition is met.
In this embodiment, the collaborator, the data application party, and each data provider repeat the parameter update and convergence check until, in some iteration, the collaborator determines that every target local gradient of that iteration is smaller than the preset first threshold. At that point the target local gradients are considered to have converged, the current iteration process can be stopped, and the current training of the model is complete.
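The repeat-until-converged structure of steps e and f can be sketched as a loop. This is a toy illustration under several assumptions that are not in the patent: "smaller than the threshold" is interpreted as the two-norm of each party's gradient being below the threshold, the loss is a stand-in quadratic whose gradient equals the weights, the update is plain gradient descent with a fixed learning rate, encryption is omitted, and a round limit is added as a safety guard.

```python
import math
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]


def all_below_threshold(gradients: Weights, first_threshold: float) -> bool:
    """True when every party's gradient norm is below the first threshold."""
    return all(
        math.sqrt(sum(g * g for g in grad)) < first_threshold
        for grad in gradients.values()
    )


def train(weights: Weights, learning_rate: float,
          first_threshold: float, max_rounds: int) -> Tuple[Weights, int]:
    """Iterate until all target local gradients converge or rounds run out."""
    for round_index in range(1, max_rounds + 1):
        # Toy objective 0.5 * ||w||^2: each party's gradient equals its weights.
        gradients = {party: list(w) for party, w in weights.items()}
        if all_below_threshold(gradients, first_threshold):
            return weights, round_index
        # Assumed update rule (plain gradient descent), applied per party.
        weights = {
            party: [wi - learning_rate * gi for wi, gi in zip(w, gradients[party])]
            for party, w in weights.items()
        }
    return weights, max_rounds


final_weights, rounds_used = train(
    {"application_party": [1.0], "provider_1": [0.5, 0.5]},
    learning_rate=0.5, first_threshold=0.1, max_rounds=50,
)
```

With these toy values every weight halves each round, so the check first succeeds in round 5, when the largest gradient norm has fallen to 0.0625.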
Further, not shown, step f comprises:
Step g: terminating the current iteration process for the target local gradients when each target local gradient is detected to be smaller than the preset first threshold, or the model weight difference value generated from the target local gradients is smaller than a preset second threshold, or the current iteration count reaches a preset maximum number of iterations.
The preset maximum number of iterations and the preset second threshold may be set flexibly according to actual conditions; this embodiment does not specifically limit them.
In this embodiment, the collaborator has three ways to decide that the iterative process should end. The first, described above, judges convergence by whether all current target local gradients are smaller than the preset first threshold. The second judges convergence by whether the model weight difference value generated after merging the target local gradients is smaller than the preset second threshold. The third is based on the current iteration count: if it reaches the preset maximum number of iterations, the collaborator stops the current iteration process and terminates model training even if the model weight difference value is still not smaller than the preset threshold.
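The three end conditions can be sketched as a single check that the collaborator might run once per round. The enum, the function name, and the choice to test the conditions in this particular order are illustrative assumptions; the per-party gradient norms and the model weight difference value are taken as already computed.

```python
from enum import Enum
from typing import Dict, Optional


class StopReason(Enum):
    GRADIENTS_CONVERGED = "all target local gradients below first threshold"
    WEIGHT_DIFF_CONVERGED = "model weight difference below second threshold"
    MAX_ITERATIONS = "maximum number of iterations reached"


def check_stop(gradient_norms: Dict[str, float], weight_diff: float,
               round_index: int, first_threshold: float,
               second_threshold: float, max_rounds: int) -> Optional[StopReason]:
    """Return the first satisfied end condition, or None to keep iterating."""
    if all(norm < first_threshold for norm in gradient_norms.values()):
        return StopReason.GRADIENTS_CONVERGED
    if weight_diff < second_threshold:
        return StopReason.WEIGHT_DIFF_CONVERGED
    if round_index >= max_rounds:
        return StopReason.MAX_ITERATIONS
    return None


# Hypothetical round: gradients are still large, but the round limit is hit.
reason = check_stop({"application_party": 0.5, "provider_1": 0.3},
                    weight_diff=1.0, round_index=100,
                    first_threshold=0.05, second_threshold=0.001, max_rounds=100)
```

Returning the reason rather than a bare boolean makes it possible to distinguish a model that converged from one that merely ran out of rounds.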
The invention provides a data discrimination method based on multiple providers. In this embodiment, each party's gradient information is transmitted in encrypted form and decrypted only with the preset private key, so that the necessary data interaction is completed without leaking any party's own feature data, which enables joint training of the model. Optimizing the multi-party gradient information with a preset optimization algorithm accelerates convergence of the model, reduces the number of iterations needed to reach convergence, and improves training efficiency. Convergence is judged against preset thresholds and the iteration process ends when a preset iteration end condition is met, which makes the convergence criterion of the invention more complete. Providing three iteration end conditions avoids wasting system resources on excessive iterations. Converting the vector-form gradient information into its two-norm allows it to be compared directly with a preset threshold, which makes the convergence criterion of the invention practical.
Further, not shown, a third embodiment of the data discrimination method based on multiple providers of the present invention is proposed based on the first embodiment shown in fig. 2 described above. In this embodiment, before step S10, the method further includes:
Step h: coordinating the application party and each provider to initialize their respective local models and compute their respective gradient intermediate parameters.
In this embodiment, in each round of iterative logistic regression computation for model training, the collaborator coordinates the data application party (the demander) and each data provider in the current longitudinal federated learning system to initialize their local logistic regression models independently. Once initialization is complete, each party uses the multi-party feature data that the collaborator has determined to be required for the data application party's model discrimination request to train its model and compute the gradient intermediate parameters of its logistic regression model.
Step i: coordinating the application party and each provider to encrypt the gradient intermediate parameters with a preset public key, so that the application party and each provider exchange data based on the encrypted gradient intermediate parameters and then compute the local encryption gradients.
In this embodiment, when the data application party and each data provider need to exchange the local intermediate prediction values of their respective logistic regression models in order to compute their local results, the collaborator directs them to encrypt their gradient intermediate parameters homomorphically with the public key previously distributed to them, and only then to exchange the encrypted gradient intermediate parameters. This prevents the feature data of the data application party and of each data provider from leaking to one another and keeps their data confidential.
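The patent does not name a specific homomorphic scheme. As one possible illustration, the sketch below uses a toy Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The primes are far too small to be secure, plaintexts are non-negative integers (real gradient intermediate parameters would need a fixed-point encoding), and all names are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random
from typing import Tuple

PublicKey = Tuple[int, int]    # (n, g)
PrivateKey = Tuple[int, int]   # (lam, mu)


def _l(x: int, n: int) -> int:
    """Paillier L function: L(x) = (x - 1) / n."""
    return (x - 1) // n


def generate_keypair(p: int = 1009, q: int = 1013) -> Tuple[PublicKey, PrivateKey]:
    """Toy key generation from two small primes (insecure, illustration only)."""
    n = p * q
    g = n + 1
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(_l(pow(g, lam, n * n), n), -1, n)          # modular inverse
    return (n, g), (lam, mu)


def encrypt(public_key: PublicKey, message: int) -> int:
    """Encrypt an integer 0 <= message < n with fresh randomness."""
    n, g = public_key
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, message, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(public_key: PublicKey, private_key: PrivateKey, ciphertext: int) -> int:
    n, _ = public_key
    lam, mu = private_key
    return (_l(pow(ciphertext, lam, n * n), n) * mu) % n


def add_encrypted(public_key: PublicKey, c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    n, _ = public_key
    return (c1 * c2) % (n * n)


# The collaborator keeps the private key; the parties only see the public key.
public_key, private_key = generate_keypair()
c_sum = add_encrypted(public_key, encrypt(public_key, 12), encrypt(public_key, 30))
total = decrypt(public_key, private_key, c_sum)
```

In this arrangement the data application party and the providers can combine each other's encrypted intermediate values without being able to read them, since only the holder of the private key can decrypt the result.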
Further, before step h, the method further comprises:
Step j: sending the public key to the application party and each provider so that the application party and each provider can exchange the relevant data in encrypted form during the logistic regression computation.
In this embodiment, in the current longitudinal federated learning system, the collaborator receives a model discrimination request from a data application party that requires model discrimination to obtain a discrimination probability, and determines, based on the request, which multi-party feature data held by the data providers is required to process it. The collaborator then sends the same public key to the data application party of the current model discrimination task and to each data provider of the determined multi-party feature data, so that the data application party and each data provider can homomorphically encrypt the intermediate results that must be exchanged during local model training. This prevents the feature data of the data application party and of each data provider from leaking to one another.
The invention provides a data discrimination method based on multiple providers. In this embodiment, the data application party and the data providers jointly compute the intermediate parameters of the iterative logistic regression process, so that all parties take part in model training and the final result is computed from the data owned by all of them, which improves the accuracy of the final result. The iterative logistic regression computation completes without leaking any party's local feature data, which improves the data security of every party.
The invention also provides a data discrimination device based on multiple providers, which comprises:
an encryption gradient acquisition module, configured to receive a discrimination request for data to be detected initiated by an application party, and to determine, based on the request, the local encryption gradients required for the data discrimination, which are sent by the application party and each provider;
a local gradient judging module, configured to take each decrypted local encryption gradient as a target local gradient and to judge whether each target local gradient is smaller than a preset first threshold; and
a target difference value determining module, configured to merge the target local gradients into a target combined gradient if every target local gradient is smaller than the preset first threshold, so that the data to be detected is discriminated by the combined model trained with the target combined gradient.
The method executed by each program module can refer to each embodiment of the data discrimination method based on multiple providers of the present invention, and is not described herein again.
The invention also provides data discrimination equipment based on the multiple providers.
The multi-provider-based data discrimination apparatus includes a processor, a memory, and a multi-provider-based data discrimination program stored on the memory and executable on the processor, wherein the multi-provider-based data discrimination program, when executed by the processor, implements the steps of the multi-provider-based data discrimination method as described above.
The method implemented when the multi-provider-based data discrimination program is executed may refer to each embodiment of the multi-provider-based data discrimination method of the present invention, and details thereof are not repeated here.
The invention also provides a computer readable storage medium.
The computer-readable storage medium of the present invention stores thereon a multi-provider-based data discrimination program that, when executed by a processor, implements the steps of the multi-provider-based data discrimination method as described above.
The method implemented when the multi-provider-based data discrimination program is executed may refer to each embodiment of the multi-provider-based data discrimination method of the present invention, and details thereof are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A multi-provider-based data discrimination method is applied to a collaborator, and the multi-provider-based data discrimination method comprises the following steps:
receiving a to-be-detected data discrimination request initiated by an application party, determining local encryption gradients required for data discrimination based on the to-be-detected data discrimination request and sent by the application party and each provider;
using each decrypted local encryption gradient as each target local gradient, and judging whether each target local gradient is smaller than a preset first threshold value;
and if the target local gradients are smaller than the preset first threshold, merging the target local gradients to serve as target combined gradients, and judging the data to be detected according to a combined model of the target combined gradient training.
2. The multi-provider based data discrimination method of claim 1, wherein before the step of determining whether each of the target local gradients is smaller than a preset first threshold, the method further comprises:
combining the local encryption gradients, acquiring a matched preset private key, and decrypting the combined local encryption gradients to generate multi-party gradient information;
optimizing the multi-party gradient information according to a preset optimization algorithm to generate optimized multi-party gradient information;
and splitting the optimized multi-party gradient information to generate each target local gradient, and sending the target local gradient to the corresponding application party and each provider so that the application party and each provider can update a local model.
3. The multi-provider-based data discrimination method according to claim 2, wherein after the step of optimizing the multi-party gradient information according to a preset optimization algorithm and generating optimized multi-party gradient information, the method further comprises:
and combining the target local gradients, generating a two-norm of the combined target local gradients based on a two-norm solving formula, and taking the two-norm as a model weight difference value.
4. The multi-provider based data discrimination method of claim 1, wherein after the step of determining whether each of the target local gradients is smaller than a preset first threshold, the method further comprises:
if the target local gradients have parts which are larger than or equal to the preset first threshold, updating the target local gradients based on the iteration times of the current round;
and repeating the execution steps to judge whether each target local gradient is smaller than a preset first threshold value or not, and terminating the current iteration process of each target local gradient when a preset iteration ending condition is met.
5. The multi-provider-based data discrimination method as claimed in claim 4, wherein the step of terminating the current iteration process for each of the target local gradients until a preset iteration end condition is detected to be satisfied includes:
and terminating the current iteration process of each target local gradient until each target local gradient is detected to be smaller than the preset first threshold, or the model weight difference generated based on each target local gradient is smaller than a preset second threshold, or the iteration time of the current round reaches the preset maximum iteration time.
6. The multi-provider-based data discrimination method according to claim 1, wherein before the step of receiving a to-be-measured data discrimination request from an application party, determining a local encryption gradient required for data discrimination based on the to-be-measured data discrimination request and sent by the application party and each provider party, the method further comprises:
the application party and the providers are combined to initialize respective local models and calculate respective gradient intermediate parameters;
and coordinating the application party and each provider to encrypt the gradient intermediate parameter according to a preset public key, so that the application party and each provider perform data interaction based on the encrypted gradient intermediate parameter and then calculate the local encryption gradient.
7. The multi-provider based data discrimination method of claim 6, wherein prior to the step of associating the application with each of the providers initializing a respective local model and calculating respective gradient intermediate parameters, further comprising:
and sending the public key to the application party and each provider so that the application party and each provider can perform encryption interaction of related data in a logistic regression calculation process.
8. A multi-provider based data discrimination apparatus, comprising:
the encryption gradient acquisition module is used for receiving a to-be-detected data discrimination request initiated by an application party, determining local encryption gradients required by data discrimination based on the to-be-detected data discrimination request and sent by the application party and each provider;
the local gradient judging module is used for taking each decrypted local encryption gradient as each target local gradient and judging whether each target local gradient is smaller than a preset first threshold value;
and the target difference value determining module is used for merging the target local gradients to serve as a target combined gradient if the target local gradients are smaller than the preset first threshold so as to judge the data to be detected according to a combined model of the target combined gradient training.
9. A multi-provider based data discrimination apparatus, characterized by comprising: a memory, a processor and a multi-provider based data discrimination program stored on the memory and executable on the processor, the multi-provider based data discrimination program when executed by the processor implementing the steps of the multi-provider based data discrimination method according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a multi-provider based data discrimination program, which when executed by a processor, implements the steps of the multi-provider based data discrimination method according to any one of claims 1 to 7.
CN202010122351.5A 2020-02-26 2020-02-26 Data discrimination method, device, equipment and storage medium based on multiple providers Pending CN111353167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010122351.5A CN111353167A (en) 2020-02-26 2020-02-26 Data discrimination method, device, equipment and storage medium based on multiple providers

Publications (1)

Publication Number Publication Date
CN111353167A true CN111353167A (en) 2020-06-30

Family

ID=71195872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010122351.5A Pending CN111353167A (en) 2020-02-26 2020-02-26 Data discrimination method, device, equipment and storage medium based on multiple providers

Country Status (1)

Country Link
CN (1) CN111353167A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9864956B1 (en) * 2017-05-01 2018-01-09 SparkCognition, Inc. Generation and use of trained file classifiers for malware detection
CN109598385A (en) * 2018-12-07 2019-04-09 深圳前海微众银行股份有限公司 Anti money washing combination learning method, apparatus, equipment, system and storage medium
CN110443067A (en) * 2019-07-30 2019-11-12 卓尔智联(武汉)研究院有限公司 Federal model building device, method and readable storage medium storing program for executing based on secret protection
CN110704860A (en) * 2019-11-18 2020-01-17 深圳前海微众银行股份有限公司 Longitudinal federal learning method, device and system for improving safety and storage medium
CN110751294A (en) * 2019-10-31 2020-02-04 深圳前海微众银行股份有限公司 Model prediction method, device, equipment and medium combining multi-party characteristic data
CN110782042A (en) * 2019-10-29 2020-02-11 深圳前海微众银行股份有限公司 Method, device, equipment and medium for combining horizontal federation and vertical federation
CN110797124A (en) * 2019-10-30 2020-02-14 腾讯科技(深圳)有限公司 Model multi-terminal collaborative training method, medical risk prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200630

RJ01 Rejection of invention patent application after publication