CN109308418A - Model training method and apparatus based on shared data - Google Patents

Model training method and apparatus based on shared data

Info

Publication number
CN109308418A
CN109308418A (application CN201710632357.5A; granted publication CN109308418B)
Authority
CN
China
Prior art keywords
data
execution environment
value
trusted execution
output value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710632357.5A
Other languages
Chinese (zh)
Other versions
CN109308418B (en)
Inventor
王力 (Wang Li)
周俊 (Zhou Jun)
李小龙 (Li Xiaolong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710632357.5A
Publication of CN109308418A
Application granted
Publication of CN109308418B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Abstract

A model training method and apparatus based on shared data are disclosed. On the one hand, joint training can be performed on the data provided by multiple data providers, yielding a more accurate and comprehensive data model. On the other hand, every operation in the training process that involves private data (such as data decryption and local model parameter updates) is encapsulated inside the trusted execution environment of the corresponding data provider and executed there. Plaintext data cannot be obtained outside the trusted execution environment.

Description

Model training method and apparatus based on shared data
Technical field
The embodiments of this specification relate to the field of data mining technology, and in particular to a model training method and apparatus based on shared data.
Background technique
In the era of big data, mining massive data can yield useful information in many forms, so the importance of data is self-evident. Different organizations each own their own data, but the data mining effectiveness of any single organization is limited by the quantity and variety of data it possesses. A direct way to address this problem is for multiple organizations to cooperate and share their data, achieving better data mining results and a win-win outcome.
For a data owner, however, the data itself is a highly valuable asset, and owners are often reluctant to hand it over directly, whether to protect privacy or to prevent leakage. This makes "data sharing" hard to put into practice. How to realize data sharing while fully guaranteeing data security has therefore become a pressing concern in the industry.
Summary of the invention
In view of the above technical problems, the embodiments of this specification provide a model training method and apparatus based on shared data. The technical solutions are as follows:
According to a first aspect of the embodiments of this specification, a model training method based on shared data is provided, the method comprising:
performing iterative training with the following steps until the model training requirement is met:
obtaining ciphertext data provided by at least one data provider;
inputting the ciphertext data of each data provider into the trusted execution environment of the corresponding data provider;
obtaining an output value from each trusted execution environment, the output value being computed from the ciphertext data;
computing, according to a given training objective model, the deviation between the model prediction and the true value, where the model prediction is determined from the output values of the trusted execution environments and the true value is a global label value determined from the data of the data providers;
returning the deviation to each trusted execution environment so that each trusted execution environment updates its own local model parameters;
wherein the following steps are executed inside any trusted execution environment:
decrypting the input ciphertext data to obtain plaintext feature values;
computing, according to the current local model parameters, the output value corresponding to the plaintext feature values;
updating the local model parameters according to the returned deviation.
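The iterative flow of this first aspect can be sketched in miniature. This is an illustrative simulation only, not the patented implementation: the `TrustedEnv` class merely mimics a TEE interface, the XOR "encryption" is a toy placeholder, and the global model is assumed to be a plain sum of the per-provider outputs.

```python
# Minimal sketch of one training iteration of the first aspect.
# Assumptions: the "TEE" is simulated by a plain class, encryption is a
# toy XOR placeholder, and the model prediction is the sum of the
# per-provider output values O_u = W_u . X_u.

KEY = 0x5A  # toy symmetric key shared with the provider (illustrative only)

def toy_encrypt(values):              # provider-side placeholder encryption
    return [round(v * 100) ^ KEY for v in values]

class TrustedEnv:                     # stands in for one provider's TEE
    def __init__(self, n_features, lr=0.1):
        self.W = [0.0] * n_features   # local model parameters W_u
        self.lr = lr
        self.X = None

    def compute_output(self, ciphertext):
        # inside the TEE: decrypt, then O_u = W_u . X_u
        self.X = [(c ^ KEY) / 100 for c in ciphertext]
        return sum(w * x for w, x in zip(self.W, self.X))

    def update(self, delta):
        # inside the TEE: W_u <- W_u - alpha * delta * X_u
        self.W = [w - self.lr * delta * x for w, x in zip(self.W, self.X)]

# --- data-miner side (never sees plaintext) ---
envs = [TrustedEnv(2), TrustedEnv(3)]              # U = 2 providers
ciphertexts = [toy_encrypt([0.5, 1.0]), toy_encrypt([0.2, 0.4, 0.6])]
y_true = 1.0                                       # global label value Y

outputs = [e.compute_output(c) for e, c in zip(envs, ciphertexts)]
prediction = sum(outputs)                          # h(z) with h = identity
delta = prediction - y_true                        # deviation
for e in envs:                                     # return deviation to TEEs
    e.update(delta)

print(round(delta, 4))   # initial W = 0 => prediction 0, delta -1.0
```

Note how the data miner handles only ciphertexts, output values, and the deviation; plaintext features and parameter updates stay inside each `TrustedEnv`.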
According to a second aspect of the embodiments of this specification, a model training method based on shared data is provided, the method comprising:
inputting the ciphertext data of at least one data provider into the trusted execution environment of the corresponding data provider; in each trusted execution environment, decrypting the input ciphertext data to obtain the plaintext feature values;
performing iterative training with the following steps until the model training requirement is met:
in each trusted execution environment, computing, according to the current local model parameters, the output value corresponding to the plaintext feature values;
computing, according to a given training objective model, the deviation between the model prediction and the true value, where the model prediction is determined from the output values of the trusted execution environments and the true value is a global label value determined from the data of the data providers;
returning the deviation to each trusted execution environment;
in each trusted execution environment, updating the local model parameters according to the returned deviation.
According to a third aspect of the embodiments of this specification, a model training method based on shared data is provided, the method comprising:
performing iterative training with the following steps until the model training requirement is met:
obtaining the data provided by multiple data providers, where at least one data provider supplies its data as ciphertext and the other data providers supply theirs as plaintext;
if the data supplied by a data provider is ciphertext, inputting the ciphertext data into the trusted execution environment of that data provider;
obtaining an output value from each trusted execution environment, the output value being computed from the ciphertext data;
computing the deviation between the model prediction and the true value using the output values of the trusted execution environments and the plaintext data supplied by the other data providers, where the model prediction is determined from the output values of the trusted execution environments and the feature values of the plaintext data, and the true value is a global label value determined from the data of the data providers;
returning the deviation to each trusted execution environment so that each trusted execution environment updates its own local model parameters;
wherein the following steps are executed inside any trusted execution environment:
decrypting the input ciphertext data to obtain plaintext feature values;
computing, according to the current local model parameters, the output value corresponding to the plaintext feature values;
updating the local model parameters according to the returned deviation.
According to a fourth aspect of the embodiments of this specification, a data prediction method based on shared-data modeling is provided, the method comprising:
obtaining ciphertext data provided by at least one data provider;
inputting the ciphertext data of each data provider into the trusted execution environment of the corresponding data provider;
obtaining an output value from each trusted execution environment, the output value being computed from the ciphertext data;
inputting the output values of the trusted execution environments into a pre-trained prediction model to compute the predicted value;
wherein the following steps are executed inside any trusted execution environment:
decrypting the input ciphertext data to obtain plaintext feature values;
computing, according to the current local model parameters, the output value corresponding to the plaintext feature values.
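The prediction flow of this fourth aspect is the training flow minus the feedback step. A hedged sketch, in which the pre-trained "prediction model" is an assumed logistic layer over the TEE output values (the patent does not fix its form):

```python
# Sketch of the fourth aspect: prediction from TEE output values.
# Assumptions: the output values O_1..O_3 have already been read from the
# TEE output interfaces; the pre-trained prediction model is a stand-in
# logistic layer, chosen only for illustration.
import math

def global_predict(outputs, weights, bias):
    # z = combining function (assumed weighted sum), h = sigmoid
    z = sum(w * o for w, o in zip(weights, outputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

outputs = [0.4, -0.1, 0.7]        # O_1..O_3 from the TEE output interfaces
pred = global_predict(outputs, weights=[1.0, 1.0, 1.0], bias=0.0)
print(round(pred, 4))             # sigmoid(1.0), about 0.7311
```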
According to a fifth aspect of the embodiments of this specification, a model training apparatus based on shared data is provided, the apparatus including the following modules for carrying out iterative training:
a data obtaining module, configured to obtain the ciphertext data provided by at least one data provider;
a data input module, configured to input the ciphertext data of each data provider into the trusted execution environment of the corresponding data provider;
an output value obtaining module, configured to obtain an output value from each trusted execution environment, the output value being computed from the ciphertext data;
a deviation computing module, configured to compute, according to a given training objective model, the deviation between the model prediction and the true value, where the model prediction is determined from the output values of the trusted execution environments and the true value is a global label value determined from the data of the data providers;
a deviation return module, configured to return the deviation to each trusted execution environment so that each trusted execution environment updates its own local model parameters;
wherein the inside of any trusted execution environment includes:
a decryption submodule, configured to decrypt the input ciphertext data to obtain plaintext feature values;
an output value computing submodule, configured to compute, according to the current local model parameters, the output value corresponding to the plaintext feature values;
a parameter update submodule, configured to update the local model parameters according to the returned deviation.
According to a sixth aspect of the embodiments of this specification, a model training apparatus based on shared data is provided, the apparatus including the following modules for carrying out iterative training:
a data obtaining module, configured to obtain the data provided by multiple data providers, where at least one data provider supplies its data as ciphertext and the other data providers supply theirs as plaintext;
a data input module, configured to, if the data supplied by a data provider is ciphertext, input the ciphertext data into the trusted execution environment of that data provider;
an output value obtaining module, configured to obtain an output value from each trusted execution environment, the output value being computed from the ciphertext data;
a deviation computing module, configured to compute the deviation between the model prediction and the true value using the output values of the trusted execution environments and the plaintext data supplied by the other data providers, where the model prediction is determined from the output values of the trusted execution environments and the feature values of the plaintext data, and the true value is a global label value determined from the data of the data providers;
a deviation return module, configured to return the deviation to each trusted execution environment so that each trusted execution environment updates its own local model parameters;
wherein the inside of any trusted execution environment includes:
a decryption submodule, configured to decrypt the input ciphertext data to obtain plaintext feature values;
an output value computing submodule, configured to compute, according to the current local model parameters, the output value corresponding to the plaintext feature values;
a parameter update submodule, configured to update the local model parameters according to the returned deviation.
According to a seventh aspect of the embodiments of this specification, a data prediction apparatus based on shared-data modeling is provided, the apparatus including:
a data obtaining module, configured to obtain the ciphertext data provided by at least one data provider;
a data input module, configured to input the ciphertext data of each data provider into the trusted execution environment of the corresponding data provider;
an output value obtaining module, configured to obtain an output value from each trusted execution environment, the output value being computed from the ciphertext data;
a predicted value computing module, configured to input the output values of the trusted execution environments into a pre-trained prediction model and compute the predicted value;
wherein the inside of any trusted execution environment E_u includes:
a decryption submodule, configured to decrypt the input ciphertext data to obtain plaintext feature values;
an output value computing submodule, configured to compute, according to the current local model parameters, the output value corresponding to the plaintext feature values.
The technical solutions provided by the embodiments of this specification have two effects. On the one hand, joint training can be performed on the data provided by multiple data providers, yielding a more accurate and comprehensive data model. On the other hand, every operation in the training process that involves private data (such as data decryption and local model parameter updates) is encapsulated inside the trusted execution environment of the corresponding data provider. In other words, plaintext data cannot be obtained outside the trusted execution environments, which effectively guarantees the data security of the participating data providers.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the embodiments of this specification.
Detailed description of the invention
In order to explain the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in this specification; for those of ordinary skill in the art, other drawings can also be obtained from them.
Fig. 1 is a schematic diagram of the data sharing cooperation mode;
Fig. 2 is a schematic diagram of the architecture of the model training system of this specification;
Fig. 3 is a schematic diagram of the architecture of the data prediction system of this specification;
Fig. 4a is a schematic diagram of the architecture of the model training system of one embodiment of this specification;
Fig. 4b is a schematic diagram of the architecture of the model training system of another embodiment of this specification;
Fig. 5 is a schematic structural diagram of the model training apparatus based on shared data of this specification;
Fig. 6 is a schematic structural diagram of the data prediction apparatus based on shared-data modeling of this specification;
Fig. 7 is a schematic structural diagram of a computer device of this specification.
Specific embodiment
In order for those skilled in the art to better understand the technical solutions, the technical solutions in the embodiments of this specification are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this specification, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this specification shall fall within the scope protected by the embodiments of this specification.
As shown in Fig. 1, the "data sharing" cooperation mode involves several roles: the data providers, the data miner, and the data attacker. Multiple data providers jointly give data to the data miner for shared mining, but to protect data privacy they are unwilling to hand their data over in full. In addition, each data provider must also prevent the data attacker from stealing its data. In a broad sense, for any given data provider, the data miner and the other data providers all constitute potential attackers.
Therefore, a basic security requirement for secure data sharing is that a data provider supplies its data to the data miner only after some form of encryption. The encrypted data still retains a certain amount of information, so that the data miner can still mine it, but cannot obtain the specific data content.
To meet this requirement, the embodiments of this specification provide a data-sharing scheme based on trusted execution environments. The scheme trains a data model on a large number of data samples drawn from multiple data providers. Since different data providers can each contribute sample features from different dimensions, integrating the shared data of all providers produces samples with a richer feature dimension, and therefore a better-performing data model.
First, a brief introduction to trusted execution environment (TEE) technology: a trusted execution environment is a secure area on a device processor that guarantees the security, confidentiality, and integrity of the code and data loaded inside it. A TEE provides an isolated execution environment whose security features include isolated execution, integrity of trusted applications, confidentiality of trusted data, secure storage, and so on. Overall, a TEE offers a higher level of security than the operating system. TEEs were originally proposed for mobile devices (such as smartphones, tablets, set-top boxes, and smart TVs), but their application scenarios are no longer limited to the mobile domain. Common TEE implementations include AMD's PSP (Platform Security Processor), ARM TrustZone, and Intel's SGX (Software Guard Extensions).
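The isolation property described above can be imitated in ordinary code to make the boundary concrete. A real TEE such as Intel SGX or ARM TrustZone enforces isolation in hardware; the closure below only hides the key and parameters from the caller by convention, so it is an analogy, not a security mechanism:

```python
# Illustrative only: mimicking a TEE's isolation property in plain Python.
# The caller receives just two interface functions; the key, parameters,
# and plaintext live only in the enclosed state.
def make_enclave(secret_key, initial_params):
    state = {"key": secret_key, "W": list(initial_params)}

    def input_interface(ciphertext):
        # decryption happens only inside; plaintext never escapes
        plaintext = [c - state["key"] for c in ciphertext]   # toy cipher
        state["X"] = plaintext
        return sum(w * x for w, x in zip(state["W"], plaintext))

    def update_interface(delta, lr=0.1):
        state["W"] = [w - lr * delta * x
                      for w, x in zip(state["W"], state["X"])]

    return input_interface, update_interface  # only these cross the boundary

compute, update = make_enclave(secret_key=7, initial_params=[0.5, 0.5])
out = compute([7 + 2, 7 + 4])   # ciphertext of plaintext [2, 4]
print(out)                      # 0.5*2 + 0.5*4 = 3.0
```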
In the data-sharing scenario, any given data provider may consider every other party untrustworthy. A separate trusted execution environment can therefore be created for each data provider, and all operations carrying data security risks are encapsulated inside the corresponding TEE, satisfying each provider's data security requirements.
Fig. 2 shows the architecture of a data-sharing system provided by an embodiment of this specification. Suppose there are U data providers, numbered 1, 2, ... U, jointly supplying data to the data miner so that the data miner can train a global model. The overall data-sharing workflow is as follows:
each data provider supplies its encrypted data to the data miner;
the data miner creates a trusted execution environment for each data provider, and then inputs each provider's ciphertext data into the corresponding TEE;
inside each TEE, the ciphertext data is first decrypted, and then an output value is computed using the model parameters kept inside the TEE; the data miner can use these output values for model training, but cannot recover any specific data content from them;
the data miner performs joint training on the output values of all TEEs, thereby obtaining one global data model.
Throughout this process, every operation involving plaintext data is encapsulated inside a TEE. Since the TEEs are fully isolated from the outside, the data security of each data provider is effectively guaranteed.
The model training method provided by the embodiments of this specification is explained below with reference to the specific step flow.
Training a data model searches for the optimal model parameter values by repeated iteration; each iteration updates the model parameters, and training ends once the updated parameters meet the training requirement. Referring to Fig. 2, the scheme of this specification is illustrated below through one complete iteration:
S101: obtain the ciphertext data provided by each of the U data providers.
The data providers supply the sample data used for model training. The data supplied by different providers may have entirely different features, or may share some or all features. In practice, each data provider and the data miner can agree in advance on which features need to be uploaded for model training; the embodiments of this specification place no restriction on this.
Suppose there are U data providers (U >= 2); the data each provides can be written as follows:
Data provider 1: (x_1^1, x_2^1, x_3^1, ...), denoted (x_1, x_2, x_3, ...)_1 → X_1
Data provider 2: (x_1^2, x_2^2, x_3^2, ...), denoted (x_1, x_2, x_3, ...)_2 → X_2
...
Data provider U: (x_1^U, x_2^U, x_3^U, ...), denoted (x_1, x_2, x_3, ...)_U → X_U
Here x_1, x_2, x_3, ... are the different feature values of one data record, and the indices 1, 2, ... U identify the data providers. For ease of description, the data format can be written uniformly in this target form. It should be understood that the features supplied by different providers may or may not have the same meaning, and the number of features each provider uploads may differ. Nevertheless, from the data obtained from the multiple providers, groups of records describing the same object can be extracted to form global training samples.
For example, suppose there are 3 data providers, each supplying data samples with different features:
the features uploaded by data provider 1 include: age, occupation;
the features uploaded by data provider 2 include: gender, height, weight;
the features uploaded by data provider 3 include: gender, blood pressure, heart rate.
If, for any individual i, the above feature data can be obtained from all 3 providers, then integrating the data supplied by these 3 providers yields a large number of training samples, so that a model over 7 features in total (age, occupation, gender, height, weight, blood pressure, heart rate) can be jointly trained.
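The integration step in this example can be illustrated as follows. The field names come from the example above; the assumption that the shared feature "gender" is consistent across providers, and that individual i is identified by a shared key, is ours:

```python
# Joining the three providers' features for one individual into a single
# 7-feature global training sample, as in the example above. Overlapping
# features (here "gender") are assumed consistent and are deduplicated.
provider_1 = {"age": 35, "occupation": "engineer"}
provider_2 = {"gender": "F", "height": 165, "weight": 55}
provider_3 = {"gender": "F", "blood_pressure": 118, "heart_rate": 72}

sample = {}
for part in (provider_1, provider_2, provider_3):
    sample.update(part)          # later duplicates overwrite identical values

print(sorted(sample))            # 7 distinct feature names
```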
To guarantee data security, each data provider encrypts its data and supplies it to the data miner in ciphertext form. It can be assumed here that the encryption logic used by each provider is known only to itself, so the encrypted data can be safely stored or transmitted in untrusted environments.
In one implementation, the data miner requests data from a provider over the network; after the provider encrypts the data, it sends the ciphertext to the data miner over the network. In another implementation, the ciphertext data is stored on the data miner's storage devices so that the data miner reads it directly from local storage.
S102: input the ciphertext data of each data provider into the trusted execution environment of that provider.
The data miner creates a trusted execution environment E_1, E_2, ... E_U for each data provider 1, 2, ... U, to guarantee that, for the data supplied by any provider u (u = 1, 2, ... U), the security-sensitive operations can only take place inside the corresponding E_u; from outside E_u these operations can neither be observed nor influenced.
The concrete way of creating a trusted execution environment differs across implementations, and the embodiments of this specification do not restrict it. In addition, the data miner may create the trusted execution environments either after first obtaining the providers' ciphertext data or in advance.
Each trusted execution environment exposes input and output interfaces to the outside; the function of one input interface is to receive externally supplied ciphertext data. After the data miner obtains a provider's ciphertext data, it determines the E_u corresponding to that provider u and then feeds the ciphertext into each E_u's ciphertext data input interface.
S103: obtain the output values O_1, O_2, ... O_U of the trusted execution environments.
Inside each E_u, the input ciphertext data is first decrypted to obtain the plaintext data X_u; then X_u is processed according to a preset algorithm and the internal local model parameters W_u to obtain the corresponding output value O_u, which is exported outside E_u through one of E_u's output interfaces. In this way, the data miner obtains the output values O_1, O_2, ... O_U of the multiple trusted execution environments.
To describe the system completely, this embodiment first introduces the overall model training flow from the data miner's point of view. Since the operations inside each E_u are invisible from outside, each E_u can be regarded as a black box for now; the internal implementation of E_u is described in detail in a later embodiment.
S104: according to the given training objective model, compute the deviation Δ = Y − h[z(O_1, O_2, ... O_U)].
The deviation is an important quantity computed in each iteration of model training. Suppose a data sample i has the form (X_i, y_i), where:
X_i = (x_i1, x_i2, ...), with x_i1, x_i2, ... the feature values of sample i;
y_i is the label value of sample i.
If the training objective model has the form y = h(X), then for the sample (X_i, y_i) the prediction deviation equals the difference between the label value y_i and the model prediction h(X_i), that is:
Δ_i = h(X_i) − y_i, or Δ_i = y_i − h(X_i)
The deviation Δ_i serves two main purposes in model training:
On the one hand, it evaluates how well the model fits the training sample set: for any sample i, the smaller the value of Δ_i, the better the model fits it; with n samples, the smaller the n values of Δ_i are overall, the better the model fits the set. In practice, the fit of the function to the training set is usually evaluated overall by computing ΣΔ_i.
On the other hand, it participates in the iterative parameter update. Suppose there is a set of model parameters W = (w_1, w_2, ...); then the basic form of the iterative parameter update (which may have many variants in practice) is:
W ← W − αΔX
The whole training process keeps updating the model parameters iteratively until the model's fit to the training set meets the training requirement (for example, the deviation is sufficiently small). The update formula is explained briefly below; for its full derivation, refer to the prior art.
In the update formula, the "W" on the right side of the arrow is the parameter value before each update, and the "W" on the left side is the parameter value after it. The change in each update is the product of the three factors α, Δ, and X.
α is the learning rate, also called the step size, which determines the update amplitude of the parameters at each iteration. If the learning rate is too small, reaching the training requirement may be slow; if it is too large, the updates may overshoot the minimum, i.e., the update process fails to bring the model closer to the fit. For how to choose an appropriate learning rate, refer to the prior art; in the embodiments of this specification, α is treated as a preset value.
X stands for the feature values of the data sample; depending on the update formula chosen, X may also stand for different forms of the feature values, as explained further in later embodiments.
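Under the stated assumptions (a linear model h(X) = W·X and the basic update form), one numerical application of W ← W − αΔX shows the prediction moving toward the label:

```python
# One application of the update rule W <- W - alpha * delta * X,
# showing that the update moves the prediction toward the label.
alpha = 0.1                      # learning rate (step size)
W = [0.0, 0.0]                   # model parameters before the update
X = [1.0, 2.0]                   # feature values of one sample
y = 1.0                          # label value

pred = sum(w * x for w, x in zip(W, X))
delta = pred - y                 # deviation = h(X) - y = -1.0
W = [w - alpha * delta * x for w, x in zip(W, X)]

new_pred = sum(w * x for w, x in zip(W, X))
print(W, round(new_pred, 4))     # [0.1, 0.2], prediction rises to 0.5
```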
In this specification example scheme, data mining side can obtain the output valve of multiple credible performing environments respectively O1、O2…OU, it is assumed that y=h (z) is global training objective pattern function, and wherein z is about O1、O2…OUFunction, be denoted as z (O1,O2,…OU), that is, it is directed to the Copula of U data providing output valve, and O1、O2…OUIt is again about X respectively1、X2… XUFunction, in summary: y=h (z) be also about X1、X2…XUFunction.
Define Δ=Y-h [z (O1,O2,…OU)] or Δ=h [z (O1,O2,…OU)]-Y;
Wherein h [z (O1,O2,…OU)] it is z (O1,O2,…OU) model predication value;Y is z (O1,O2,…OU) corresponding true Real value, the i.e. overall situation label value according to determined by the data of each data providing;The difference DELTA of the two is deviation.
The actual form of y = h(z) can be selected according to the training requirements, for example a linear regression model, a logistic regression model, etc. The embodiments of this specification do not need to limit this.
In addition, for each group O1, O2, … OU, the corresponding global label value Y can be determined in various ways, which will be explained in detail in later embodiments.
S105: Δ is returned to E1, E2 … EU respectively, so that E1, E2 … EU update their local model parameters W1, W2 … WU respectively. The updated parameters will be used to compute the output value Ou in the next iteration.
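To make the flow S101–S105 concrete, the following is a minimal Python simulation under stated assumptions: the "enclaves" are plain objects with a toy additive cipher, the global model is linear with z(O1, O2) = O1 + O2 and h(z) = z, and all names (ToyEnclave, input_ciphertext, etc.) are illustrative rather than taken from the patent.

```python
class ToyEnclave:
    """Toy stand-in for a trusted execution environment E_u (illustrative only)."""
    def __init__(self, n_features, key):
        self.W = [0.0] * n_features   # local model parameters W_u
        self.key = key                # decryption secret held only inside E_u
        self._x = None                # plaintext features never leave the object

    def input_ciphertext(self, ct):           # S102: receive ciphertext
        self._x = [c - self.key for c in ct]  # toy "decryption"

    def output_value(self):                   # S103: only O_u is exported
        return sum(w * xi for w, xi in zip(self.W, self._x))

    def update(self, delta, alpha=0.05):      # S105: W_u <- W_u - alpha*Delta*X_u
        self.W = [w - alpha * delta * xi for w, xi in zip(self.W, self._x)]


# Two providers, each holding one feature; ground truth Y = 2*x1 + 3*x2.
e1, e2 = ToyEnclave(1, key=7.0), ToyEnclave(1, key=3.0)
samples = [((5.0,), (2.0,), 16.0), ((1.0,), (4.0,), 14.0)]

for _ in range(300):                          # iterate until fitted
    for x1, x2, Y in samples:
        e1.input_ciphertext([x1[0] + 7.0])    # S101-S102 (provider-side encryption)
        e2.input_ciphertext([x2[0] + 3.0])
        pred = e1.output_value() + e2.output_value()  # S104: h(z) = O1 + O2
        delta = pred - Y                      # deviation Delta = h(z) - Y
        e1.update(delta)                      # S105: each enclave updates locally
        e2.update(delta)
```

After convergence the local parameters approach w1 ≈ 2 and w2 ≈ 3, while neither party's plaintext features nor local parameters ever appear outside its enclave stand-in.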
The above describes the overall processing flow of model training from the perspective of the data mining party. The processing logic inside a trusted execution environment is introduced below.
As shown in Fig. 2, any trusted execution environment Eu internally realizes 3 basic functions:
1) Data decryption:
Corresponding to the encryption logic used by data provider u, the matching decryption logic (such as decryption algorithm information and key information) is stored inside Eu. Using this information, the input ciphertext data can be decrypted inside Eu to obtain the plaintext data feature values Xu = (x1, x2, …)u.
The data decryption operation is executed after S102.
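As a rough illustration of the decryption step — not the patent's actual cipher, since a real enclave would hold a sealed key and use a standard authenticated cipher such as AES-GCM — a toy hash-based stream cipher can be sketched in Python:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; encryption and decryption are the same operation."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# provider u encrypts a feature record; only E_u (holding the key) can recover X_u
record = b"x1=0.3,x2=1.7"
ciphertext = xor_cipher(b"provider-u-key", record)
plaintext = xor_cipher(b"provider-u-key", ciphertext)
```

The key material (here the byte string) would live only inside Eu, so the ciphertext stored on the provider's disk is opaque to the data mining party.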
2) Output value calculation:
The local model parameters Wu = (w1, w2, …)u are stored inside Eu. Inside Eu, the output value Ou corresponding to Xu can be computed from the current local model parameters Wu. Throughout the training process, Wu is updated iteratively; the initialized parameter values are used in the first iteration.
The specific way Ou is computed is determined by the form of the global model y = h(z) = h[z(O1, O2, … OU)]. For example, for a linear regression model or a logistic regression model, the global model can be written in the form y = h(z) = h(w1x1 + w2x2 + …); the corresponding Ou and combination function z(O1, O2, … OU) can then respectively take the following forms:
O_u = w_1^u x_1^u + w_2^u x_2^u + …, denoted (w1x1 + w2x2 + …)u;
z(O1, O2, … OU) = O1 + O2 + … + OU.
In practical applications, the above expression for O_u may also include a constant term parameter b_u, that is:
O_u = b_u + w_1^u x_1^u + w_2^u x_2^u + …
In fact, if we let b_u = w_0^u, interpret w_0^u as the parameter corresponding to a feature x_0^u, and fix the feature value x_0^u to be constantly equal to 1, then the expression for O_u can be written as:
O_u = w_0^u x_0^u + w_1^u x_1^u + w_2^u x_2^u + …
It can be seen that whether or not there is a constant term parameter, the overall form of the expression is uniform; therefore the expression O_u = w_1^u x_1^u + w_2^u x_2^u + … should be understood as covering both the case "with a constant term parameter" and the case "without a constant term parameter". In practical applications, for any u, the model parameters may or may not include a constant term parameter.
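The equivalence between the "with constant term" and "folded constant term" forms can be checked directly; the helper names below are illustrative only:

```python
def output_with_bias(w, x, b):
    """O_u = b_u + w_1*x_1 + w_2*x_2 + ..."""
    return b + sum(wi * xi for wi, xi in zip(w, x))

def output_folded(w_ext, x):
    """O_u = w_0*x_0 + w_1*x_1 + ... with x_0 fixed to 1 and w_0 = b_u."""
    x_ext = [1.0] + list(x)
    return sum(wi * xi for wi, xi in zip(w_ext, x_ext))

# same result either way: 7 + 2*4 + 3*5 = 30
same = output_with_bias([2.0, 3.0], [4.0, 5.0], 7.0) == output_folded([7.0, 2.0, 3.0], [4.0, 5.0])
```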
Of course, the above forms of O_u and the combination function z(O1, O2, … OU) are only schematic and should not be construed as limiting the scheme of this specification.
The output value calculation operation is executed after the data decryption operation described above; after the output value is computed, S104 is executed.
3) Parameter update:
Each Eu internally maintains the current local model parameters Wu. After receiving the deviation Δ returned from outside Eu, Wu is updated according to a parameter update formula of the form W_u ← W_u − αΔX_u (before the first update, the initialized parameter values are used). Of course, the parameter update formula actually used is not limited to the above form. For example:
If each time one data record i is read from the data source and input into Eu as a training sample, the parameter update formula is: W_u ← W_u − αΔ_i X_i^u;
If each time multiple data records are read from the data source and input into Eu as training samples, and gradient descent is used for the parameter update, the formula is: W_u ← W_u − α∑_i Δ_i X_i^u, i.e. all training samples participate in the update operation;
If each time multiple data records are read from the data source and input into Eu as training samples, and stochastic gradient descent is used for the parameter update, the formula is: W_u ← W_u − αΔ_i X_i^u, where i is chosen at random, i.e. one randomly selected training sample participates in the update operation.
The above update algorithms are only schematic and should not be construed as limiting the scheme. For example, a regularization correction term can be added to the update formula to reduce overfitting. Other update algorithms are also available; this specification does not enumerate them one by one.
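The three update variants above (single sample, full-batch gradient descent, stochastic gradient descent) differ only in which samples contribute to the sum; a sketch with illustrative function names:

```python
import random

def single_update(W, alpha, delta_i, x_i):
    """W_u <- W_u - alpha * Delta_i * X_i^u   (one sample per iteration)"""
    return [w - alpha * delta_i * xi for w, xi in zip(W, x_i)]

def batch_gd_update(W, alpha, deltas, X):
    """W_u <- W_u - alpha * sum_i Delta_i * X_i^u   (all samples participate)"""
    for d, x in zip(deltas, X):
        W = [w - alpha * d * xi for w, xi in zip(W, x)]
    return W

def sgd_update(W, alpha, deltas, X):
    """W_u <- W_u - alpha * Delta_i * X_i^u   for one randomly chosen i"""
    i = random.randrange(len(X))
    return single_update(W, alpha, deltas[i], X[i])

# batch update over two orthogonal samples
W = batch_gd_update([0.0, 0.0], 0.1, [1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
```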
The parameter update operation is executed after S105. Once the parameters are updated, one iteration is complete, and the updated parameters will be used to compute the output value O_u in the next iteration.
The above describes one complete iteration. The above steps are repeated until the model training requirement is met. The model training requirement here can be, for example: the deviation Δ of the global model is sufficiently small, the difference between the Δ values of two adjacent iterations is sufficiently small, the difference between the O_u values computed inside Eu in two adjacent iterations is sufficiently small, a preset number of iterations is reached, and so on. An additional validation set can of course also be used for verification. This specification does not need to limit the specific model training requirement.
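The alternative stopping conditions listed above can be combined into one predicate; the thresholds and names below are assumptions for illustration, not values from the patent:

```python
def training_done(deltas, prev_deltas, iteration, tol=1e-4, max_iter=1000):
    """Stop when Delta is small, Delta has stabilized, or the budget is spent."""
    if iteration >= max_iter:                 # preset iteration count reached
        return True
    if max(abs(d) for d in deltas) < tol:     # global deviation small enough
        return True
    if prev_deltas is not None and max(
        abs(a - b) for a, b in zip(deltas, prev_deltas)
    ) < tol:                                  # adjacent iterations agree
        return True
    return False
```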
It can be seen that with the above scheme, the operations involving private data in the model training process (such as the data decryption operation and the local model parameter update operation) are all encapsulated inside the trusted execution environments of the data providers. That is, the plaintext data cannot be obtained outside the trusted execution environment, and in some embodiments even the specific local model parameters cannot be obtained outside the trusted execution environment, thereby effectively guaranteeing the data security of the data providers.
The above describes the overall model training scheme based on shared data provided by the embodiments of this specification. Considering practical application requirements, there are also some alternative embodiments in certain details of the overall design, exemplified below:
In S101–S102, either a single data record or multiple data records can be read into the trusted execution environment at a time. That is, in each iteration, N ciphertext data records are obtained from each data provider respectively, where N can be a preset value not less than 1, and the training samples are replaced by obtaining different data contents each time.
It should also be understood that the training sample data can be obtained either gradually as the iterations proceed, or all at once. For example, if the number of data records needed per iteration is N, one option is to obtain N records in each iteration and decrypt those N records in the trusted execution environment; another is to obtain in one pass a quantity of data greater than N (such as the full data set, or a multiple of N), input it into the trusted execution environment, and decrypt N records on demand in each iteration; yet another is to obtain in one pass a quantity of data greater than N, input it into the trusted execution environment, and decrypt all the input data at once; and so on.
It can be seen that the steps that must be performed in each iteration in practical applications are S103–S105 together with the "output value calculation" and "parameter update" steps inside the trusted execution environment, whereas steps S101, S102 and the "data decryption" step inside the trusted execution environment are not necessarily executed in each iteration. In short, the way sample data is obtained can be set flexibly according to the actual situation; this does not affect the realization of the overall scheme.
The association of data across multiple data providers can be realized through certain common features with an identifying function, such as an ID card number, which ensures that the data obtained from multiple data providers describe the same person. Such identification features do not necessarily participate in model training, and the security of this feature information can be improved by means such as hashing.
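Record linkage via a hashed identifier can be sketched as follows; the salt value and field names are assumptions for illustration (a real deployment would agree on a keyed hash so the identifier space cannot be brute-forced):

```python
import hashlib

def join_key(id_number: str, salt: str = "shared-salt") -> str:
    """Hash the ID card number so the raw value need not be exchanged."""
    return hashlib.sha256((salt + id_number).encode("utf-8")).hexdigest()

# each provider indexes its records by the hashed ID
provider1 = {join_key("110101199001011234"): {"age": 30, "job": "engineer"}}
provider2 = {join_key("110101199001011234"): {"height": 175, "weight": 70}}

# records can be aligned without either side exposing the raw IDs
common = provider1.keys() & provider2.keys()
```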
Each Eu is created based on the information provided by the corresponding data provider itself. The Eu should conform to the basic design standards of the overall design, but their specific implementations are not required to be completely identical; for example, different data decryption algorithms, different parameter update algorithms and so on can be used.
For each group O1, O2, … OU, the global label value Y can be determined in various ways, for example:
1) the label value Yu provided by one data provider u is taken as Y;
2) Y is determined jointly from the label values Yu1, Yu2 … provided by multiple data providers u1, u2 …; the specific determination can be, for example, a weighted average, a "logical AND", a "logical OR", etc.;
3) Y is determined through channels other than the data providers.
For example, suppose a prediction model is to be established for the incidence of a certain disease, and the disease is known to be correlated with the 5 features (age, occupation, gender, height, weight), where:
Institution 1 can provide the feature data: age, occupation;
Institution 2 can provide the feature data: gender, height, weight.
Assume the prediction model is a binary classification model, i.e. the model output values include the two classes "ill" and "not ill" (the corresponding prediction results can be presented as "high risk" and "low risk"). Each institution may, in addition to providing feature data, further provide label values, i.e. "ill or not" results, or may provide no label values. The global label value can then be determined by a variety of strategies, for example:
The global label value may be taken as the label value provided by one particular institution; in practice this may be because that institution is more authoritative, or because the other institution cannot provide label values.
The global label value may be determined jointly from the label values provided by the two institutions, for example: if at least one institution provides the label value "ill", the global label value is determined to be "ill".
In addition, in some cases the data mining party may directly know the "ill or not" results of a batch of users from other channels, and the demand is to further mine the relationship between these results and other features. The "other features" can then be obtained from the data providers, and the results known in advance can be used directly as the global label values.
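The listed strategies for determining the global label value Y can be sketched as a small dispatcher; the names and the None convention for "this provider supplies no label" are illustrative assumptions:

```python
def global_label(labels, strategy="logical_or"):
    """labels: dict mapping provider name -> 0/1 label, or None if not provided."""
    given = [v for v in labels.values() if v is not None]
    if not given:
        raise ValueError("no provider supplied a label")
    if strategy == "logical_or":    # 'ill' if any provider says ill
        return int(any(given))
    if strategy == "logical_and":   # 'ill' only if all label-providers agree
        return int(all(given))
    if strategy == "average":       # equal-weight weighted average
        return sum(given) / len(given)
    raise ValueError(f"unknown strategy: {strategy}")
```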
After training, each Eu can export the parameters obtained in the last update to the data mining party, so that the data mining party maintains the complete data model. Alternatively, the parameters can be stored in distributed fashion inside each Eu, to further improve security.
If the scheme of keeping each Eu's parameters inside Eu is adopted, then in the model serving stage each data provider uploads its ciphertext data into its Eu, each Eu decrypts the ciphertext data and computes its output value, and finally the data mining party computes the output result of the global model from the output values of all the Eu. Fig. 3 shows a data prediction method based on shared-data modeling; the method may comprise the following steps:
S201: obtain the ciphertext data provided by U data providers respectively, U ≥ 2;
S202: input the ciphertext data of each data provider into the corresponding trusted execution environment E1, E2 … EU of that data provider;
S203: obtain the output values O1, O2 … OU of the trusted execution environments;
S204: compute the predicted value y = h[z(O1, O2, … OU)] according to the pre-trained prediction model.
Comparing Fig. 2 and Fig. 3, it can be seen that the model serving stage still uses a system architecture similar to that of the training stage; the difference is that no iterative parameter update is needed, i.e. the prediction result value y is output in a single pass from the input data.
Accordingly, for any trusted execution environment Eu, the local model parameters Wu have been saved in advance, and in the model serving stage Eu internally realizes 2 basic functions:
1) decrypting the input ciphertext data to obtain the plaintext data feature values Xu;
2) computing the output value Ou corresponding to Xu according to the current local model parameters Wu.
For the specific implementation of each step in the model serving stage, reference can be made to the corresponding step of the model training stage; this embodiment does not repeat the description.
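Once the local output values O_1 … O_U have been collected from the enclaves, the serving-stage combination in S204 reduces to evaluating the trained link function; a sketch assuming the logistic form used in the concrete example later in this specification:

```python
import math

def global_prediction(outputs):
    """S204: y = h[z(O_1, ..., O_U)] with z = sum of outputs and h = sigmoid."""
    z = sum(outputs)
    return 1.0 / (1.0 + math.exp(-z))

# with all-zero outputs the model is maximally uncertain
p = global_prediction([0.0, 0.0])   # 0.5
```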
The above embodiments describe an implementation scheme in which 2 or more data providers jointly provide data for training a model. It should be understood that other improvements can also be made on the basis of the above scheme, to meet the application demands of some special scenarios, exemplified below:
When only 1 data provider provides data to the data mining party and has privacy requirements on the data, the model can be trained with the following scheme.
The data mining party performs iterative training using the following steps, until the model training requirement is met:
S101': obtain the ciphertext data provided by the 1 data provider;
S102': input the ciphertext data of the data provider into the trusted execution environment E of that data provider;
S103': obtain the output value O of the trusted execution environment E;
S104': compute the deviation Δ between the model prediction and the true value according to the given training objective model;
S105': return the deviation Δ to the trusted execution environment E, so that the trusted execution environment updates the model parameters.
This embodiment is applicable to scenarios in which a data provider entrusts the data mining party with data mining but does not wish to leak data details to the data mining party.
Compared with S101–S105, the above S101'–S105' is the case in which the U data providers are reduced to 1 data provider; the other implementation details are essentially the same and are not repeated in this embodiment. Inside the trusted execution environment E, the three functions of data decryption, output value calculation and parameter update are still realized.
When multiple data providers provide data to the data mining party and some of them have no privacy requirements on the data, the model can be trained with the following scheme.
The data mining party performs iterative training using the following steps, until the model training requirement is met:
S101": obtain the data provided by U data providers (U ≥ 2) respectively, where the data provided by at least one data provider is in ciphertext form and the data provided by the other data providers is in plaintext form;
S102": if the data provided by data provider u is in ciphertext form, input the ciphertext data into the trusted execution environment Eu of that provider; here u refers specifically to the data providers with data confidentiality requirements;
S103": obtain the output value Ou of each trusted execution environment;
S104": compute the deviation Δ between the model prediction and the true value using the output value Ou of each trusted execution environment Eu and the plaintext data provided by the other data providers;
The difference between this step and S104 is: for providers that provide plaintext data, the data mining party can directly obtain the corresponding plaintext data for the global computation, without going through a trusted execution environment.
S105": return the deviation to each trusted execution environment respectively, so that each trusted execution environment updates its local model parameters.
The difference between this step and S105 is: for providers that provide plaintext data, the data mining party can itself be directly responsible for maintaining and updating the corresponding local model parameters.
Compared with S101–S105, in the above S101"–S105" the U data providers are divided into two classes: for data providers without data privacy requirements, the data mining party can directly obtain the plaintext data they provide for model training; for data providers with data confidentiality requirements, the ciphertext data they provide still needs to be processed through a trusted execution environment. Inside each trusted execution environment, the three functions of data decryption, output value calculation and parameter update are still realized.
The scheme of this embodiment is applicable to scenarios in which some of the data features needed by the global model have no privacy requirements. Of course, from the data privacy perspective, "no privacy requirements" here is generally not meant in an absolute sense, but rather means no privacy requirements with respect to the data mining party. For example, a data provider may have a deep cooperation relationship with the data mining party, or the data mining party itself may own data that can participate in the global model training (the data mining party can then be considered one of the data providers); in that case, for the data mining party, the data without privacy requirements can be used directly, without going through a trusted execution environment.
The scheme of the embodiments of this specification is illustrated below with a specific example.
Suppose the overall training demand is: based on the user asset data provided by two banking institutions, establish a model that "predicts whether a user is able to repay a large loan on schedule".
The data features bank 1 can provide are x1, x2, x3;
The data features bank 2 can provide are x4, x5.
The overall model uses a logistic regression model, whose functional form is:
y = 1 / (1 + e^(−z))   (1)
where:
z = w1x1 + w2x2 + w3x3 + w4x4 + w5x5   (2)
w1, w2, w3 are the local parameters of bank 1, and w4, w5 are the local parameters of bank 2.
Define:
sum1 = w1x1 + w2x2 + w3x3   (3)
sum2 = w4x4 + w5x5   (4)
Then, according to formulas (1)–(4), the deviation function of the global model is obtained:
Δ = 1 / (1 + e^(−(sum1 + sum2))) − Y   (5)
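Formulas (3)–(5) translate directly into code; this is a sketch, with the sign convention Δ = h − Y chosen to match a gradient-descent parameter update:

```python
import math

def deviation(Y, sum1, sum2):
    """Formula (5): Delta = 1/(1 + e^-(sum1 + sum2)) - Y."""
    return 1.0 / (1.0 + math.exp(-(sum1 + sum2))) - Y

# a correctly classified positive sample yields a small negative deviation
d = deviation(Y=1, sum1=2.0, sum2=3.0)   # sigmoid(5) - 1, roughly -0.0067
```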
The trusted execution environments are realized with Intel's SGX technology; a trusted execution environment created this way is called an enclave. Specifically, in this mode the security-sensitive operations of legitimate software are encapsulated in an enclave; once software and data are located inside the enclave, even the operating system or the VMM (Hypervisor) cannot affect the code and data inside the enclave. The security boundary of an enclave includes only the CPU and the enclave itself.
Fig. 4a shows the overall architecture of a system implementing the model training. The implementation of the system is illustrated below from the perspectives of the data providers and the data mining party respectively:
1) Data providers:
Each bank encrypts the data to be provided to the data mining party; the encrypted data can be stored on the data provider's hard disk. Of course, according to practical application requirements, certain parts of the data can also be provided in plaintext.
Each bank provides its enclave definition file (.edl) and the corresponding dynamic link library (.dll or .so); the resulting enclave includes the following functions or interfaces:
1.1) Decrypt the bank's pre-encrypted ciphertext data input into the enclave from outside, obtaining the plaintext data. N ciphertext records are input in each iteration, each record corresponding to one user; for any user i, the plaintext data of bank 1 is x_i1, x_i2, x_i3, and the plaintext data of bank 2 is x_i4, x_i5.
1.2) According to the current local parameter values, compute the output value of each data record and export it outside the enclave. For any user i, the output value of bank 1 is sum1_i, and the output value of bank 2 is sum2_i.
1.3) Update the local parameters according to the Δ_i values returned from outside the enclave. The update uses gradient descent, and all N data records participate in each update operation, with the update formula:
W ← W − α∑_i Δ_i X_i   (6)
That is:
w1 ← w1 − α∑_i Δ_i x_i1
w2 ← w2 − α∑_i Δ_i x_i2
w3 ← w3 − α∑_i Δ_i x_i3
w4 ← w4 − α∑_i Δ_i x_i4
w5 ← w5 − α∑_i Δ_i x_i5
where:
Δ_i = 1 / (1 + e^(−(sum1_i + sum2_i))) − Y_i   (7)
α is the preset learning rate; the values of α used by bank 1 and bank 2 may be the same or different.
2) Data mining party:
The data mining party first unifies the global label values Y. The Y value indicates whether a user who has taken out a large loan repays the loan on schedule. This information can be obtained from the two banks, or from other lending institutions.
The enclave information provided by the two banks is loaded to create enclave1 and enclave2 respectively, and a model training application is built on enclave1 and enclave2. The application operates as follows:
2.1) In each iteration, a batch of ciphertext data is read from the hard disk; assume N records are read each time. The data of the two banks can be read in association through the ID card number. The ciphertext data of bank 1 is input into enclave1, and the ciphertext data of bank 2 is input into enclave2.
2.2) Inside enclave1 and enclave2, the ciphertext data are decrypted respectively, and sum1_i and sum2_i are computed using formulas (3) and (4) according to the current local parameters (the initial parameter values are used in the first iteration) and exported outside the enclaves.
2.3) From the sum1_i and sum2_i exported by enclave1 and enclave2, Δ_i is computed using formula (7), and Δ_i is returned to enclave1 and enclave2 respectively.
2.4) Inside enclave1 and enclave2, the parameters are updated using formula (6) respectively.
The above iteration is repeated until the model training condition is met, yielding the final parameter values w1, w2, w3, w4, w5; substituting these values into formulas (1) and (2) gives the trained model.
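The whole two-bank flow (2.1–2.4) can be simulated end to end on synthetic data. This is a plain-Python sketch under stated assumptions: the enclave boundary is only suggested by comments, per-sample updates stand in for the batch sum of formula (6) for brevity, and all records are randomly generated rather than real bank data:

```python
import math
import random

random.seed(42)

w_bank1 = [0.0, 0.0, 0.0]          # w1..w3, conceptually inside enclave1
w_bank2 = [0.0, 0.0]               # w4..w5, conceptually inside enclave2

# synthetic joined records: features x1..x5 per user plus a global label Y
true_w = [1.0, -2.0, 0.5, 1.5, -1.0]
records = []
for _ in range(200):
    x = [random.uniform(-1.0, 1.0) for _ in range(5)]
    z = sum(w * xi for w, xi in zip(true_w, x))
    records.append((x[:3], x[3:], 1 if z > 0 else 0))

alpha = 0.5
for _ in range(100):
    for x13, x45, Y in records:
        sum1 = sum(w * xi for w, xi in zip(w_bank1, x13))   # formula (3), enclave1
        sum2 = sum(w * xi for w, xi in zip(w_bank2, x45))   # formula (4), enclave2
        delta = 1.0 / (1.0 + math.exp(-(sum1 + sum2))) - Y  # formula (7)
        w_bank1 = [w - alpha * delta * xi for w, xi in zip(w_bank1, x13)]
        w_bank2 = [w - alpha * delta * xi for w, xi in zip(w_bank2, x45)]

# training accuracy on the synthetic data
hits = sum(
    (sum(w * xi for w, xi in zip(w_bank1, a)) +
     sum(w * xi for w, xi in zip(w_bank2, b)) > 0) == bool(Y)
    for a, b, Y in records
)
accuracy = hits / len(records)
```

The learned split of parameters between the two holders recovers the joint decision boundary even though neither party's plaintext features are pooled in one place.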
Fig. 4 b shows the system overall architecture of another implementation model training, and corresponding whole training demand is: number The assets for possessing some user's asset datas according to excavation side oneself and needing to be provided according to one's own data and bank 1 Data establish the conjunctive model of one " whether prediction user has the ability to repay great number loan on schedule ", in which:
The data characteristics that bank 1 can provide is x1,x2,x3;Corresponding local parameter is w1,w2,w3
The one's own data characteristics of data providing is x4,x5;Corresponding local parameter is w4,w5
Compared with a upper embodiment, whole model training thinking is almost the same, and difference is pointed out to be only that: only for bank 1 creation enclave, for feature x4,x5For, data providing oneself can be read directly clear data and participate in model training It calculates.
Corresponding to the above method embodiments, the embodiments of this specification also provide a model training apparatus based on shared data. Referring to Fig. 5, the apparatus may comprise the following modules for realizing iterative training:
a data obtaining module 110, configured to obtain the ciphertext data provided by at least one data provider respectively;
a data input module 120, configured to input the ciphertext data of each data provider into the corresponding trusted execution environment of that data provider;
an output value obtaining module 130, configured to obtain the output value of each trusted execution environment, the output value being computed from the ciphertext data;
a deviation computing module 140, configured to compute the deviation between the model prediction and the true value according to the given training objective model; the model prediction is determined from the output values of the trusted execution environments, and the true value is the global label value determined from the data of the data providers;
a deviation returning module 150, configured to return the deviation to each trusted execution environment respectively, so that each trusted execution environment updates its local model parameters;
wherein the inside of any trusted execution environment comprises:
a decryption submodule, configured to decrypt the input ciphertext data to obtain plaintext data feature values;
an output value computing submodule, configured to compute the output value corresponding to the plaintext data feature values according to the current local model parameters;
a parameter updating submodule, configured to update the local model parameters according to the returned deviation.
In one specific embodiment provided by this specification, when multiple data providers provide data to the data mining party and some of the data providers have no privacy requirements on the data, the functions of the modules of the above apparatus can be configured as follows:
the data obtaining module 110 is configured to obtain the data provided by multiple data providers respectively, where the data provided by at least one data provider is in ciphertext form and the data provided by the other data providers is in plaintext form;
the data input module 120 is configured to, in the case that the data provided by a data provider is in ciphertext form, input the ciphertext data into the trusted execution environment of that data provider;
the output value obtaining module 130 is configured to obtain the output value of each trusted execution environment, the output value being computed from the ciphertext data;
the deviation computing module 140 is configured to compute the deviation between the model prediction and the true value using the output values of the trusted execution environments and the plaintext data provided by the other data providers; the model prediction is determined from the output values of the trusted execution environments and the feature values of the plaintext data; the true value is the global label value determined from the data of the data providers;
the deviation returning module 150 is configured to return the deviation to each trusted execution environment respectively, so that each trusted execution environment updates its local model parameters;
wherein the inside of any trusted execution environment comprises:
a decryption submodule, configured to decrypt the input ciphertext data to obtain plaintext data feature values;
an output value computing submodule, configured to compute the output value corresponding to the plaintext data feature values according to the current local model parameters;
a parameter updating submodule, configured to update the local model parameters according to the returned deviation.
Referring to Fig. 6, the embodiments of this specification also provide a data prediction apparatus based on shared-data modeling; the apparatus may comprise:
a data obtaining module 210, configured to obtain the ciphertext data provided by at least one data provider respectively;
a data input module 220, configured to input the ciphertext data of each data provider into the corresponding trusted execution environment of that data provider;
an output value obtaining module 230, configured to obtain the output value of each trusted execution environment, the output value being computed from the ciphertext data;
a predicted value computing module 240, configured to input the output values of the trusted execution environments into the pre-trained prediction model to compute the predicted value;
wherein the inside of any trusted execution environment Eu comprises:
a decryption submodule, configured to decrypt the input ciphertext data to obtain plaintext data feature values;
an output value computing submodule, configured to compute the output value corresponding to the plaintext data feature values according to the current local model parameters.
The embodiments of this specification also provide a computer device, comprising at least a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, can implement the aforementioned model training method or data prediction method.
Fig. 7 shows a more specific hardware structure diagram of a computing device provided by the embodiments of this specification. The device may comprise: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050, wherein the processor 1010, memory 1020, input/output interface 1030 and communication interface 1040 realize communication connections with each other inside the device through the bus 1050.
The processor 1010 can be realized by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., and is used to execute the relevant programs to realize the technical scheme provided by the embodiments of this specification.
The memory 1020 can be realized in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, etc. The memory 1020 can store an operating system and other application programs; when the technical scheme provided by the embodiments of this specification is realized by software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.
The input/output interface 1030 is used to connect input/output modules to realize information input and output. The input/output modules can be configured as components within the device (not shown in the figure), or can be external to the device to provide the corresponding functions. The input devices may include a keyboard, mouse, touch screen, microphone, various sensors, etc., and the output devices may include a display, loudspeaker, vibrator, indicator light, etc.
The communication interface 1040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module can realize communication in a wired manner (such as USB, network cable, etc.) or in a wireless manner (such as a mobile network, WIFI, Bluetooth, etc.).
The bus 1050 comprises a path for transmitting information between the various components of the device (such as the processor 1010, memory 1020, input/output interface 1030 and communication interface 1040).
It should be noted that although the above device only shows the processor 1010, memory 1020, input/output interface 1030, communication interface 1040 and bus 1050, in specific implementation the device may also comprise other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may also comprise only the components necessary to realize the scheme of the embodiments of this specification, without comprising all the components shown in the figure.
The embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the aforementioned model training method or data prediction method is realized.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can realize information storage by any method or technology. The information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, or the part thereof that contributes to the prior art, can essentially be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this specification or in certain parts of the embodiments.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail transceiver, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief, and the relevant points can be found in the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and when implementing the solutions of this specification, the functions of the modules may be realized in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only specific embodiments of this specification. It should be pointed out that those of ordinary skill in the art can also make several improvements and modifications without departing from the principles of the embodiments of this specification, and these improvements and modifications should also be regarded as falling within the protection scope of the embodiments of this specification.

Claims (42)

1. A model training method based on shared data, the method comprising:
performing iterative training with the following steps until a model training requirement is met:
separately obtaining ciphertext data provided by at least one data provider;
inputting the ciphertext data of each data provider into the trusted execution environment of that data provider;
obtaining an output value of each trusted execution environment, the output value being computed from the ciphertext data;
computing, according to a given training objective model, a deviation between a model prediction value and a true value, wherein the model prediction value is determined from the output values of the trusted execution environments, and the true value is a global label value determined from the data of the data providers;
returning the deviation to each trusted execution environment, so that each trusted execution environment updates its own partial model parameters;
wherein the following steps are performed inside any trusted execution environment:
decrypting the input ciphertext data to obtain plaintext data feature values;
computing, according to the current partial model parameters, an output value corresponding to the plaintext data feature values;
updating the partial model parameters according to the returned deviation.
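Purely as an illustration of the loop in claim 1 (not the patented implementation), the sketch below simulates two data providers whose trusted execution environments are stand-ins implemented as plain Python objects, with simple additive masking standing in for real enclave cryptography and logistic regression as the training objective (cf. claim 3). All class and variable names are hypothetical.

```python
import math

class SimulatedEnclave:
    """Stand-in for a trusted execution environment holding one provider's partial model."""
    def __init__(self, key, n_features, lr=0.5):
        self.key = key                      # shared secret used to "decrypt" the input
        self.w = [0.0] * n_features         # partial model parameters W_u
        self.lr = lr
        self.x = None                       # plaintext feature values after decryption

    def load(self, ciphertext):
        # Decrypt the input ciphertext data to obtain plaintext feature values.
        self.x = [c - self.key for c in ciphertext]

    def output(self):
        # Weighted sum O_u = w1*x1 + w2*x2 + ... over the plaintext features.
        return sum(w * x for w, x in zip(self.w, self.x))

    def update(self, deviation):
        # Update the partial model parameters from the returned deviation
        # (gradient of the logistic loss with respect to each local weight).
        for j in range(len(self.w)):
            self.w[j] -= self.lr * deviation * self.x[j]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One provider holds features [1.0, 0.0], the other [0.0, 1.0]; the global label is 1.
enclaves = [SimulatedEnclave(key=7, n_features=2), SimulatedEnclave(key=3, n_features=2)]
ciphertexts = [[1.0 + 7, 0.0 + 7], [0.0 + 3, 1.0 + 3]]
label = 1.0

for enc, ct in zip(enclaves, ciphertexts):
    enc.load(ct)

for step in range(100):
    # Model prediction value determined from the enclaves' output values.
    pred = sigmoid(sum(enc.output() for enc in enclaves))
    deviation = pred - label                # deviation between prediction and true value
    for enc in enclaves:                    # return the deviation to every enclave
        enc.update(deviation)

print(round(pred, 2))
```

In this toy run the prediction converges toward the label, while each provider's features and weights never leave its own (simulated) enclave; only output values and the scalar deviation cross the boundary.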
2. The method according to claim 1, wherein computing, according to the current partial model parameters, an output value corresponding to the plaintext data feature values comprises:
according to the current partial model parameters W_u = (w1, w2, …)_u, computing the weighted sum of the plaintext data feature values, O_u = (w1·x1 + w2·x2 + …)_u.
3. The method according to claim 1, wherein the training objective model is a logistic regression model.
4. The method according to claim 1, wherein the trusted execution environment is an enclave created using Software Guard Extensions (SGX) technology.
5. The method according to claim 1, wherein updating the partial model parameters according to the returned deviation comprises:
updating the partial model parameters with a gradient descent method according to the returned deviation; or
updating the partial model parameters with a stochastic gradient descent method according to the returned deviation.
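The two alternatives in claim 5 differ only in how many samples contribute to each update. The hypothetical sketch below shows both update rules for a single local weight vector, assuming the deviations returned to the enclave play the role of per-sample error terms; function names are illustrative.

```python
import random

def gd_step(w, xs, deviations, lr=0.1):
    """Batch gradient descent: average the gradient over all returned deviations."""
    n = len(xs)
    return [wj - lr * sum(d * x[j] for x, d in zip(xs, deviations)) / n
            for j, wj in enumerate(w)]

def sgd_step(w, xs, deviations, lr=0.1):
    """Stochastic gradient descent: update from one randomly chosen sample."""
    i = random.randrange(len(xs))
    return [wj - lr * deviations[i] * xs[i][j] for j, wj in enumerate(w)]

w = [0.0, 0.0]
xs = [[1.0, 2.0], [2.0, 1.0]]
deviations = [0.5, -0.5]          # deviations returned to the enclave

w_gd = gd_step(w, xs, deviations)
print(w_gd)
```

The batch variant gives a smoother, lower-variance step per iteration; the stochastic variant is cheaper per step, which matters when the update runs inside a resource-constrained enclave.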
6. The method according to claim 1, wherein the global label value is determined from a label value provided by one data provider, or is determined jointly from label values provided by multiple data providers.
7. A model training method based on shared data, the method comprising:
inputting the ciphertext data of at least one data provider into the trusted execution environment of that data provider, and decrypting the input ciphertext data within each trusted execution environment to obtain the respective plaintext data feature values;
performing iterative training with the following steps until a model training requirement is met:
in each trusted execution environment, computing, according to the current partial model parameters, an output value corresponding to the plaintext data feature values;
computing, according to a given training objective model, a deviation between a model prediction value and a true value, wherein the model prediction value is determined from the output values of the trusted execution environments, and the true value is a global label value determined from the data of the data providers;
returning the deviation to each trusted execution environment;
in each trusted execution environment, updating the partial model parameters according to the returned deviation.
8. The method according to claim 7, wherein computing, according to the current partial model parameters, an output value corresponding to the plaintext data feature values comprises:
according to the current partial model parameters W_u = (w1, w2, …)_u, computing the weighted sum of the plaintext data feature values, O_u = (w1·x1 + w2·x2 + …)_u.
9. The method according to claim 7, wherein the training objective model is a logistic regression model.
10. The method according to claim 7, wherein the trusted execution environment is an enclave created using Software Guard Extensions (SGX) technology.
11. The method according to claim 7, wherein updating the partial model parameters according to the returned deviation comprises:
updating the partial model parameters with a gradient descent method according to the returned deviation; or
updating the partial model parameters with a stochastic gradient descent method according to the returned deviation.
12. The method according to claim 7, wherein the global label value is determined from a label value provided by one data provider, or is determined jointly from label values provided by multiple data providers.
13. A model training method based on shared data, the method comprising:
performing iterative training with the following steps until a model training requirement is met:
separately obtaining data provided by multiple data providers, wherein the data provided by at least one data provider is in ciphertext form and the data provided by the other data providers is in plaintext form;
if the data provided by a data provider is in ciphertext form, inputting the ciphertext data into the trusted execution environment of that data provider;
obtaining an output value of each trusted execution environment, the output value being computed from the ciphertext data;
computing a deviation between a model prediction value and a true value using the output values of the trusted execution environments and the plaintext data provided by the other data providers, wherein the model prediction value is determined from the output values of the trusted execution environments and the feature values of the plaintext data, and the true value is a global label value determined from the data of the data providers;
returning the deviation to each trusted execution environment, so that each trusted execution environment updates its own partial model parameters;
wherein the following steps are performed inside any trusted execution environment:
decrypting the input ciphertext data to obtain plaintext data feature values;
computing, according to the current partial model parameters, an output value corresponding to the plaintext data feature values;
updating the partial model parameters according to the returned deviation.
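To illustrate how the prediction value in claim 13 can combine enclave outputs with features shared in the clear, the hypothetical sketch below assumes a logistic objective (as in claim 15); the function and parameter names are illustrative, not from the patent.

```python
import math

def predict(enclave_outputs, w_plain, x_plain):
    """Combine trusted-execution-environment output values with the weighted
    sum of plaintext features from providers that shared data in the clear."""
    z = sum(enclave_outputs) + sum(w * x for w, x in zip(w_plain, x_plain))
    return 1.0 / (1.0 + math.exp(-z))      # model prediction value

# Two providers contribute via enclaves; a third contributes plaintext features.
pred = predict(enclave_outputs=[0.4, 0.1], w_plain=[0.5], x_plain=[1.0])
deviation = pred - 1.0                     # deviation against the global label value
print(round(pred, 3))
```

The plaintext providers' weights can then be updated directly from the deviation, while the ciphertext providers' weights are updated inside their enclaves exactly as in claim 1.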
14. The method according to claim 13, wherein computing, according to the current partial model parameters, an output value corresponding to the plaintext data feature values comprises:
according to the current partial model parameters W_u = (w1, w2, …)_u, computing the weighted sum of the plaintext data feature values, O_u = (w1·x1 + w2·x2 + …)_u.
15. The method according to claim 13, wherein the training objective model is a logistic regression model.
16. The method according to claim 13, wherein the trusted execution environment is an enclave created using Software Guard Extensions (SGX) technology.
17. The method according to claim 13, wherein updating the partial model parameters according to the returned deviation comprises:
updating the partial model parameters with a gradient descent method according to the returned deviation; or
updating the partial model parameters with a stochastic gradient descent method according to the returned deviation.
18. The method according to claim 13, wherein the global label value is determined from a label value provided by one data provider, or is determined jointly from label values provided by multiple data providers.
19. A data prediction method based on shared-data modeling, the method comprising:
separately obtaining ciphertext data provided by at least one data provider;
inputting the ciphertext data of each data provider into the trusted execution environment of that data provider;
obtaining an output value of each trusted execution environment, the output value being computed from the ciphertext data;
inputting the output values of the trusted execution environments into a pre-trained prediction model to compute a prediction value;
wherein the following steps are performed inside any trusted execution environment:
decrypting the input ciphertext data to obtain plaintext data feature values;
computing, according to the current partial model parameters, an output value corresponding to the plaintext data feature values.
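A hypothetical sketch of the prediction flow in claim 19: each environment's output value is computed from its frozen partial parameters over the decrypted features, and the outputs are then fed to a pre-trained prediction model, here simply a logistic function over their sum; all names are illustrative.

```python
import math

def enclave_output(w_u, x_u):
    # Inside a trusted execution environment: weighted sum of the
    # decrypted plaintext feature values under partial parameters W_u.
    return sum(w * x for w, x in zip(w_u, x_u))

def prediction_model(outputs):
    # Pre-trained prediction model applied to the enclaves' output values.
    return 1.0 / (1.0 + math.exp(-sum(outputs)))

outputs = [enclave_output([0.2, 0.3], [1.0, 2.0]),   # provider 1
           enclave_output([0.5], [0.4])]             # provider 2
pred = prediction_model(outputs)
print(round(pred, 3))
```

As in training, only the scalar output values leave the environments at prediction time; the raw features and partial parameters stay inside.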
20. The method according to claim 19, wherein computing, according to the current partial model parameters, an output value corresponding to the plaintext data feature values comprises:
according to the current partial model parameters W_u = (w1, w2, …)_u, computing the weighted sum of the plaintext data feature values, O_u = (w1·x1 + w2·x2 + …)_u.
21. The method according to claim 19, wherein the training objective model is a logistic regression model.
22. The method according to claim 19, wherein the trusted execution environment is an enclave created using Software Guard Extensions (SGX) technology.
23. A model training apparatus based on shared data, the apparatus comprising the following modules for implementing iterative training:
a data acquisition module, configured to separately obtain ciphertext data provided by at least one data provider;
a data input module, configured to input the ciphertext data of each data provider into the trusted execution environment of that data provider;
an output-value acquisition module, configured to obtain an output value of each trusted execution environment, the output value being computed from the ciphertext data;
a deviation computation module, configured to compute, according to a given training objective model, a deviation between a model prediction value and a true value, wherein the model prediction value is determined from the output values of the trusted execution environments, and the true value is a global label value determined from the data of the data providers;
a deviation return module, configured to return the deviation to each trusted execution environment, so that each trusted execution environment updates its own partial model parameters;
wherein any trusted execution environment internally comprises:
a decryption submodule, configured to decrypt the input ciphertext data to obtain plaintext data feature values;
an output-value computation submodule, configured to compute, according to the current partial model parameters, an output value corresponding to the plaintext data feature values;
a parameter update submodule, configured to update the partial model parameters according to the returned deviation.
24. The apparatus according to claim 23, wherein the output-value computation submodule is specifically configured to:
according to the current partial model parameters W_u = (w1, w2, …)_u, compute the weighted sum of the plaintext data feature values, O_u = (w1·x1 + w2·x2 + …)_u.
25. The apparatus according to claim 23, wherein the training objective model is a logistic regression model.
26. The apparatus according to claim 23, wherein the trusted execution environment is an enclave created using Software Guard Extensions (SGX) technology.
27. The apparatus according to claim 23, wherein the parameter update submodule is specifically configured to:
update the partial model parameters with a gradient descent method according to the returned deviation; or
update the partial model parameters with a stochastic gradient descent method according to the returned deviation.
28. The apparatus according to claim 23, wherein the global label value is determined from a label value provided by one data provider, or is determined jointly from label values provided by multiple data providers.
29. A model training apparatus based on shared data, the apparatus comprising the following modules for implementing iterative training:
a data acquisition module, configured to separately obtain data provided by multiple data providers, wherein the data provided by at least one data provider is in ciphertext form and the data provided by the other data providers is in plaintext form;
a data input module, configured to input ciphertext data into the trusted execution environment of the corresponding data provider if the data provided by that data provider is in ciphertext form;
an output-value acquisition module, configured to obtain an output value of each trusted execution environment, the output value being computed from the ciphertext data;
a deviation computation module, configured to compute a deviation between a model prediction value and a true value using the output values of the trusted execution environments and the plaintext data provided by the other data providers, wherein the model prediction value is determined from the output values of the trusted execution environments and the feature values of the plaintext data, and the true value is a global label value determined from the data of the data providers;
a deviation return module, configured to return the deviation to each trusted execution environment, so that each trusted execution environment updates its own partial model parameters;
wherein any trusted execution environment internally comprises:
a decryption submodule, configured to decrypt the input ciphertext data to obtain plaintext data feature values;
an output-value computation submodule, configured to compute, according to the current partial model parameters, an output value corresponding to the plaintext data feature values;
a parameter update submodule, configured to update the partial model parameters according to the returned deviation.
30. The apparatus according to claim 29, wherein the output-value computation submodule is specifically configured to:
according to the current partial model parameters W_u = (w1, w2, …)_u, compute the weighted sum of the plaintext data feature values, O_u = (w1·x1 + w2·x2 + …)_u.
31. The apparatus according to claim 29, wherein the training objective model is a logistic regression model.
32. The apparatus according to claim 29, wherein the trusted execution environment is an enclave created using Software Guard Extensions (SGX) technology.
33. The apparatus according to claim 29, wherein the parameter update submodule is specifically configured to:
update the partial model parameters with a gradient descent method according to the returned deviation; or
update the partial model parameters with a stochastic gradient descent method according to the returned deviation.
34. The apparatus according to claim 29, wherein the global label value is determined from a label value provided by one data provider, or is determined jointly from label values provided by multiple data providers.
35. A data prediction apparatus based on shared-data modeling, the apparatus comprising:
a data acquisition module, configured to separately obtain ciphertext data provided by at least one data provider;
a data input module, configured to input the ciphertext data of each data provider into the trusted execution environment of that data provider;
an output-value acquisition module, configured to obtain an output value of each trusted execution environment, the output value being computed from the ciphertext data;
a prediction value computation module, configured to input the output values of the trusted execution environments into a pre-trained prediction model to compute a prediction value;
wherein any trusted execution environment E_u internally comprises:
a decryption submodule, configured to decrypt the input ciphertext data to obtain plaintext data feature values;
an output-value computation submodule, configured to compute, according to the current partial model parameters, an output value corresponding to the plaintext data feature values.
36. The apparatus according to claim 35, wherein the output-value acquisition module is specifically configured to:
according to the current partial model parameters W_u = (w1, w2, …)_u, compute the weighted sum of the plaintext data feature values, O_u = (w1·x1 + w2·x2 + …)_u.
37. The apparatus according to claim 35, wherein the training objective model is a logistic regression model.
38. The apparatus according to claim 35, wherein the trusted execution environment is an enclave created using Software Guard Extensions (SGX) technology.
39. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 6.
40. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 7 to 12.
41. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 13 to 18.
42. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 19 to 22.
CN201710632357.5A 2017-07-28 2017-07-28 Model training method and device based on shared data Active CN109308418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710632357.5A CN109308418B (en) 2017-07-28 2017-07-28 Model training method and device based on shared data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710632357.5A CN109308418B (en) 2017-07-28 2017-07-28 Model training method and device based on shared data

Publications (2)

Publication Number Publication Date
CN109308418A true CN109308418A (en) 2019-02-05
CN109308418B CN109308418B (en) 2021-09-24

Family

ID=65205429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710632357.5A Active CN109308418B (en) 2017-07-28 2017-07-28 Model training method and device based on shared data

Country Status (1)

Country Link
CN (1) CN109308418B (en)

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162981A (en) * 2019-04-18 2019-08-23 阿里巴巴集团控股有限公司 Data processing method and device
CN110162995A * 2019-04-22 阿里巴巴集团控股有限公司 Method and device for assessing data contribution degree
CN110543776A (en) * 2019-08-30 2019-12-06 联想(北京)有限公司 model processing method, model processing device, electronic equipment and medium
CN110569663A (en) * 2019-08-15 2019-12-13 深圳市莱法照明通信科技有限公司 Method, device, system and storage medium for educational data sharing
CN110569228A (en) * 2019-08-09 2019-12-13 阿里巴巴集团控股有限公司 model parameter determination method and device and electronic equipment
CN110674528A * 2019-09-20 深圳前海微众银行股份有限公司 Federated learning privacy data processing method, device, system and storage medium
CN110942147A (en) * 2019-11-28 2020-03-31 支付宝(杭州)信息技术有限公司 Neural network model training and predicting method and device based on multi-party safety calculation
CN110955915A (en) * 2019-12-14 2020-04-03 支付宝(杭州)信息技术有限公司 Method and device for processing private data
CN111027632A (en) * 2019-12-13 2020-04-17 支付宝(杭州)信息技术有限公司 Model training method, device and equipment
CN111027870A (en) * 2019-12-14 2020-04-17 支付宝(杭州)信息技术有限公司 User risk assessment method and device, electronic equipment and storage medium
CN111079152A (en) * 2019-12-13 2020-04-28 支付宝(杭州)信息技术有限公司 Model deployment method, device and equipment
CN111079153A (en) * 2019-12-17 2020-04-28 支付宝(杭州)信息技术有限公司 Security modeling method and device, electronic equipment and storage medium
CN111079947A (en) * 2019-12-20 2020-04-28 支付宝(杭州)信息技术有限公司 Method and system for model training based on optional private data
CN111079182A (en) * 2019-12-18 2020-04-28 北京百度网讯科技有限公司 Data processing method, device, equipment and storage medium
CN111126628A (en) * 2019-11-21 2020-05-08 支付宝(杭州)信息技术有限公司 Method, device and equipment for training GBDT model in trusted execution environment
CN111125735A (en) * 2019-12-20 2020-05-08 支付宝(杭州)信息技术有限公司 Method and system for model training based on private data
CN111291401A (en) * 2020-05-09 2020-06-16 支付宝(杭州)信息技术有限公司 Privacy protection-based business prediction model training method and device
CN111460528A (en) * 2020-04-01 2020-07-28 支付宝(杭州)信息技术有限公司 Multi-party combined training method and system based on Adam optimization algorithm
CN111582496A (en) * 2020-04-26 2020-08-25 暨南大学 Safe and efficient deep learning model prediction system and method based on SGX
CN111612167A (en) * 2019-02-26 2020-09-01 京东数字科技控股有限公司 Joint training method, device, equipment and storage medium of machine learning model
US10803184B2 (en) 2019-08-09 2020-10-13 Alibaba Group Holding Limited Generation of a model parameter
WO2020211240A1 (en) * 2019-04-19 2020-10-22 平安科技(深圳)有限公司 Joint construction method and apparatus for prediction model, and computer device
CN111935179A (en) * 2020-09-23 2020-11-13 支付宝(杭州)信息技术有限公司 Model training method and device based on trusted execution environment
US10846413B2 (en) 2019-04-18 2020-11-24 Advanced New Technologies Co., Ltd. Data processing method and device
CN112417485A (en) * 2020-11-30 2021-02-26 支付宝(杭州)信息技术有限公司 Model training method, system and device based on trusted execution environment
WO2021079299A1 (en) * 2019-10-24 2021-04-29 International Business Machines Corporation Private transfer learning
WO2021082647A1 (en) * 2019-10-29 2021-05-06 华为技术有限公司 Federated learning system, training result aggregation method, and device
WO2021114974A1 (en) * 2019-12-14 2021-06-17 支付宝(杭州)信息技术有限公司 User risk assessment method and apparatus, electronic device, and storage medium
WO2021143466A1 (en) * 2020-01-13 2021-07-22 支付宝(杭州)信息技术有限公司 Method and device for using trusted execution environment to train neural network model
CN113268727A (en) * 2021-07-19 2021-08-17 天聚地合(苏州)数据股份有限公司 Joint training model method, device and computer readable storage medium
WO2021159684A1 (en) * 2020-02-14 2021-08-19 云从科技集团股份有限公司 Data processing method, system and platform, and device and machine-readable medium
US20210312017A1 (en) * 2020-10-30 2021-10-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus and electronic device for processing user request and storage medium
WO2021228230A1 (en) * 2020-05-15 2021-11-18 支付宝(杭州)信息技术有限公司 Data verification method and apparatus based on secure execution environment
CN114548255A (en) * 2022-02-17 2022-05-27 支付宝(杭州)信息技术有限公司 Model training method, device and equipment
WO2022174787A1 (en) * 2021-02-22 2022-08-25 支付宝(杭州)信息技术有限公司 Model training

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732247A (en) * 2015-03-09 2015-06-24 北京工业大学 Human face feature positioning method
CN105261358A (en) * 2014-07-17 2016-01-20 中国科学院声学研究所 N-gram grammar model constructing method for voice identification and voice identification system
CN105989441A (en) * 2015-02-11 2016-10-05 阿里巴巴集团控股有限公司 Model parameter adjustment method and device
CN106664563A (en) * 2014-08-29 2017-05-10 英特尔公司 Pairing computing devices according to a multi-level security protocol

WO2021228230A1 (en) * 2020-05-15 2021-11-18 支付宝(杭州)信息技术有限公司 Data verification method and apparatus based on secure execution environment
CN111935179A (en) * 2020-09-23 2020-11-13 支付宝(杭州)信息技术有限公司 Model training method and device based on trusted execution environment
US11500992B2 (en) 2020-09-23 2022-11-15 Alipay (Hangzhou) Information Technology Co., Ltd. Trusted execution environment-based model training methods and apparatuses
JP7223067B2 (en) 2020-10-30 2023-02-15 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Methods, apparatus, electronics, computer readable storage media and computer programs for processing user requests
JP2022006164A (en) * 2020-10-30 2022-01-12 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Method, device, electronic device, computer-readable storage media and computer program for processing user request
EP3869374A3 (en) * 2020-10-30 2022-01-05 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus and electronic device for processing user request and storage medium
US20210312017A1 (en) * 2020-10-30 2021-10-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus and electronic device for processing user request and storage medium
CN112417485A (en) * 2020-11-30 2021-02-26 支付宝(杭州)信息技术有限公司 Model training method, system and device based on trusted execution environment
CN112417485B (en) * 2020-11-30 2024-02-02 支付宝(杭州)信息技术有限公司 Model training method, system and device based on trusted execution environment
WO2022174787A1 (en) * 2021-02-22 2022-08-25 支付宝(杭州)信息技术有限公司 Model training
CN113268727A (en) * 2021-07-19 2021-08-17 天聚地合(苏州)数据股份有限公司 Joint training model method, device and computer readable storage medium
CN114548255A (en) * 2022-02-17 2022-05-27 支付宝(杭州)信息技术有限公司 Model training method, device and equipment

Also Published As

Publication number Publication date
CN109308418B (en) 2021-09-24

Similar Documents

Publication Publication Date Title
CN109308418A (en) A kind of model training method and device based on shared data
US20210004718A1 (en) Method and device for training a model based on federated learning
CN107704930B (en) Modeling method, device and system based on shared data and electronic equipment
EP3659292B1 (en) Secure multi-party computation with no trusted initializer
CN111756754B (en) Method and device for training model
CN109033854A (en) Prediction technique and device based on model
CN110427969B (en) Data processing method and device and electronic equipment
CN105892991A (en) Modular multiplication using look-up tables
CN113239404A (en) Federal learning method based on differential privacy and chaotic encryption
CN111428887B (en) Model training control method, device and system based on multiple computing nodes
CN109426732A (en) A kind of data processing method and device
CN109687952A (en) Data processing method and its device, electronic device and storage medium
CN113505882A (en) Data processing method based on federal neural network model, related equipment and medium
CN112417485B (en) Model training method, system and device based on trusted execution environment
WO2022174787A1 (en) Model training
US20180336370A1 (en) System and method for prediction preserving data obfuscation
CN110335140A (en) Method, apparatus, electronic equipment based on the black intermediary of social networks prediction loan
CN108140335A (en) Secret random number synthesizer, secret random number synthetic method and program
CN107431620A (en) Instantiated during the operation of broadcast encryption scheme
Gabr Quadratic and nonlinear programming problems solving and analysis in fully fuzzy environment
US11366893B1 (en) Systems and methods for secure processing of data streams having differing security level classifications
Veledar et al. Steering drivers of change: maximising benefits of trustworthy IoT
Tremori et al. A verification, validation and accreditation process for autonomous interoperable systems
Schöberl et al. Variational principles for different representations of Lagrangian and Hamiltonian systems
Gomes et al. Adaptive PORT-MVRB estimation of the extreme value index

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20191211

Address after: P.O. Box 31119, Grand Pavilion, Hibiscus Way, 802 West Bay Road, Grand Cayman, Cayman Islands

Applicant after: Advanced New Technologies Co., Ltd.

Address before: Fourth Floor, Capital Building, P.O. Box 847, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40004192

Country of ref document: HK

GR01 Patent grant