Summary of the invention
This specification proposes a data encryption method, comprising:
generating an N*M-dimensional target matrix based on N data samples and the data features of M dimensions respectively corresponding to the N data samples;
training a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein the value of K is less than the value of M;
multiplying the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an encrypted N*K-dimensional encrypted matrix, wherein the encrypted matrix is used for training a machine learning model.
Optionally, the preset machine learning model is a supervised machine learning model, wherein the model parameter to be trained in the loss function of the preset machine learning model is expressed as the product of the target projection matrix and a target vector.
Optionally, the preset machine learning model is a logistic regression model.
Optionally, the expression of the loss function is any one of the following formulas:
wherein H represents the projection matrix; u represents the target vector; || ||_F represents the norm of the regularization term of the loss function; λ and Υ represent regularization coefficients; b_i represents the feature vector generated from the data features of the M dimensions of the i-th data sample in the target matrix; y_i represents the label corresponding to the feature vector.
Optionally, the loss function is a logarithmic loss function, wherein
loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).
This specification also proposes a data encryption device, the device comprising:
a generation module, which generates an N*M-dimensional target matrix based on N data samples and the data features of M dimensions respectively corresponding to the N data samples;
a first training module, which trains a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein the value of K is less than the value of M;
an encryption module, which multiplies the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an encrypted N*K-dimensional encrypted matrix, wherein the encrypted matrix is used for training a machine learning model.
Optionally, the preset machine learning model is a supervised machine learning model, wherein the model parameter to be trained in the loss function of the preset machine learning model is expressed as the product of the target projection matrix and a target vector.
Optionally, the machine learning model is a logistic regression model.
Optionally, the expression of the loss function is any one of the following formulas:
wherein H represents the projection matrix; u represents the target vector; || ||_F represents the norm of the regularization term of the loss function; λ and Υ represent regularization coefficients; b_i represents the feature vector generated from the data features of the M dimensions of the i-th data sample in the target matrix; y_i represents the label corresponding to the feature vector.
Optionally, the loss function is a logarithmic loss function, wherein
loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).
This specification also proposes a machine learning model training method, comprising:
receiving an encrypted matrix sent by a data provider server, wherein the encrypted matrix is an N*K-dimensional matrix obtained by multiplying an N*M-dimensional target matrix by an M*K-dimensional target projection matrix, the target projection matrix being obtained by the data provider server by training a preset machine learning model based on the N*M-dimensional target matrix, and the value of K being less than the value of M;
training a machine learning model using the encrypted matrix as training samples.
Optionally, training a machine learning model using the encrypted matrix as training samples comprises:
using the encrypted matrix as training samples, fusing it with local training samples, and training the machine learning model based on the fused training samples.
This specification also proposes a machine learning model training device, the device comprising:
a receiving module, which receives an encrypted matrix sent by a data provider server, wherein the encrypted matrix is an N*K-dimensional matrix obtained by multiplying an N*M-dimensional target matrix by an M*K-dimensional target projection matrix, the target projection matrix being obtained by the data provider server by training a preset machine learning model based on the N*M-dimensional target matrix, and the value of K being less than the value of M;
a second training module, which trains a machine learning model using the encrypted matrix as training samples.
Optionally, the second training module further:
uses the encrypted matrix as training samples, fuses it with local training samples, and trains the machine learning model based on the fused training samples.
This specification also proposes a machine learning model training system, the system comprising:
a data provider server, which generates an N*M-dimensional target matrix based on N data samples and the data features of M dimensions respectively corresponding to the N data samples; trains a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein the value of K is less than the value of M; and multiplies the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an encrypted N*K-dimensional encrypted matrix;
a modeling server, which trains a machine learning model based on the encrypted matrix.
Optionally, the modeling server further:
uses the encrypted matrix as training samples, fuses it with local training samples, and trains the machine learning model based on the fused training samples.
This specification also proposes an electronic device, comprising:
a processor;
a memory for storing machine-executable instructions;
wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic for data encryption, the processor is caused to:
generate an N*M-dimensional target matrix based on N data samples and the data features of M dimensions respectively corresponding to the N data samples;
train a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein the value of K is less than the value of M;
multiply the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an encrypted N*K-dimensional encrypted matrix, wherein the encrypted matrix is used for training a machine learning model.
This specification also proposes an electronic device, comprising:
a processor;
a memory for storing machine-executable instructions;
wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic for machine learning model training, the processor is caused to:
receive an encrypted matrix sent by a data provider server, wherein the encrypted matrix is an N*K-dimensional matrix obtained by multiplying an N*M-dimensional target matrix by an M*K-dimensional target projection matrix, the target projection matrix being obtained by the data provider server by training a preset machine learning model based on the N*M-dimensional target matrix, and the value of K being less than the value of M;
train a machine learning model using the encrypted matrix as training samples.
In this specification, the data provider generates an N*M-dimensional target matrix based on N data samples and the data features of M dimensions respectively corresponding to the N data samples, trains a preset machine learning model on it to obtain a dimension-reducing M*K-dimensional target projection matrix, and multiplies the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an encrypted N*K-dimensional encrypted matrix, which the modeling server then uses as training samples to train a machine learning model.
On the one hand, the modeling server usually cannot restore the original target matrix from the encrypted matrix obtained by multiplying the N*M-dimensional target matrix by the M*K-dimensional target projection matrix. The user's private data can therefore be protected to the greatest possible extent, which avoids leaking user privacy when data samples are submitted to the modeling server for model training.
On the other hand, the encrypted matrix is an N*K-dimensional reduced matrix obtained by applying a linear dimension-reducing mapping to the N*M-dimensional target matrix, so the transmission overhead of sending data samples to the modeling server can be reduced. Moreover, because the target projection matrix is obtained through machine learning training with the target matrix as the original sample training set, the encrypted matrix obtained by linearly projecting the target matrix with the target projection matrix retains as much of the information in the original data samples as possible, so the precision of model training can still be guaranteed when the reduced encrypted matrix is transmitted to the modeling server for model training.
Specific embodiment
In the era of big data, useful information in many forms can be obtained by mining massive data, so the importance of data is self-evident. Different institutions each own their own data, but the data mining results of any single institution are limited by the quantity and types of data it owns. One direct approach to this problem is for multiple institutions to cooperate and share data, so as to achieve better data mining results and a win-win outcome.
For a data owner, however, data is itself a highly valuable asset, and out of needs such as protecting privacy and preventing leakage, data owners are often unwilling to provide data directly. This makes "data sharing" difficult to carry out in practice. How to share data while fully guaranteeing data security has therefore become a problem of wide concern in the industry.
This specification aims to propose a technical solution in which a machine learning algorithm is applied to the original user data needed for modeling in order to train a dimension-reducing target projection matrix. The projection matrix is then used to encrypt the original user data needed for modeling, which protects the privacy of the original user data while retaining as much of its information as possible, so that user privacy protection is taken into account without sacrificing modeling accuracy.
In implementation, the data features of M dimensions can be extracted from each of the N data samples needed for modeling, and an N*M-dimensional target matrix can be generated as the sample training set based on the N data samples and the data features of M dimensions respectively corresponding to the N data samples.
After the N*M-dimensional target matrix is generated, it can be used as training samples to train a machine learning model. For example, when the machine learning model is a supervised machine learning model, the model parameter to be trained in its loss function (an M-dimensional vector) can be expressed, through a basic linear transformation, as the product of a dimension-reducing M*K-dimensional target projection matrix and a K*1-dimensional target vector. Training the machine learning model on the N*M-dimensional target matrix then yields a dimension-reducing M*K-dimensional target projection matrix.
After the target projection matrix is obtained, it can be used to apply a linear dimension-reducing mapping to the N*M-dimensional target matrix: the N*M-dimensional target matrix is multiplied by the M*K-dimensional target projection matrix to obtain an encrypted N*K-dimensional encrypted matrix, and the encrypted matrix is transmitted to the modeling server.
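The multiplication step described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `encrypt`, the random stand-in for a trained projection matrix, and the sample sizes are all assumptions made for the example, not details from this specification.

```python
import numpy as np

def encrypt(target_matrix: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Map an N x M target matrix to an N x K encrypted matrix (K < M)."""
    n, m = target_matrix.shape
    m2, k = projection.shape
    if m != m2:
        raise ValueError("target matrix columns must match projection rows")
    if k >= m:
        raise ValueError("K must be smaller than M")
    return target_matrix @ projection

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))   # N = 6 samples, M = 4 features (made-up data)
H = rng.normal(size=(4, 2))   # stand-in for a trained M x K projection, K = 2
Z = encrypt(X, H)             # N x K matrix that would be sent to the modeling server
print(Z.shape)
```

Each row of `Z` is the projected version of the corresponding row of `X`, so sample order and any row-aligned labels are preserved.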
After receiving the encrypted matrix, the modeling server can use it as training samples to train a machine learning model; for example, it can fuse the encrypted matrix with its local training samples and then train the machine learning model based on the fused training samples.
On the one hand, the modeling server usually cannot restore the original target matrix from the encrypted matrix obtained by multiplying the N*M-dimensional target matrix by the M*K-dimensional target projection matrix. The user's private data can therefore be protected to the greatest possible extent, which avoids leaking user privacy when data samples are submitted to the modeling server for model training.
On the other hand, the encrypted matrix is an N*K-dimensional reduced matrix obtained by applying a linear dimension-reducing mapping to the N*M-dimensional target matrix, so the transmission overhead of sending data samples to the modeling server can be reduced. Moreover, because the target projection matrix is obtained through machine learning training with the target matrix as the original sample training set, the encrypted matrix obtained by linearly projecting the target matrix with the target projection matrix retains as much of the information in the original data samples as possible, so the precision of model training can still be guaranteed when the reduced encrypted matrix is transmitted to the modeling server for model training.
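One way to see why the original target matrix usually cannot be recovered is that, with K less than M, the projection discards at least one direction of the feature space, so distinct target matrices can produce the same encrypted matrix. The sketch below illustrates this with random matrices; the sizes, seed, and variable names are assumptions for the example, and it demonstrates non-uniqueness only, not a formal security guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 5, 4, 2
X = rng.normal(size=(N, M))      # made-up target matrix
H = rng.normal(size=(M, K))      # stand-in M x K projection matrix

# With full_matrices=True, U is M x M; its columns beyond the first K
# are orthogonal to the column space of H, so v @ H is zero for them.
U, _, _ = np.linalg.svd(H, full_matrices=True)
v = U[:, K]

# Build a different target matrix by adding multiples of v to each row.
X_other = X + np.outer(rng.normal(size=N), v)

same = np.allclose(X @ H, X_other @ H)      # identical encrypted matrices
differs = not np.allclose(X, X_other)       # yet different originals
print(same, differs)
```

Because both matrices encrypt to the same result, the encrypted matrix alone does not determine which one was the original.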
A detailed description is given below through specific embodiments and in conjunction with specific application scenarios.
Referring to Fig. 1, Fig. 1 shows a data encryption method provided by an embodiment of this specification. The method is applied to a data provider server, which performs the following steps:
Step 102: generate an N*M-dimensional target matrix based on N data samples and the data features of M dimensions respectively corresponding to the N data samples;
Step 104: train a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein the value of K is less than the value of M;
Step 106: multiply the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an encrypted N*K-dimensional encrypted matrix, wherein the encrypted matrix is used for training a machine learning model.
The data provider server can interface with the modeling server and provide it with the data samples needed for modeling.
For example, in practical applications, the data provider and the modeling party can correspond to different operators, and the data provider can transmit collected user data to the modeling party as data samples to complete data modeling. For example, the modeling party can be the data operation platform of Alipay, and the data provider can be a service platform that interfaces with Alipay's data operation platform and provides Internet services to users, such as a third-party bank or an express delivery company. In the initial state, the data provider server can collect, in the background, the user data that users generate day to day, collect N pieces of user data from the collected user data as data samples, and generate an initialized data sample set based on these data samples.
For example, in one illustrated embodiment, N pieces of sensitive data involving user privacy can be collected from the collected user data, and an initialized data sample set can then be generated based on this sensitive data.
The specific number N of collected data samples is not particularly limited in this specification, and those skilled in the art can set it based on actual needs.
The specific form of the user data depends on the specific business scenario and modeling requirements and is likewise not particularly limited in this specification. For example, in practical applications, if the goal is to create a scorecard model for assessing the risk of payment transactions initiated by users, then in this business scenario the user data can be the transaction data generated by users through a payment client.
After the data sample set is generated from the collected N data samples, the data provider server can also preprocess the data samples in the set.
Preprocessing the data samples in the data sample set generally includes data cleaning, filling in default values, normalization, or other forms of preprocessing. Through preprocessing, the collected data samples can be converted into standardized data samples suitable for model training.
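As a rough illustration of two of the preprocessing steps named above, the sketch below fills missing values and then normalizes each feature column. Using the column mean as the default value and min-max scaling as the normalization are assumptions chosen for the example; this specification does not prescribe particular rules, and the function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(raw: np.ndarray) -> np.ndarray:
    """Fill missing entries with the column mean, then min-max scale each column."""
    data = raw.astype(float).copy()
    col_mean = np.nanmean(data, axis=0)          # per-column mean ignoring NaN
    rows, cols = np.where(np.isnan(data))
    data[rows, cols] = col_mean[cols]            # supply default values
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # guard against constant columns
    return (data - lo) / span

# Made-up raw samples: 3 samples, 2 features, with two missing entries.
raw = np.array([[1.0, 200.0],
                [np.nan, 400.0],
                [3.0, np.nan]])
clean = preprocess(raw)
print(clean)
```

After this step every feature lies in the range 0 to 1, which keeps features with large raw magnitudes from dominating the later training.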
After the data samples in the data sample set have been preprocessed, the data provider server can extract the data features of M dimensions from each data sample in the set.
The number M of extracted data feature dimensions is not particularly limited in this specification, and those skilled in the art can choose it based on actual modeling requirements.
In addition, the specific types of the extracted data features are not particularly limited in this specification either; those skilled in the art can select them manually, based on actual modeling requirements, from the information actually contained in the data samples.
For example, in one embodiment, the modeling party can preselect the data features of M dimensions based on actual modeling requirements and provide the selected data features to the data provider, and the data provider then extracts from the data samples the data feature values corresponding to the data feature of each dimension.
After the data provider has extracted the data features of M dimensions from each data sample in the data sample set, it can generate a data feature vector for each data sample based on the data feature values corresponding to the extracted data features, and then construct an N*M-dimensional target matrix from the data feature vectors of the data samples.
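The construction just described can be sketched as follows, with one feature vector per sample stacked into an N*M array. The feature names, the dictionary record format, and the helper `build_target_matrix` are hypothetical and serve only to make the example concrete.

```python
import numpy as np

# Hypothetical M = 3 features preselected for modeling.
FEATURES = ["amount", "hour", "device_count"]

def build_target_matrix(samples, features=FEATURES):
    """Stack one M-dimensional feature vector per sample into an N x M matrix."""
    rows = [[float(sample[name]) for name in features] for sample in samples]
    return np.array(rows)

# Made-up transaction records standing in for preprocessed data samples.
samples = [
    {"amount": 12.5, "hour": 9, "device_count": 1},
    {"amount": 880.0, "hour": 23, "device_count": 3},
]
X = build_target_matrix(samples)
print(X.shape)
```

Fixing the feature order in one list keeps every row aligned to the same M dimensions, which the later matrix multiplication relies on.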
In implementation, the data features of the M dimensions can correspond to the rows of the target matrix or to its columns; this is not particularly limited in this specification.
For example, referring to Fig. 2 and taking the case in which the data features of the M dimensions correspond to the rows of the target matrix, the target matrix can be expressed in the form shown in Fig. 2. In the target matrix shown in Fig. 2, each column represents a data sample, and each row represents a feature vector composed of the data features of the M dimensions.
After the data provider server has generated the N*M-dimensional target matrix based on the N data samples and the data features of M dimensions respectively corresponding to them, it can use the N*M-dimensional target matrix as the original sample training set for machine learning model training, so as to train a projection matrix for encrypting the N*M-dimensional target matrix.
In this specification, the machine learning model can specifically be a supervised machine learning model; for example, an LR (Logistic Regression) model.
The specific type of the machine learning model is not particularly limited in this specification. For example, in practical applications, the machine learning model can be a supervised prediction model built on a supervised machine learning algorithm (such as a regression algorithm); for example, a scorecard model trained on users' payment transaction data to assess users' transaction risk.
In practical applications, a supervised machine learning model usually uses a loss function (Loss Function) to describe the fitting error between training samples and their corresponding sample labels. During actual model training, the loss function can be used to train on the N*M-dimensional target matrix: the training samples and their sample labels are taken as input values, the value of the model parameter that minimizes the fitting error between the training samples and their sample labels is solved for in reverse, and the solved value is used as the optimal parameter to construct the logistic regression model. Training a logistic regression model through a loss function can thus be understood as the process of using the loss function to solve, in reverse, for the model parameter value at which the fitting error between the training samples and their sample labels is smallest.
Based on this, in this specification, when the machine learning model is trained on the N*M-dimensional target matrix to obtain the M*K-dimensional target projection matrix, the model parameter ω to be trained in the loss function of the machine learning model can be expressed as the product of the target projection matrix H and a specific coefficient vector u and then substituted into the loss function, so that the target projection matrix is solved for through model training. That is, in practical applications, a target projection matrix can be derived from the model parameter ω to be trained by applying a basic mathematical transformation to ω.
For example, in one illustrated embodiment, when the training sample set is an N*M target matrix, the model parameter ω to be trained is usually an M-dimensional vector (a vector composed of the weight values corresponding to the data features of the M dimensions). In this case, ω can be expressed, through a basic linear transformation, as the product of an M*K-dimensional target projection matrix H and a K*1-dimensional target vector u (that is, H and u are parameters derived from the original model parameter ω to be trained through a basic linear transformation). This product is then substituted into the expression of the loss function, and the target projection matrix H is solved for through the model training process.
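The dimensions involved in this substitution can be written out explicitly. The following restates the definitions above in matrix notation, taking b_i to be a 1*M row vector of the target matrix (an assumption consistent with the N*M layout):

```latex
\[
\omega = H u, \qquad
\omega \in \mathbb{R}^{M \times 1},\quad
H \in \mathbb{R}^{M \times K},\quad
u \in \mathbb{R}^{K \times 1},\quad
K < M
\]
\[
b_i\,\omega = b_i\,(H u) = (b_i H)\,u
\]
```

The second line shows that scoring the original sample b_i with ω gives the same value as scoring the projected sample b_i H with u, which is why a model can be trained on the projected rows.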
A detailed description is given below, taking the case in which the supervised machine learning model is a logistic regression model as an example.
It should be noted that using a logistic regression model as the supervised machine learning model is only an example; in practical applications, the machine learning model can also be a type of machine learning model other than a logistic regression model.
In this specification, each of the N data samples can carry a sample label calibrated in advance. The specific form of the sample label also generally depends on the specific business scenario and modeling requirements and is not particularly limited in this specification.
For example, in practical applications, if the goal is to create a scorecard model for assessing the risk of payment transactions initiated by users, then in this business scenario the user data can be the transaction data generated by users through a payment client, and the sample label can be a label indicating whether a transaction data sample carries transaction risk.
In this case, each data feature vector in the N*M-dimensional target matrix, which is constructed from the N data samples and the data features of M dimensions respectively corresponding to them, can correspond to one sample label.
In one illustrated embodiment, the formula of the loss function of the logistic regression model can be as follows:
In the above formula, ω represents the model parameter to be trained, usually an M-dimensional vector; b_i represents the feature vector generated from the data features of the M dimensions of the i-th data sample in the target matrix, that is, the feature vector in the N*M-dimensional target matrix that corresponds to the i-th data sample (taking the case described above in which the rows of the target matrix represent the data features, b_i denotes the i-th row feature vector of the target matrix); y_i represents the sample label corresponding to b_i. The formula expresses solving for the optimal parameter ω corresponding to the minimum fitting error between b_i and y_i.
In practical applications, when a loss function is used to train a logistic regression model, a norm regularization term is usually introduced into the loss function to avoid overfitting of the trained model.
For example, in another illustrated embodiment, the formula of the loss function can be as follows:
In the above formula, the term added to the loss is the introduced norm regularization term; λ represents the regularization coefficient, usually a real number set by those skilled in the art based on actual needs; || ||_F represents the norm type used by the regularization term. In practical applications, the norm type used by the regularization term can be set by those skilled in the art based on actual needs.
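As a sketch consistent with the symbol definitions above, the unregularized and regularized objectives in ω could be written as follows. The plain (unaveraged) sum over the N samples and the unsquared norm are assumptions; the exact form used in this specification is not confirmed by the surrounding text.

```latex
\[
\min_{\omega}\; \sum_{i=1}^{N} \mathrm{loss}(b_i\,\omega,\; y_i)
\]
\[
\min_{\omega}\; \sum_{i=1}^{N} \mathrm{loss}(b_i\,\omega,\; y_i) \;+\; \lambda\,\lVert \omega \rVert_F
\]
```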
In this specification, the data provider can express the model parameter ω to be learned in the above formula, through a basic linear transformation, as the product of a target projection matrix H and a target vector u (multiplying an M*K-dimensional matrix by a K*1-dimensional target vector yields an M*1-dimensional vector), and then substitute this product into the above loss function.
It should be noted that the target projection matrix can specifically be an M*K-dimensional dimension-reducing matrix. The value of K is less than the value of M and can be set by those skilled in the art based on actual needs; for example, in one implementation, K can be one half or one third of M. In this case, the M-dimensional model parameter ω to be trained can be expressed, through a basic linear transformation, as the product of an M*K-dimensional target projection matrix H and a K*1-dimensional target vector u.
After the model parameter ω to be learned has been expressed, through a basic linear transformation, as the product of a target projection matrix H and a target vector u, H and u can be substituted into the above formula. The N*M-dimensional target matrix is then used as training samples, and model training is carried out following the existing procedure for training a logistic regression model with a loss function, so as to solve in reverse for the target projection matrix H, which is used to encrypt the original data.
In this specification, the formula of the loss function may or may not carry a regularization term.
In one embodiment, the formula of the loss function without a regularization term can be as shown in formula 1:
In the above formula, H represents the target projection matrix and u represents the target vector; the meanings of the other parameters are not repeated here.
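Substituting ω = H u into the unregularized loss gives, as a sketch under the definitions above, a form like the following for formula 1. The plain sum over the N samples (rather than an average) is an assumption.

```latex
\[
\min_{H,\,u}\; \sum_{i=1}^{N} \mathrm{loss}(b_i H u,\; y_i)
\]
```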
In another embodiment, the formula of the loss function carrying a regularization term can be as shown in formula 2:
In the above formula, λ represents the regularization coefficient shared by H and u; || ||_F represents the norm type used by the regularization term, and the value of F can be set by those skilled in the art according to actual needs. For example, in one case the subscript F can take the value 2, that is, the regularization term of the loss function can use the 2-norm (the Euclidean norm).
Of course, in practical applications, the above formula can be further varied: the target projection matrix H and the target vector u can each correspond to a different regularization term with a different regularization coefficient.
In this case, the formula of the loss function carrying regularization terms can be as shown in formula 3:
In the above formula, λ and Υ take different values. In addition, the norm type used by the regularization term corresponding to the target projection matrix H and the norm type used by the regularization term corresponding to the target vector can be different (that is, the subscript F can take different values in the regularization terms corresponding to H and to u).
For example, in one implementation, the norm type used by the regularization term corresponding to the target projection matrix H can be the F-norm (Frobenius norm), and the norm type used by the regularization term corresponding to the target vector can be the 2-norm.
In this case, the above formula can be expressed as:
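As a sketch consistent with the descriptions of formula 2, formula 3, and the Frobenius-norm and 2-norm variant, the three regularized objectives could take the following shapes, in that order. Whether the norms are squared and whether the sum is averaged are assumptions; in the first two lines the subscript F stands for a configurable norm type, while in the third line it denotes the Frobenius norm.

```latex
\[
\min_{H,\,u}\; \sum_{i=1}^{N} \mathrm{loss}(b_i H u,\; y_i)
  \;+\; \lambda \left( \lVert H \rVert_F + \lVert u \rVert_F \right)
\]
\[
\min_{H,\,u}\; \sum_{i=1}^{N} \mathrm{loss}(b_i H u,\; y_i)
  \;+\; \lambda\,\lVert H \rVert_F \;+\; \Upsilon\,\lVert u \rVert_F
\]
\[
\min_{H,\,u}\; \sum_{i=1}^{N} \mathrm{loss}(b_i H u,\; y_i)
  \;+\; \lambda\,\lVert H \rVert_{F} \;+\; \Upsilon\,\lVert u \rVert_{2}
\]
```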
It should be noted that, in practical applications, the specific type of the loss function is not particularly limited in this specification.
For example, in one illustrated embodiment, the loss function can be a logarithmic loss function. In this case, in formulas 1 to 3 above, loss(b_i H u, y_i) can be expressed in the following form:
loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).
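The logarithmic loss above can be evaluated per sample as in the sketch below. The function name `log_loss` and the toy inputs are illustrative assumptions. This form of the loss is the one conventionally paired with labels encoded as -1 or +1; this specification does not state the label encoding, so that is an assumption as well. `np.logaddexp` is used so that large scores do not overflow.

```python
import numpy as np

def log_loss(b_i, H, u, y_i):
    """Compute log(1 + exp(-y_i * b_i H u)) in a numerically stable way."""
    margin = y_i * (b_i @ H @ u)
    # logaddexp(0, -m) equals log(exp(0) + exp(-m)) = log(1 + exp(-m)).
    return np.logaddexp(0.0, -margin)

b_i = np.array([1.0, 2.0])          # one sample with M = 2 features
H = np.array([[1.0], [0.0]])        # toy M x K projection with K = 1
u_zero = np.array([0.0])            # zero target vector gives score 0
u_big = np.array([-1000.0])         # extreme score to exercise stability

at_zero = log_loss(b_i, H, u_zero, 1.0)   # log(2) when the score is 0
extreme = log_loss(b_i, H, u_big, 1.0)    # finite even for a huge negative margin
print(at_zero, extreme)
```

A naive `np.log(1 + np.exp(1000.0))` would overflow to infinity, whereas the stable form returns approximately 1000.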
Certainly, in practical applications, other than logarithm loss function, other types of loss function can also be used
(such as figure penalties function), in the present specification without being particularly limited to, those skilled in the art can be based on actual need
It asks and is selected.
In this specification, after the data provider generates the N*M-dimensional objective matrix based on the N data samples and the data characteristics of the M dimensions corresponding respectively to the N data samples, it can call any one of formulas 1-3 shown above to train a logistic regression model, and solve for the optimal parameter H corresponding to the minimum fitting error between b_i and y_i.
When calling a formula shown above to train the logistic regression model, the conventional gradient descent method for logistic regression models can be used to solve for the optimal parameter H and thereby complete the training of the model. The specific training process is not described in detail in this specification; those skilled in the art can refer to the related art when implementing the technical solution of this specification.
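The specification leaves the training procedure to the related art. Purely as an illustration, a plain batch gradient descent over H and u might look like the sketch below. It assumes labels in {-1, +1}, averages the logarithmic loss over the N samples, and uses squared norms in the regular terms so that the gradients are simple; these choices, the function name `train_projection`, and all hyperparameter defaults are assumptions, not details from this specification.

```python
import numpy as np

def train_projection(B, y, K, lam=1e-3, gamma=1e-3, lr=0.05, steps=1000, seed=0):
    # B: (N, M) objective matrix, y: (N,) labels in {-1, +1}, K < M.
    rng = np.random.default_rng(seed)
    N, M = B.shape
    H = 0.1 * rng.standard_normal((M, K))   # projection matrix to be learned
    u = 0.1 * rng.standard_normal(K)        # object vector to be learned
    for _ in range(steps):
        z = B @ H @ u
        # d/dz log(1 + exp(-y z)) = -y / (1 + exp(y z)), averaged over N.
        g = -y * np.exp(-np.logaddexp(0.0, y * z)) / N
        Bg = B.T @ g                         # shape (M,)
        grad_H = np.outer(Bg, u) + 2.0 * lam * H
        grad_u = H.T @ Bg + 2.0 * gamma * u
        H -= lr * grad_H
        u -= lr * grad_u
    return H, u
```

Only H is needed for the later encryption step; u is a by-product of training.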
In this specification, after the target projection matrix H is finally obtained by model training, this M*K-dimensional projection matrix is the matrix ultimately used to encrypt the original objective matrix. When the original N*M objective matrix is encrypted based on the M*K-dimensional projection matrix, a linear projection mapping can be applied to the original high-dimensional N*M objective matrix based on the dimension-reducing M*K target projection matrix, mapping the N*M objective matrix into the low-dimensional projection space defined by the M*K projection matrix.
In implementation, mapping the original N*M objective matrix into the projection matrix space of M*K dimensions can be realized by multiplying the original N*M objective matrix by the M*K-dimensional projection matrix; this multiplication can use either right-multiplication or left-multiplication.
For example, assuming the data characteristics of the M dimensions serve as the columns of the objective matrix, the original N*M objective matrix can be right-multiplied by the M*K-dimensional projection matrix to map it into the projection matrix space of M*K dimensions. Alternatively, in implementation, the mapping can also be realized by left-multiplication of the original N*M objective matrix and the M*K-dimensional projection matrix, followed by transposing the result.
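The equivalence of the two multiplication orders can be checked numerically. The sketch below assumes the features are the columns of the objective matrix `B` (shape N by M) and uses random placeholder values; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 6, 4, 2
B = rng.standard_normal((N, M))   # objective matrix, one row per data sample
H = rng.standard_normal((M, K))   # projection matrix

# Right-multiplication: (N, M) @ (M, K) -> (N, K)
E_right = B @ H

# Left-multiplication on the transposed operands, then transpose the result:
# (K, M) @ (M, N) -> (K, N), transposed back to (N, K)
E_left = (H.T @ B.T).T
```

Both routes yield the same N*K matrix, since (BH)^T = H^T B^T.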
After the original N*M objective matrix is mapped into the projection matrix space of M*K dimensions, an N*K-dimensional scrambled matrix is obtained. This scrambled matrix is the data sample as encrypted by the M*K-dimensional mapping matrix.
In one illustrated embodiment, after the data provider server-side computes the M*K-dimensional projection matrix through the calculation process shown above, it can also store the projection matrix locally as the encryption matrix.
Subsequently, when the data provider server-side again collects the latest N data samples and generates an N*M-dimensional objective matrix based on these N data samples and the data characteristics of the M dimensions corresponding respectively to them, it can determine whether the projection matrix has been stored locally.
If the projection matrix has been stored locally, the stored projection matrix can be used directly to encrypt the N*M matrix; the specific encryption process is not repeated here.
If the projection matrix is not stored locally, the target projection matrix H can be learned again through the training process described above.
In addition, it should be noted that, in practical applications, if the dimensions of the data characteristics of the M dimensions change (for example, data characteristics of new dimensions are added, or data characteristics of some dimensions are deleted), or the meaning represented by the data characteristics of all or some of the M dimensions changes, the data provider can relearn the target projection matrix H through the training process described above, and update the locally stored projection matrix with the recomputed projection matrix.
In this way, when the data characteristics required for modeling are updated, the stale locally stored encryption matrix can be updated in time, which avoids encrypting the original objective matrix with a stale encryption matrix and the resulting loss of data information that would affect modeling accuracy.
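One possible way to organize the reuse-or-relearn logic described above is sketched below. It stores the projection matrix together with a fingerprint of the feature list and retrains when no stored matrix exists or the fingerprint differs. The fingerprinting scheme, the `.npz` file layout, and the names `schema_fingerprint` and `load_or_train` are assumptions made for illustration; a change in the meaning of a feature whose name stays the same would not be detected by this sketch and would need to be signalled some other way.

```python
import hashlib
import json
import os

import numpy as np

def schema_fingerprint(feature_names):
    # Hash of the ordered feature names; changes when features are added or removed.
    payload = json.dumps(list(feature_names)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def load_or_train(path, feature_names, train_fn):
    # path should end in ".npz"; train_fn() returns a freshly learned (M, K) matrix.
    fp = schema_fingerprint(feature_names)
    if os.path.exists(path):
        with np.load(path) as data:
            if data["fingerprint"].item() == fp:
                return data["H"], False      # reuse the stored projection matrix
    H = train_fn()                           # missing or stale: relearn H
    np.savez(path, H=H, fingerprint=np.array(fp))
    return H, True
```

The second return value indicates whether retraining took place.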
In this specification, after the encrypted N*K-dimensional scrambled matrix is obtained through the training process shown above, the data provider server-side can transmit the scrambled matrix, as training samples, to the modeling service end docked with the data provider.
After receiving the scrambled matrix transmitted by the data provider server-side, the modeling service end can use the scrambled matrix as training samples to train a machine learning model.
In one illustrated embodiment, the modeling service end can merge the scrambled matrix with locally stored training samples, and then jointly train the machine learning model based on the merged training samples.
Refer to Fig. 3, which is a schematic diagram, shown in this specification, of joint modeling by fusing data samples from multiple parties.
In one scenario, the modeling side can be the data operation platform of Alipay, and the data providers can include service platforms docked with the data operation platform of Alipay that provide user-oriented Internet services, such as banks and third-party financial institutions.
In practical applications, since the data operation platform of Alipay is an untrusted third party from the perspective of a data provider, if the data provider directly supplies local customer transaction data to the data operation platform of Alipay for data modeling, user privacy may be leaked in the data transmission link.
In this case, each data provider can, based on any one of formulas 1-3 shown above, train on the N*M-dimensional objective matrix generated from its original transaction data samples to obtain an M*K-dimensional projection matrix, then use the M*K-dimensional projection matrix to perform dimension-reducing encryption on the N*M-dimensional objective matrix to obtain an N*K-dimensional scrambled matrix, and transmit the scrambled matrix as training samples to the data operation platform of Alipay.
The data operation platform of Alipay can merge the received training samples provided by each data provider with its localized data samples, and then train a machine learning model based on the merged training samples; for example, customer transaction data provided by banks and third-party financial institutions can be merged with the customer transaction data localized in the data operation platform of Alipay to jointly train a scorecard model for assessing the risk of users' transactions.
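The specification does not state how the received scrambled matrix is merged with local samples. One conceivable approach, sketched here strictly as an assumption, is to align rows by a shared sample identifier and concatenate the provider's K encrypted columns onto the local feature columns. The function name `fuse_by_sample_id` and the use of sample identifiers are hypothetical.

```python
import numpy as np

def fuse_by_sample_id(local_ids, local_X, provider_ids, E):
    # local_X: (n_local, d) local features; E: (n_provider, K) scrambled matrix.
    # Keeps only samples present on both sides, in the local ordering.
    row_of = {sid: r for r, sid in enumerate(provider_ids)}
    keep = [i for i, sid in enumerate(local_ids) if sid in row_of]
    rows = [row_of[local_ids[i]] for i in keep]
    fused = np.hstack([local_X[keep], E[rows]])   # shape (n_shared, d + K)
    return [local_ids[i] for i in keep], fused
```

The fused matrix can then be passed to whatever training routine the modeling side uses.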
After the modeling side has trained the machine learning model in the modeling manner shown above, the data provider can subsequently still use the target projection matrix H to perform dimension-reducing encryption on a data matrix constructed from collected data samples and the related data characteristics, and then transmit it to the machine learning model for calculation to obtain the output result of the model. For example, if the machine learning model is a scorecard model, trained on users' payment transaction data, for assessing users' transaction risk, the data provider can use the projection matrix to perform dimension-reducing encryption on a data matrix constructed from collected user transaction data, and then transmit it as input data to the scorecard model to obtain a risk score corresponding to each transaction.
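The prediction flow described above can be sketched as follows. The data provider applies the stored projection matrix to new samples, and the modeling side scores the resulting rows. The scoring model is assumed here to be a simple logistic model over the K encrypted dimensions with weights `w` and bias `b`; that assumption and the names `encrypt` and `score` are illustrative and not taken from this specification.

```python
import numpy as np

def encrypt(B_new, H):
    # Data provider side: project new (n, M) samples to (n, K) with the stored H.
    return B_new @ H

def score(E, w, b=0.0):
    # Modeling side: assumed logistic scoring over the K encrypted dimensions.
    return 1.0 / (1.0 + np.exp(-(E @ w + b)))
```

Each row of the input yields one score between 0 and 1, which could play the role of the per-transaction risk score in the example above.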
The above is a data ciphering method provided by an embodiment of this specification. Referring to Fig. 4, based on the same idea, an embodiment of this specification also provides a machine learning model training method, applied to a modeling service end, which performs the following steps:
Step 402, receive the scrambled matrix transmitted by the data provider server-side; wherein the scrambled matrix is the N*K-dimensional matrix obtained by the data provider server-side multiplying the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix, the target projection matrix being obtained by training a preset machine learning model based on the N*M-dimensional objective matrix; the K value is less than the M value;
Step 404, using the scrambled matrix as training sample training machine learning model.
The implementation of the technical features in each step shown in Fig. 4 is not repeated in this embodiment; reference can be made to the description of the above embodiments.
As can be seen from the above embodiments, the data provider trains a preset machine learning model based on the N*M-dimensional objective matrix generated from N data samples and the data characteristics of the M dimensions corresponding respectively to the N data samples, obtaining a dimension-reducing M*K-dimensional target projection matrix; it then multiplies the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix to obtain the encrypted N*K-dimensional scrambled matrix, which the modeling service end uses as training samples to train a machine learning model;
On the one hand, since the modeling service end usually cannot restore the original objective matrix from the scrambled matrix obtained by multiplying the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix, the user's private data can be protected to the greatest extent, avoiding leakage of user privacy when data samples are submitted to the modeling service end for model training;
On the other hand, since the scrambled matrix is an N*K-dimensional reduced matrix obtained by linear dimension-reducing mapping of the N*M-dimensional objective matrix, the transmission overhead of sending data samples to the modeling service end can be reduced;
Moreover, since the target projection matrix is obtained by machine learning training with the objective matrix as the original training sample set, the scrambled matrix obtained by linear projection mapping of the objective matrix based on the target projection matrix can retain the information content of the original data samples to the greatest extent, so that the precision of model training can still be guaranteed when the dimension-reduced scrambled matrix is transmitted to the modeling service end for model training.
Corresponding to the above method embodiment, this specification also provides an embodiment of a data encryption device. The embodiment of the data encryption device of this specification can be applied to an electronic device. The device embodiment can be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, as a device in the logical sense, it is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Fig. 5 shows a hardware structure diagram of the electronic device where the data encryption device of this application is located; in addition to the processor, memory, network interface, and non-volatile memory shown in Fig. 5, the electronic device where the device is located in the embodiment can also include other hardware, generally according to the actual function of the electronic device, which is not repeated here.
Fig. 6 is a block diagram of a data encryption device according to an exemplary embodiment of this application.
Referring to Fig. 6, the data encryption device 60 can be applied to the aforementioned electronic device shown in Fig. 5, and includes: a generation module 601, a first training module 602, and an encrypting module 603.
Wherein, the generation module 601 generates an N*M-dimensional objective matrix based on N data samples and the data characteristics of the M dimensions corresponding respectively to the N data samples;
The first training module 602 trains a preset machine learning model based on the N*M-dimensional objective matrix to obtain an M*K-dimensional target projection matrix; wherein the K value is less than the M value;
The encrypting module 603 multiplies the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix to obtain the encrypted N*K-dimensional scrambled matrix; wherein the scrambled matrix is used for training a machine learning model.
In this embodiment, the preset machine learning model is a supervised machine learning model; wherein, in the loss function of the preset machine learning model, the model parameter to be trained is expressed as the product of the target projection matrix and an object vector.
In this embodiment, the machine learning model is a logistic regression model.
In this embodiment, the expression of the loss function is any one of the following formulas:
Wherein, H denotes the projection matrix; u denotes the object vector; || ||F denotes the regularization norm of the regular term of the loss function; λ and Υ denote regularization coefficients; b_i denotes the feature vector generated from the data characteristics of the M dimensions of the i-th data sample in the objective matrix; y_i denotes the label corresponding to the feature vector.
In this embodiment, the loss function is a logarithmic loss function; wherein,
loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).
Corresponding to the above method embodiment, this specification also provides an embodiment of a machine learning model training device. The embodiment of the machine learning model training device of this specification can be applied to an electronic device. The device embodiment can be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, as a device in the logical sense, it is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Fig. 7 shows a hardware structure diagram of the electronic device where the machine learning model training device of this application is located; in addition to the processor, memory, network interface, and non-volatile memory shown in Fig. 7, the electronic device where the device is located in the embodiment can also include other hardware, generally according to the actual function of the electronic device, which is not repeated here.
Fig. 8 is a block diagram of a machine learning model training device according to an exemplary embodiment of this application.
Referring to Fig. 8, the machine learning model training device 80 can be applied to the aforementioned electronic device shown in Fig. 7, and includes: a receiving module 801 and a second training module 802.
Wherein, the receiving module 801 receives the scrambled matrix transmitted by the data provider server-side; wherein the scrambled matrix is the N*K-dimensional matrix obtained by the data provider server-side multiplying the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix, the target projection matrix being obtained by training a preset machine learning model based on the N*M-dimensional objective matrix; the K value is less than the M value;
The second training module 802 uses the scrambled matrix as training samples to train a machine learning model.
In this embodiment, the second training module 802 further merges the scrambled matrix, as training samples, with local training samples, and trains the machine learning model based on the merged training samples.
For the implementation of the functions and roles of the modules in the above device, see the implementation of the corresponding steps in the above method; details are not repeated here.
For the device embodiments, since they basically correspond to the method embodiments, reference can be made to the relevant parts of the description of the method embodiments. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this application. Those of ordinary skill in the art can understand and implement it without creative effort.
The systems, devices, modules, or units illustrated in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer, which can take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail transceiver, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the above method embodiment, this specification also provides an embodiment of a machine learning model training system.
The machine learning model training system can include a data provider server-side and a modeling service end.
Wherein, the data provider server-side generates an N*M-dimensional objective matrix based on N data samples and the data characteristics of the M dimensions corresponding respectively to the N data samples; trains a preset machine learning model based on the N*M-dimensional objective matrix to obtain an M*K-dimensional target projection matrix, wherein the K value is less than the M value; and multiplies the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix to obtain the encrypted N*K-dimensional scrambled matrix;
The modeling service end trains a machine learning model based on the scrambled matrix.
In this embodiment, the modeling service end further merges the scrambled matrix, as training samples, with local training samples, and trains the machine learning model based on the merged training samples.
Corresponding to the above method embodiment, this application also provides an embodiment of an electronic device. The electronic device includes a processor and a memory for storing machine-executable instructions; wherein the processor and the memory are usually connected to each other through an internal bus. In other possible implementations, the device may also include an external interface so that it can communicate with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the data encryption shown in Fig. 1, the processor is caused to:
generate an N*M-dimensional objective matrix based on N data samples and the data characteristics of the M dimensions corresponding respectively to the N data samples;
train a preset machine learning model based on the N*M-dimensional objective matrix to obtain an M*K-dimensional target projection matrix; wherein the K value is less than the M value;
multiply the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix to obtain the encrypted N*K-dimensional scrambled matrix; wherein the scrambled matrix is used for training a machine learning model.
In this embodiment, the preset machine learning model is a supervised machine learning model; wherein, in the loss function of the preset machine learning model, the model parameter to be trained is expressed as the product of the target projection matrix and an object vector.
In this embodiment, the preset machine learning model is a logistic regression model.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the data encryption, the processor is caused to:
perform model training on the N*M-dimensional objective matrix based on any one of the loss functions in the following formulas:
Wherein, H denotes the projection matrix; u denotes the object vector; || ||F denotes the regularization norm of the regular term of the loss function; λ and Υ denote regularization coefficients; b_i denotes the feature vector generated from the data characteristics of the M dimensions of the i-th data sample in the objective matrix; y_i denotes the label corresponding to the feature vector.
In this embodiment, the loss function is a logarithmic loss function; wherein,
loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).
Corresponding to the above method embodiment, this application also provides an embodiment of an electronic device. The electronic device includes a processor and a memory for storing machine-executable instructions; wherein the processor and the memory are usually connected to each other through an internal bus. In other possible implementations, the device may also include an external interface so that it can communicate with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the machine learning model training shown in Fig. 4, the processor is caused to:
receive the scrambled matrix transmitted by the data provider server-side; wherein the scrambled matrix is the N*K-dimensional matrix obtained by the data provider server-side multiplying the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix, the target projection matrix being obtained by training a preset machine learning model based on the N*M-dimensional objective matrix; the K value is less than the M value;
use the scrambled matrix as training samples to train a machine learning model.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the machine learning model training shown in Fig. 4, the processor is further caused to:
merge the scrambled matrix, as training samples, with local training samples, and train the machine learning model based on the merged training samples.
Those skilled in the art will readily conceive of other embodiments of this application after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this application. The specification and embodiments are to be regarded as illustrative only, with the true scope and spirit of this application being indicated by the following claims.
It should be understood that this application is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.
The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing are merely preferred embodiments of this application and are not intended to limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of this application shall fall within the scope of protection of this application.