CN109426861A

CN109426861A - Data encryption, machine learning model training method, device and electronic equipment

Info

Publication number: CN109426861A
Application number: CN201710703413.XA
Authority: CN
Inventors: 赵沛霖
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd
Priority date: 2017-08-16
Filing date: 2017-08-16
Publication date: 2019-03-05

Abstract

A data encryption method is disclosed, comprising: generating an N*M-dimensional target matrix based on N data samples and the data features of M dimensions corresponding to each of the N data samples; training a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein K is less than M; and multiplying the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an N*K-dimensional encrypted matrix, wherein the encrypted matrix is used for training a machine learning model.

Description

Data encryption, machine learning model training method, device and electronic equipment

Technical field

This specification relates to the field of computer applications, and in particular to a data encryption method, a machine learning model training method, corresponding devices, and electronic equipment.

Background technique

With the rapid development of Internet technology, the networking and transparency of users' personal data has become an irresistible trend. A service platform that provides Internet services to users can accumulate massive amounts of user data by collecting the service data that users generate every day. For the operator of such a platform, this user data is a very valuable "resource" from which a large amount of useful information can be extracted through data mining and machine learning. For example, in practical applications, data features of several dimensions can be extracted from this massive user data in combination with a specific business scenario; the extracted features are used as training samples to train a machine learning model with a specific machine learning algorithm, and the trained model is then applied in that business scenario to guide business operations.

Summary of the invention

This specification proposes a data encryption method, the method comprising:

generating an N*M-dimensional target matrix based on N data samples and the data features of M dimensions corresponding to each of the N data samples;

training a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein K is less than M;

multiplying the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an N*K-dimensional encrypted matrix, wherein the encrypted matrix is used for training a machine learning model.

Optionally, the preset machine learning model is a supervised machine learning model, wherein the model parameter to be trained in the loss function of the preset machine learning model is expressed as the product of the target projection matrix and a target vector.

Optionally, the preset machine learning model is a logistic regression model.

Optionally, the expression of the loss function is any one of the following formulas:

where H denotes the target projection matrix; u denotes the target vector; || ||_F denotes the norm used by the regularization term of the loss function; λ and Υ denote regularization coefficients; b_i denotes the feature vector generated from the data features of the M dimensions of the i-th data sample in the target matrix; and y_i denotes the label corresponding to that feature vector.

Optionally, the loss function is a logarithmic loss function, wherein

loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).

This specification also proposes a data encryption device, the device comprising:

a generation module, which generates an N*M-dimensional target matrix based on N data samples and the data features of M dimensions corresponding to each of the N data samples;

a first training module, which trains a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein K is less than M;

an encryption module, which multiplies the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an N*K-dimensional encrypted matrix, wherein the encrypted matrix is used for training a machine learning model.

Optionally, the preset machine learning model is a supervised machine learning model, wherein the model parameter to be trained in the loss function of the preset machine learning model is expressed as the product of the target projection matrix and a target vector.

Optionally, the machine learning model is a logistic regression model.

where H denotes the target projection matrix; u denotes the target vector; || ||_F denotes the norm used by the regularization term of the loss function; λ and Υ denote regularization coefficients; b_i denotes the feature vector generated from the data features of the M dimensions of the i-th data sample in the target matrix; and y_i denotes the label corresponding to that feature vector.

Optionally, the loss function is a logarithmic loss function, wherein

loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).

This specification also proposes a machine learning model training method, the method comprising:

receiving an encrypted matrix sent by a data provider server, wherein the encrypted matrix is the N*K-dimensional matrix obtained by multiplying an N*M-dimensional target matrix by an M*K-dimensional target projection matrix, the target projection matrix being obtained by the data provider server by training a preset machine learning model based on the N*M-dimensional target matrix, and K being less than M;

training a machine learning model using the encrypted matrix as training samples.

Optionally, training a machine learning model using the encrypted matrix as training samples comprises:

fusing the encrypted matrix, as training samples, with local training samples, and training a machine learning model based on the fused training samples.

This specification also proposes a machine learning model training device, the device comprising:

a receiving module, which receives an encrypted matrix sent by a data provider server, wherein the encrypted matrix is the N*K-dimensional matrix obtained by multiplying an N*M-dimensional target matrix by an M*K-dimensional target projection matrix, the target projection matrix being obtained by the data provider server by training a preset machine learning model based on the N*M-dimensional target matrix, and K being less than M;

a second training module, which trains a machine learning model using the encrypted matrix as training samples.

Optionally, second training module further,

This specification also proposes a machine learning model training system, the system comprising:

a data provider server, which generates an N*M-dimensional target matrix based on N data samples and the data features of M dimensions corresponding to each of the N data samples; trains a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein K is less than M; and multiplies the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an N*K-dimensional encrypted matrix;

a modeling server, which trains a machine learning model based on the encrypted matrix.

Optionally, the modeling service end further,

This specification also proposes an electronic device, comprising:

a processor;

a memory for storing machine-executable instructions;

wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of data encryption, the processor is caused to:

multiply the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an N*K-dimensional encrypted matrix, wherein the encrypted matrix is used for training a machine learning model.

This specification also proposes an electronic device, comprising:

a processor;

a memory for storing machine-executable instructions;

wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of machine learning model training, the processor is caused to:

train a machine learning model using the encrypted matrix as training samples.

In this specification, the data provider generates an N*M-dimensional target matrix based on N data samples and the data features of M dimensions corresponding to each of the N data samples, trains a preset machine learning model on it to obtain a dimension-reducing M*K-dimensional target projection matrix, and multiplies the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an N*K-dimensional encrypted matrix, which the modeling server uses as training samples to train a machine learning model.

On the one hand, the modeling server generally cannot recover the original target matrix from the encrypted matrix obtained by multiplying the N*M-dimensional target matrix by the M*K-dimensional target projection matrix. The user's private data is therefore protected as far as possible, and leakage of user privacy is avoided while data samples are submitted to the modeling server for model training.

On the other hand, since the encrypted matrix is an N*K-dimensional dimension-reduced matrix obtained by applying a linear dimension-reducing mapping to the N*M-dimensional target matrix, the transmission overhead of sending data samples to the modeling server is reduced. Moreover, since the target projection matrix is obtained by machine learning training with the target matrix as the original sample training set, the encrypted matrix obtained by linearly projecting the target matrix with the target projection matrix retains as much of the information in the original data samples as possible, so the accuracy of model training can still be guaranteed when the dimension-reduced encrypted matrix is transmitted to the modeling server for model training.
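The two effects described above can be illustrated numerically. The following is a minimal sketch, not the patented procedure: the sizes N, M, K and the random matrices are hypothetical stand-ins, and the pseudo-inverse is used only as one plausible attempt at reversing the projection.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 5, 6, 3                 # hypothetical sizes with K < M

B = rng.normal(size=(N, M))       # stand-in for the N*M target matrix
H = rng.normal(size=(M, K))       # stand-in for a trained M*K projection matrix

E = B @ H                         # N*K encrypted matrix sent to the modeling server

# Best linear attempt to undo the projection, even if H were known:
B_guess = E @ np.linalg.pinv(H)

print(E.shape)                    # (5, 3): fewer values to transmit than (5, 6)
print(np.allclose(B_guess, B))    # False: the projection discards M - K directions
```

Because H has at most K independent columns, many different target matrices map to the same encrypted matrix, which is why the original data generally cannot be restored from it.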

Description of the drawings

Fig. 1 is a flowchart of a data encryption method according to an embodiment of this specification;

Fig. 2 is a schematic diagram of an N*M-dimensional target matrix according to an embodiment of this specification;

Fig. 3 is a schematic diagram of joint modeling that fuses data samples from multiple parties according to an embodiment of this specification;

Fig. 4 is a flowchart of a machine learning model training method according to an embodiment of this specification;

Fig. 5 is a hardware structure diagram of an electronic device carrying a data encryption device according to an embodiment of the present application;

Fig. 6 is a logic block diagram of a data encryption device according to an embodiment of the present application;

Fig. 7 is a hardware structure diagram of an electronic device carrying a machine learning model training device according to an embodiment of the present application;

Fig. 8 is a logic block diagram of a machine learning model training device according to an embodiment of the present application.

Specific embodiment

In the era of big data, various forms of useful information can be obtained by mining massive data, so the importance of data is self-evident. Different institutions each own their own data, but the data mining results of any one institution are limited by the quantity and variety of the data it holds. A direct way to address this problem is for multiple institutions to cooperate and share data, so as to achieve better data mining results to the benefit of all parties.

For a data owner, however, the data itself is a highly valuable asset, and for reasons such as protecting privacy and preventing leakage, data owners are often unwilling to provide their data directly. As a result, "data sharing" is difficult to put into practice. How to share data while fully guaranteeing data security has therefore become a problem of wide concern in the industry.

This specification aims to propose a technical solution in which a dimension-reducing target projection matrix is trained by a machine learning algorithm on the original user data required for modeling, and the projection matrix is then used to encrypt that original user data. This protects the privacy of the original user data while retaining as much of its information as possible, so that the privacy of users is protected without sacrificing modeling accuracy.

In implementation, the data features of M dimensions can be extracted from each of the N data samples required for modeling, and an N*M-dimensional target matrix can be generated as the sample training set based on the N data samples and the data features of the M dimensions corresponding to each of them.

After the N*M-dimensional target matrix is generated, it can be used as training samples to train a machine learning model. For example, when the machine learning model is a supervised machine learning model, the model parameter to be trained in its loss function (an M-dimensional vector) can be expressed, through a basic linear transformation, as the product of a dimension-reducing M*K-dimensional target projection matrix and a K*1-dimensional target vector. Training the machine learning model on the N*M-dimensional target matrix then yields the dimension-reducing M*K-dimensional target projection matrix.

After the target projection matrix is obtained, a linear dimension-reducing mapping can be applied to the N*M-dimensional target matrix based on it: the N*M-dimensional target matrix is multiplied by the M*K-dimensional target projection matrix to obtain an N*K-dimensional encrypted matrix, and the encrypted matrix is transmitted to the modeling server.

After receiving the encrypted matrix, the modeling server can train a machine learning model using the encrypted matrix as training samples. For example, the encrypted matrix can be fused with the modeling server's local training samples, and the machine learning model can then be trained on the fused training samples.

A detailed description is given below through specific embodiments and in combination with specific application scenarios.

Referring to Fig. 1, Fig. 1 shows a data encryption method according to an embodiment of this specification. The method is applied to a data provider server, which performs the following steps:

Step 102: generating an N*M-dimensional target matrix based on N data samples and the data features of M dimensions corresponding to each of the N data samples;

Step 104: training a preset machine learning model based on the N*M-dimensional target matrix to obtain an M*K-dimensional target projection matrix, wherein K is less than M;

Step 106: multiplying the N*M-dimensional target matrix by the M*K-dimensional target projection matrix to obtain an N*K-dimensional encrypted matrix, wherein the encrypted matrix is used for training a machine learning model.
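The dimension bookkeeping of step 106 can be sketched as follows. This is an illustrative sketch only: the function name `encrypt`, the example values, and the all-ones projection matrix are hypothetical, and the projection matrix would in practice come from the training in step 104.

```python
import numpy as np

def encrypt(target: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Step 106: multiply the N*M target matrix by the M*K projection matrix."""
    n, m = target.shape
    m2, k = projection.shape
    if m != m2:
        raise ValueError(f"inner dimensions differ: {m} vs {m2}")
    if k >= m:
        raise ValueError("K must be smaller than M for dimension reduction")
    return target @ projection

# Hypothetical data: 4 samples, 5 features, projected down to 2 columns.
target = np.arange(20, dtype=float).reshape(4, 5)
projection = np.ones((5, 2))   # placeholder; step 104 would learn this by training
encrypted = encrypt(target, projection)
print(encrypted.shape)         # (4, 2)
```

The explicit check that K is smaller than M mirrors the condition stated in step 104.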

The data provider server may be connected to a modeling server and provide it with the data samples required for modeling.

For example, in practical applications, the data provider and the modeling party may correspond to different operators, and the data provider may transmit collected user data as data samples to the modeling party to complete data modeling. For instance, the modeling party may be the data operation platform of Alipay, and the data provider may be a service platform that provides Internet services to users and is connected to the Alipay data operation platform, such as a third-party bank or an express delivery company. In the initial state, the data provider server may collect, in the background, the user data that users generate every day, acquire N pieces of user data from the collected user data as data samples, and generate an initialized data sample set based on the acquired data samples.

For example, in one illustrated embodiment, N pieces of sensitive data involving user privacy may be acquired from the collected user data, and an initialized data sample set may then be generated from this sensitive data.

The specific number N of acquired data samples is not specifically limited in this specification and may be set by those skilled in the art based on actual requirements.

The specific form of the user data depends on the specific business scenario and modeling requirements and is likewise not specifically limited in this specification. For example, in practical applications, if a scorecard model is to be created for risk assessment of payment transactions initiated by users, then in this business scenario the user data may be transaction data generated by users through a payment client.

After the data sample set has been generated from the N acquired data samples, the data provider server may also preprocess the data samples in the data sample set.

Preprocessing the data samples in the data sample set generally includes data cleaning, filling in default values, normalization, or other forms of preprocessing. Through preprocessing, the acquired data samples are converted into standardized data samples suitable for model training.
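A minimal sketch of such preprocessing is given below. The specific choices are assumptions made for illustration, since the specification does not fix them: missing entries are marked with NaN, the column mean is used as the default value, and normalization is a per-column standardization. Data cleaning is not shown.

```python
import numpy as np

def preprocess(samples: np.ndarray) -> np.ndarray:
    """Fill missing entries with the column mean, then standardize each column."""
    data = samples.astype(float).copy()
    col_mean = np.nanmean(data, axis=0)              # per-feature mean, ignoring NaN
    missing = np.isnan(data)
    data[missing] = np.take(col_mean, np.nonzero(missing)[1])
    std = data.std(axis=0)
    std[std == 0] = 1.0                              # constant columns stay at zero
    return (data - data.mean(axis=0)) / std

# Hypothetical raw samples with two features and two missing entries.
raw = np.array([[1.0, 200.0],
                [np.nan, 400.0],
                [3.0, np.nan]])
clean = preprocess(raw)
print(clean.shape)   # (3, 2)
```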

After the data samples in the data sample set have been preprocessed, the data provider server may extract the data features of M dimensions from each data sample in the data sample set.

The number M of extracted data feature dimensions is not specifically limited in this specification and may be chosen by those skilled in the art based on actual modeling requirements.

The specific type of the extracted data features is likewise not specifically limited in this specification; those skilled in the art may select them manually, based on actual modeling requirements, from the information actually contained in the data samples.

For example, in one embodiment, the modeling party may select the data features of M dimensions in advance based on actual modeling requirements and provide the selected data features to the data provider, and the data provider then extracts from the data samples the data feature values corresponding to the data feature of each dimension.

After the data provider has extracted the data features of M dimensions from each data sample in the data sample set, it may generate a data feature vector for each data sample from the data feature values corresponding to the extracted data features of the M dimensions, and then construct an N*M-dimensional target matrix from the data feature vectors of the data samples.

In implementation, the data features of the M dimensions may correspond to the rows of the target matrix or to its columns; this is not specifically limited in this specification.

For example, referring to Fig. 2, taking the case where the data features of the M dimensions correspond to the rows of the target matrix, the target matrix can be expressed in the form shown in Fig. 2. In the target matrix shown in Fig. 2, each row is a feature vector composed of the data features of the M dimensions of one data sample, so that the i-th row corresponds to the i-th data sample.
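The construction of the target matrix from per-sample feature vectors can be sketched as follows. The feature names and sample values are hypothetical and chosen only to resemble a payment scenario; the layout assumed here places one data sample per row.

```python
import numpy as np

FEATURES = ["amount", "hour_of_day", "merchant_risk"]  # hypothetical M = 3 feature names

def build_target_matrix(samples, features=FEATURES):
    """Stack one M-dimensional feature vector per sample into an N*M matrix."""
    rows = [[float(sample[name]) for name in features] for sample in samples]
    return np.array(rows)

samples = [
    {"amount": 12.5, "hour_of_day": 9, "merchant_risk": 0.1},
    {"amount": 300.0, "hour_of_day": 23, "merchant_risk": 0.7},
]
B = build_target_matrix(samples)
print(B.shape)  # (2, 3)
```

Fixing the feature order in one list keeps every row aligned to the same M dimensions, which the later matrix multiplication relies on.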

After the data provider server has generated the N*M-dimensional target matrix based on the N data samples and the data features of the M dimensions corresponding to each of them, it may use the N*M-dimensional target matrix as the original sample training set to train a machine learning model, and thereby train a projection matrix for encrypting the N*M-dimensional target matrix.

In this specification, the machine learning model may specifically be a supervised machine learning model; for example, it may be an LR (Logistic Regression) model.

The specific type of the machine learning model is not specifically limited in this specification. For example, in practical applications, the machine learning model may be a supervised prediction model built with a supervised machine learning algorithm (such as a regression algorithm), for instance a scorecard model trained on users' payment transaction data to assess the transaction risk of users.

In practical applications, a supervised machine learning model usually uses a loss function to describe the fitting error between the training samples and the corresponding sample labels. During actual model training, the loss function can be used to train on the N*M-dimensional target matrix: the training samples and the corresponding sample labels are taken as input values, the value of the model parameter that minimizes the fitting error between the training samples and the corresponding sample labels is solved for, and the solved value is used as the optimal parameter to construct the logistic regression model. Training a logistic regression model through a loss function can thus be understood as the process of solving, through the loss function, for the value of the model parameter at which the fitting error between the training samples and the corresponding sample labels is smallest.

On this basis, in this specification, when the machine learning model is trained on the N*M-dimensional target matrix to obtain the M*K-dimensional target projection matrix, the model parameter ω to be trained in the loss function of the machine learning model can be expressed as the product of the target projection matrix H and a specific coefficient u and substituted into the loss function, and the target projection matrix is then solved for through model training. In other words, in practical applications, a target projection matrix can be derived from the model parameter ω to be trained by applying a basic mathematical transformation to it.

For example, in one illustrated embodiment, when the training sample set is the N*M-dimensional target matrix, the model parameter ω to be trained is usually an M-dimensional vector (a vector composed of the weight values corresponding to the data features of the M dimensions). In this case, ω can be expressed, through a basic linear transformation, as the product of an M*K-dimensional target projection matrix H and a K*1-dimensional target vector u (that is, H and u are parameters derived from the original model parameter ω through a basic linear transformation). The product is then substituted into the expression of the loss function, and the target projection matrix H is solved for through the model training process.
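One consequence of writing ω as the product of H and u is that a linear score computed on the raw features equals the score computed on the projected features. The sketch below checks this numerically with hypothetical random values; it illustrates the algebra only and does not train anything.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 4, 6, 2

B = rng.normal(size=(N, M))   # N*M target matrix (hypothetical values)
H = rng.normal(size=(M, K))   # M*K target projection matrix
u = rng.normal(size=(K,))     # K-dimensional target vector

omega = H @ u                 # M-dimensional model parameter expressed as H u
scores_raw = B @ omega        # linear scores computed on the raw features
scores_enc = (B @ H) @ u      # the same scores computed on the projected features

print(omega.shape)                          # (6,)
print(np.allclose(scores_raw, scores_enc))  # True
```

This is simply associativity of matrix multiplication: B(Hu) = (BH)u.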

A detailed description is given below, taking the case where the supervised machine learning model is a logistic regression model as an example.

It should be noted that taking a logistic regression model as the supervised machine learning model is merely exemplary; in practical applications, the machine learning model may also be of a type other than a logistic regression model.

In this specification, each of the N data samples may carry a sample label calibrated in advance. The specific form of the sample label generally also depends on the specific business scenario and modeling requirements and is not specifically limited in this specification.

For example, in practical applications, if a scorecard model is to be created for risk assessment of payment transactions initiated by users, then in this business scenario the user data may be transaction data generated by users through a payment client, and the sample label may be a label indicating whether a transaction data sample carries transaction risk.

In this case, each data feature vector in the N*M-dimensional target matrix, which is constructed from the N data samples and the data features of the M dimensions corresponding to each of them, corresponds to one sample label.

In one illustrated embodiment, the formula of the loss function of the logistic regression model may be as follows:

In the above formula, ω denotes the model parameter to be trained, usually an M-dimensional vector; b_i denotes the feature vector generated from the data features of the M dimensions of the i-th data sample in the target matrix, that is, the feature vector in the N*M-dimensional target matrix that corresponds to the i-th data sample (taking the case described above where the rows of the target matrix represent the data features, b_i denotes the feature vector in the i-th row of the target matrix); and y_i denotes the sample label corresponding to b_i. The formula expresses solving for the optimal parameter ω corresponding to the minimum fitting error between b_i and y_i.

In practical applications, when a loss function is used to train a logistic regression model, a norm regularization term is usually introduced into the loss function to avoid overfitting of the trained model.

For example, in another illustrated embodiment, the formula of the loss function may be as follows:

In the above formula, the term weighted by λ is the norm regularization term that has been introduced; λ denotes the regularization coefficient, usually a real number set by those skilled in the art based on actual requirements; and || ||_F denotes the norm type used by the regularization term. In practical applications, the norm type used by the regularization term may be set by those skilled in the art based on actual requirements.

In this specification, the data provider may express the model parameter ω to be learned in the above formula, through a basic linear transformation, as the product of a target projection matrix H and a target vector u (multiplying an M*K-dimensional matrix by a K*1-dimensional target vector yields an M*1-dimensional vector), and then substitute it into the above loss function.

It should be noted that the target projection matrix may specifically be an M*K-dimensional dimension-reducing matrix. The value of K is less than the value of M and may be set by those skilled in the art based on actual requirements; for example, in one implementation, K may be one half or one third of M. In this case, the M-dimensional model parameter ω to be trained can be expressed, through a basic linear transformation, as the product of an M*K-dimensional target projection matrix H and a K*1-dimensional target vector u.

After the model parameter ω to be learned has been expressed, through a basic linear transformation, as the product of a target projection matrix H and a target vector u, H and u can be substituted into the above formula. The N*M-dimensional target matrix is then used as the training samples, and model training is carried out following the existing procedure for training a logistic regression model with a loss function, so as to solve for the target projection matrix H used to encrypt the original data.

In this specification, the formula of the loss function may or may not include a regularization term.

In one embodiment, the formula of the loss function without a regularization term may be as shown in Formula 1:

In the above formula, H denotes the target projection matrix and u denotes the target vector; the meanings of the other parameters are not repeated here.
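Read against the symbol definitions given in this specification, Formula 1 amounts to minimizing the per-sample loss over H and u with ω replaced by Hu. A sketch in LaTeX follows; the unweighted sum over the N samples is an assumption, since the text does not state whether the loss is summed or averaged:

```latex
\min_{H,\,u} \; \sum_{i=1}^{N} \mathrm{loss}\left(b_i H u,\; y_i\right)
```

Here b_i is the 1*M feature vector of the i-th sample, H is M*K and u is K*1, so that b_i H u is a scalar score.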

In another embodiment, the formula of the loss function with a regularization term may be as shown in Formula 2:

In the above formula, λ denotes the regularization coefficient shared by H and u, and || ||_F denotes the norm type used by the regularization term, where the value of F may be set by those skilled in the art in combination with actual requirements. For example, in one case the subscript F may be 2, that is, the regularization term of the loss function may use the 2-norm (the Euclidean norm).

Of course, in practical applications, the above formula may be further modified so that the target projection matrix H and the target vector u correspond to different regularization terms with different regularization coefficients.

In this case, the formula of the loss function with regularization terms may be as shown in Formula 3:

In the above formula, λ and Υ take different values. In addition, the norm type used by the regularization term corresponding to the target projection matrix H and the norm type used by the regularization term corresponding to the target vector may differ (that is, the subscript F may take different values in the regularization terms corresponding to H and to u).

For example, in one implementation, the norm type used by the regularization term corresponding to the target projection matrix H may be the F-norm (Frobenius norm), and the norm type used by the regularization term corresponding to the target vector may be the 2-norm.

In this case, the above formula can be expressed as:

It should be noted that, in practical applications, the specific type of the above loss function is not particularly limited in this specification.

For example, in one illustrated embodiment, the loss function can specifically be a logarithmic loss function. In this case, in formulas 1 to 3 shown above, loss(b_i H u, y_i) can be expressed as:

loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).
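As an illustrative sketch only, formulas 1 to 3 might be written as follows using the symbols defined above, where b_i is a 1*M row vector, H is M*K and u is K*1, so that b_i H u is a scalar. It is an assumption of this sketch that the per-sample losses are summed over the N data samples without averaging and that the norms enter unsquared; the subscripts p and q, standing for the norm types chosen for H and u, are hypothetical notation introduced here.

```latex
% Formula 1 (no regularization term)
\min_{H,\,u}\ \sum_{i=1}^{N} \operatorname{loss}(b_i H u,\ y_i)

% Formula 2 (shared coefficient \lambda, shared norm type F)
\min_{H,\,u}\ \sum_{i=1}^{N} \operatorname{loss}(b_i H u,\ y_i)
  + \lambda\left(\lVert H\rVert_{F} + \lVert u\rVert_{F}\right)

% Formula 3 (separate coefficients \lambda, \gamma and norm types p, q)
\min_{H,\,u}\ \sum_{i=1}^{N} \operatorname{loss}(b_i H u,\ y_i)
  + \lambda\lVert H\rVert_{p} + \gamma\lVert u\rVert_{q}

% Logarithmic loss used in the illustrated embodiment
\operatorname{loss}(b_i H u,\ y_i) = \log\left(1 + \exp(-y_i\, b_i H u)\right)
```

Written this way, the logarithmic loss implicitly takes the labels y_i from {-1, +1}.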

Certainly, in practical applications, other than logarithm loss function, other types of loss function can also be used (such as figure penalties function), in the present specification without being particularly limited to, those skilled in the art can be based on actual need It asks and is selected.

In this specification, after the data provider has generated the N*M-dimensional objective matrix based on N data samples and the data features of M dimensions corresponding to each of the N data samples, it can call any one of formulas 1 to 3 shown above to train the logistic regression model, and solve for the optimal parameter H that minimizes the fitting error between b_i and y_i.

When one of the formulas shown above is called to train the logistic regression model, the optimal parameter H can be solved with the conventional gradient descent method for logistic regression, which completes the training of the model. The specific training process is not described in detail in this specification; those skilled in the art can refer to the related art when implementing the technical solution of this specification.
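The training step can be sketched as follows. This is a minimal sketch rather than the procedure of this specification: it assumes labels in {-1, +1}, a mean logarithmic loss, a squared-norm regularization term with one shared coefficient, and plain batch gradient descent. The function name train_projection and all hyperparameter values are hypothetical.

```python
import numpy as np

def train_projection(B, y, K, lam=1e-3, lr=0.1, steps=500, seed=0):
    """Gradient descent on the log loss with the weight vector factored as H @ u.

    B: (N, M) objective matrix; y: (N,) labels in {-1, +1} (assumption).
    Returns H of shape (M, K), u of shape (K,), and the loss history.
    """
    rng = np.random.default_rng(seed)
    N, M = B.shape
    H = rng.normal(scale=0.1, size=(M, K))   # small random start, not zero
    u = rng.normal(scale=0.1, size=K)
    history = []
    for _ in range(steps):
        z = y * (B @ H @ u)                  # margins y_i * b_i H u
        reg = lam * (np.sum(H ** 2) + np.sum(u ** 2))
        history.append(np.mean(np.logaddexp(0.0, -z)) + reg)
        # d/ds log(1 + exp(-y s)) = -y * sigmoid(-y s), written with tanh
        g = -y * 0.5 * (1.0 - np.tanh(0.5 * z)) / N
        grad_w = B.T @ g                     # gradient w.r.t. w = H u, shape (M,)
        grad_H = np.outer(grad_w, u) + 2.0 * lam * H
        grad_u = H.T @ grad_w + 2.0 * lam * u
        H -= lr * grad_H
        u -= lr * grad_u
    return H, u, history
```

Only H is needed for the later encryption step; u is a by-product of the factorisation.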

In this specification, once the target projection matrix H has been obtained through model training, this M*K-dimensional projection matrix is the matrix finally used to encrypt the original objective matrix. When the original N*M-dimensional objective matrix is encrypted with the M*K-dimensional projection matrix, a linear projection mapping is applied to the original high-dimensional N*M objective matrix, mapping it into the low-dimensional space defined by the M*K-dimensional projection matrix.

In implementation, mapping the original N*M-dimensional objective matrix into the projection space of the M*K-dimensional projection matrix can be realized by multiplying the objective matrix by the projection matrix; the multiplication can be performed as a right-multiplication or as a left-multiplication.

For example, assuming the data features of the M dimensions form the columns of the objective matrix, the original N*M-dimensional objective matrix can be right-multiplied by the M*K-dimensional projection matrix to map it into the projection space. Alternatively, the objective matrix and the projection matrix can be combined by left-multiplication and the result then transposed, which likewise maps the original N*M-dimensional objective matrix into the projection space.

After the original N*M-dimensional objective matrix has been mapped into the projection space of the M*K-dimensional projection matrix, an N*K-dimensional scrambled matrix is obtained. This scrambled matrix is the data sample after encryption by the M*K-dimensional projection matrix.
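The multiplication described above can be illustrated with a short sketch. The toy sizes and the helper name encrypt are hypothetical; the left-multiplication variant is read here as multiplying the transposed matrices and transposing the result, which is an interpretation of the text rather than something it states explicitly.

```python
import numpy as np

def encrypt(B, H):
    """Project the (N, M) objective matrix with the (M, K) projection matrix."""
    if B.shape[1] != H.shape[0]:
        raise ValueError("feature dimension M of B and H must match")
    return B @ H                       # (N, K) scrambled matrix

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 4))            # N=5 samples, M=4 features (toy sizes)
H = rng.normal(size=(4, 2))            # projects M=4 down to K=2

E_right = encrypt(B, H)                # right-multiplication: B H
E_left = (H.T @ B.T).T                 # left-multiplication on transposes, then transpose
```

Both routes give the same N*K matrix, since (H^T B^T)^T = B H.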

In one illustrated embodiment, after the data provider server has calculated the M*K-dimensional projection matrix through the calculation process shown above, it can also store the projection matrix locally as the matrix used for encryption.

Subsequently, when the data provider server collects the latest N data samples and generates an N*M-dimensional objective matrix based on those N data samples and the data features of M dimensions corresponding to each of them, it can determine whether the above projection matrix is stored locally.

If the projection matrix is stored locally, the stored projection matrix can be used directly to encrypt the N*M-dimensional matrix; the specific encryption process is not repeated here.

Of course, if the projection matrix is not stored locally, the target projection matrix H can be learned again through the training process for the target projection matrix H described above.

In addition, it should be noted that, in practical applications, if the dimensions of the M-dimensional data features change (for example, data features of new dimensions are added, or data features of some dimensions are deleted), or if the meaning represented by the data features of all or some of the M dimensions changes, the data provider can learn the target projection matrix H again through the training process described above, and update the locally stored projection matrix with the newly calculated one.

In this way, when the data features required for modeling are updated, the stale projection matrix stored locally can be updated in time, which avoids the loss of data information, and the resulting effect on modeling accuracy, that would be caused by encrypting the original objective matrix with a stale matrix.
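The store, reuse and retrain logic above can be sketched as follows. This is a minimal sketch under stated assumptions: the projection matrix is cached in a local .npz file, and a change of features is detected by hashing the list of feature names. The names schema_fingerprint and load_or_train, and the idea of a fingerprint, are hypothetical and not part of this specification. A change in the meaning of a feature whose name stays the same would not be caught by this check unless the name also carries a version tag.

```python
import hashlib
import json
import os

import numpy as np

def schema_fingerprint(feature_names):
    """Hash the ordered feature names so that any change in them is detectable."""
    payload = json.dumps(list(feature_names)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def load_or_train(path, feature_names, train_fn):
    """Return (H, retrained). `path` must end in '.npz'.

    Reuses the cached projection matrix when the feature schema is unchanged;
    otherwise calls train_fn() to learn a new one and overwrites the cache.
    """
    fingerprint = schema_fingerprint(feature_names)
    if os.path.exists(path):
        with np.load(path) as cached:
            if cached["fingerprint"].item() == fingerprint:
                return cached["H"], False
    H = train_fn()
    np.savez(path, H=H, fingerprint=np.array(fingerprint))
    return H, True
```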

In this specification, after the encrypted N*K-dimensional scrambled matrix has been obtained through the training process shown above, the data provider server can transmit the scrambled matrix, as a training sample, to the modeling server that interfaces with the data provider.

After receiving the scrambled matrix transmitted by the data provider server, the modeling server can train a machine learning model using the scrambled matrix as a training sample.

In one illustrated embodiment, the modeling server can fuse the scrambled matrix with locally stored training samples, and then jointly train the machine learning model on the fused training samples.
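This specification does not fix how the fusion is carried out. One possible reading, shown purely as an assumption, is that the encrypted rows and the local rows share a sample identifier and are joined column-wise on it. The helper name fuse_by_id and the use of identifiers are hypothetical.

```python
import numpy as np

def fuse_by_id(ids_enc, E, ids_local, X_local):
    """Join encrypted features E with local features X_local on shared sample IDs.

    Assumes each ID appears at most once on each side. Returns the common IDs
    (sorted) and the column-wise concatenation of the matching rows.
    """
    common, i_enc, i_loc = np.intersect1d(ids_enc, ids_local, return_indices=True)
    return common, np.hstack([E[i_enc], X_local[i_loc]])

ids_enc = np.array([3, 1, 2])
E = np.array([[30.0], [10.0], [20.0]])       # toy N*K scrambled matrix, K=1
ids_local = np.array([2, 3, 4])
X_local = np.array([[2.0], [3.0], [4.0]])    # toy local features

common, fused = fuse_by_id(ids_enc, E, ids_local, X_local)
```

Samples present on only one side are dropped in this sketch; a real deployment would have to decide how to handle them.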

Referring to Fig. 3, Fig. 3 is a schematic diagram of joint modeling by fusing multi-party data samples, as shown in this specification.

In one scenario, the modeling party can be the data operation platform of Alipay, and the data providers can include service platforms that provide Internet services to users, such as banks and third-party financial institutions, which interface with the data operation platform of Alipay.

In practical applications, the data operation platform of Alipay is an untrusted third party from the data provider's point of view. If the data provider supplies its local user transaction data directly to the data operation platform of Alipay for data modeling, user privacy may therefore be leaked during data transmission.

In this case, each data provider can, based on any one of formulas 1 to 3 shown above, perform training on the N*M-dimensional objective matrix generated from its original transaction data samples to obtain an M*K-dimensional projection matrix, then use that projection matrix to apply dimensionality-reduction encryption to the N*M-dimensional objective matrix, obtain an N*K-dimensional scrambled matrix, and transmit the scrambled matrix as a training sample to the data operation platform of Alipay.

The data operation platform of Alipay can fuse the received training samples provided by each data provider with its local data samples, and then train a machine learning model on the fused training samples. For example, the user transaction data provided by banks and third-party financial institutions can be fused with the user transaction data held locally on the data operation platform of Alipay, to jointly train a scorecard model for assessing the risk of users' transactions.

After the modeling party has trained the machine learning model in the manner shown above, the data provider can subsequently still use the target projection matrix H to apply dimensionality-reduction encryption to a data matrix built from collected data samples and the related data features, and then transmit the result to the machine learning model for calculation to obtain the model's output. For example, where the machine learning model is a scorecard model, trained on users' payment transaction data, for assessing users' transaction risk, the data provider can use the projection matrix to apply dimensionality-reduction encryption to a data matrix built from collected user transaction data, and then transmit it to the scorecard model as input data to obtain a risk score for each transaction.
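The scoring flow can be sketched as below. It is assumed here, for illustration only, that the trained scorecard model is a logistic regression over the K encrypted features with weights v and an optional bias; the function name score and these parameters are hypothetical.

```python
import numpy as np

def score(B_new, H, v, bias=0.0):
    """Encrypt new samples with the stored projection matrix, then score them.

    B_new: (n, M) data matrix built from newly collected samples.
    H:     (M, K) target projection matrix held by the data provider.
    v:     (K,) weights of the model trained on encrypted data (assumption).
    Returns one risk score in (0, 1) per row of B_new.
    """
    E_new = B_new @ H                              # (n, K) encrypted input
    return 1.0 / (1.0 + np.exp(-(E_new @ v + bias)))
```

Only the n*K encrypted matrix crosses to the modeling side in this sketch; the raw n*M matrix stays with the data provider.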

The above is a data encryption method provided by the embodiments of this specification. Referring to Fig. 4, based on the same idea, the embodiments of this specification also provide a machine learning model training method, applied to a modeling server, which performs the following steps:

Step 402: receive the scrambled matrix transmitted by the data provider server; wherein the scrambled matrix is the N*K-dimensional matrix obtained by multiplying the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix that the data provider server obtains by training a preset machine learning model on the N*M-dimensional objective matrix; the value of K is less than the value of M;

Step 404: train a machine learning model using the scrambled matrix as a training sample.

The implementation of the technical features in each step shown in Fig. 4 is not repeated in this embodiment; reference can be made to the description of the above embodiments.

As can be seen from the above embodiments, the data provider trains a preset machine learning model on the N*M-dimensional objective matrix generated from N data samples and the data features of M dimensions corresponding to each of the N data samples, obtaining a dimension-reducing M*K-dimensional target projection matrix, and multiplies the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix to obtain the encrypted N*K-dimensional scrambled matrix, which the modeling server uses as a training sample to train a machine learning model;

On the other hand, since the scrambled matrix is the N*K-dimensional reduced matrix obtained by applying a linear dimensionality-reduction mapping to the N*M-dimensional objective matrix, the transmission overhead of sending data samples to the modeling server can be reduced;

Moreover, since the target projection matrix is obtained through machine learning training with the objective matrix as the original training sample set, the scrambled matrix obtained by applying the linear projection mapping to the objective matrix with the target projection matrix can retain the information in the original data samples to the greatest extent, so that the accuracy of model training can still be guaranteed when the reduced scrambled matrix is transmitted to the modeling server for model training.

Corresponding to the above method embodiments, this specification also provides an embodiment of a data encryption device. The embodiment of the data encryption device of this specification can be applied to an electronic device. The device embodiment can be implemented in software, or in hardware or a combination of software and hardware. Taking a software implementation as an example, the device in the logical sense is formed by the processor of the electronic device in which it resides reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Fig. 5 shows a hardware structure diagram of the electronic device in which the data encryption device of this application resides; in addition to the processor, memory, network interface and non-volatile memory shown in Fig. 5, the electronic device in which the device of the embodiment resides can also include other hardware according to the actual functions of the electronic device, which is not described further here.

Fig. 6 is a block diagram of a data encryption device according to an exemplary embodiment of this application.

Referring to Fig. 6, the data encryption device 60 can be applied to the electronic device shown in Fig. 5 and includes: a generation module 601, a first training module 602 and an encrypting module 603.

The generation module 601 generates an N*M-dimensional objective matrix based on N data samples and the data features of M dimensions corresponding to each of the N data samples;

The first training module 602 trains a preset machine learning model on the N*M-dimensional objective matrix to obtain an M*K-dimensional target projection matrix; wherein the value of K is less than the value of M;

The encrypting module 603 multiplies the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix to obtain an encrypted N*K-dimensional scrambled matrix; wherein the scrambled matrix is used for training a machine learning model.

In this embodiment, the preset machine learning model is a supervised machine learning model; wherein the model parameter to be trained in the loss function of the preset machine learning model is expressed as the product of the target projection matrix and an object vector.

In this embodiment, the machine learning model is a logistic regression model.

In this embodiment, the expression of the loss function is any one of the following formulas:

In this embodiment, the loss function is a logarithmic loss function; wherein,

loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).

Corresponding to the above method embodiments, this specification also provides an embodiment of a machine learning model training device. The embodiment of the machine learning model training device of this specification can be applied to an electronic device. The device embodiment can be implemented in software, or in hardware or a combination of software and hardware. Taking a software implementation as an example, the device in the logical sense is formed by the processor of the electronic device in which it resides reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Fig. 7 shows a hardware structure diagram of the electronic device in which the machine learning model training device of this application resides; in addition to the processor, memory, network interface and non-volatile memory shown in Fig. 7, the electronic device in which the device of the embodiment resides can also include other hardware according to the actual functions of the electronic device, which is not described further here.

Fig. 8 is a block diagram of a machine learning model training device according to an exemplary embodiment of this application.

Referring to Fig. 8, the machine learning model training device 80 can be applied to the electronic device shown in Fig. 7 and includes: a receiving module 801 and a second training module 802.

The receiving module 801 receives the scrambled matrix transmitted by the data provider server; wherein the scrambled matrix is the N*K-dimensional matrix obtained by multiplying the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix that the data provider server obtains by training a preset machine learning model on the N*M-dimensional objective matrix; the value of K is less than the value of M;

The second training module 802 trains a machine learning model using the scrambled matrix as a training sample.

In the present embodiment, second training module 802 further,

For the implementation of the functions and effects of each module in the above devices, see the implementation of the corresponding steps in the above methods; details are not repeated here.

As the device embodiments basically correspond to the method embodiments, reference can be made to the relevant parts of the description of the method embodiments. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this application. Those of ordinary skill in the art can understand and implement it without creative effort.

The systems, devices, modules or units described in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail sending and receiving device, game console, tablet computer, wearable device, or a combination of any of these devices.

Corresponding to the above method embodiments, this specification also provides an embodiment of a machine learning model training system.

The machine learning model training system may include a data provider server and a modeling server.

The data provider server generates an N*M-dimensional objective matrix based on N data samples and the data features of M dimensions corresponding to each of the N data samples; trains a preset machine learning model on the N*M-dimensional objective matrix to obtain an M*K-dimensional target projection matrix, where the value of K is less than the value of M; and multiplies the N*M-dimensional objective matrix by the M*K-dimensional target projection matrix to obtain an encrypted N*K-dimensional scrambled matrix;

The modeling server trains a machine learning model based on the scrambled matrix.

In the present embodiment, the modeling service end further,

Corresponding to the above method embodiments, this application also provides an embodiment of an electronic device. The electronic device includes a processor and a memory for storing machine-executable instructions, where the processor and the memory are usually connected to each other through an internal bus. In other possible implementations, the device may also include an external interface so that it can communicate with other devices or components.

In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of data encryption as shown in Fig. 1, the processor is caused to:

In this embodiment, the preset machine learning model is a supervised machine learning model; wherein the model parameter to be trained in the loss function of the preset machine learning model is expressed as the product of the target projection matrix and an object vector.

In this embodiment, the preset machine learning model is a logistic regression model.

In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of data encryption, the processor is caused to:

Perform model training on the N*M-dimensional objective matrix based on any one of the loss functions in the following formulas:

loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).

In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of machine learning model training as shown in Fig. 4, the processor is caused to:

Train a machine learning model using the scrambled matrix as a training sample.

In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of machine learning model training as shown in Fig. 4, the processor is further caused to:

Other embodiments of this application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses or adaptations of this application that follow its general principles and include common knowledge or conventional technical means in the art not disclosed in this application. The specification and embodiments are to be regarded as illustrative only, with the true scope and spirit of this application indicated by the following claims.

It should be understood that this application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.

Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The above are merely preferred embodiments of this application and are not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of this application shall fall within the scope of protection of this application.

Claims

1. a kind of data ciphering method, which comprises

N*M dimension is generated based on the data characteristics of N data sample and M dimension for corresponding respectively to the N data sample Objective matrix；

Objective matrix based on N*M dimension is trained preset machine learning model, obtains the target projection square of M*K dimension Battle array；Wherein, the K value is less than the M value；

The target projection matrix multiple of the objective matrix that the N*M is tieed up and M*K dimension obtains adding for encrypted N*K dimension Close matrix；Wherein, the scrambled matrix is used for training machine learning model.

2. according to the method described in claim 1, the preset machine learning model is to have the machine learning model of supervision；Its In, in the loss function of the preset machine learning model to training pattern parameter, be expressed as the target projection square The product of battle array and object vector.

3. The method according to claim 2, wherein the preset machine learning model is a logistic regression model.

4. according to the method described in claim 3, the expression formula of the loss function is any in following formula:

Wherein, H indicates the projection matrix; u indicates the object vector; ‖·‖_F indicates the regularization norm of the regular term of the loss function; λ and γ indicate regularization coefficients; b_i indicates the feature vector generated from the data features of the M dimensions of the i-th data sample in the objective matrix; y_i indicates the label corresponding to the feature vector.

5. The method according to claim 4, wherein the loss function is a logarithmic loss function; wherein,

loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).

6. a kind of data encryption device, described device include:

Generation module, the data characteristics based on N data sample and M dimension for corresponding respectively to the N data sample Generate the objective matrix of N*M dimension；

First training module, the objective matrix based on N*M dimension are trained preset machine learning model, obtain M*K The target projection matrix of dimension；Wherein, the K value is less than the M value；

Encrypting module, the target projection matrix multiple of the objective matrix that the N*M is tieed up and M*K dimension, obtains encrypted The scrambled matrix of N*K dimension；Wherein, the scrambled matrix is used for training machine learning model.

7. device according to claim 6, the preset machine learning model is to have the machine learning model of supervision；Its In, in the loss function of the preset machine learning model to training pattern parameter, be expressed as target projection matrix and The product of object vector.

8. The device according to claim 7, wherein the machine learning model is a logistic regression model.

9. device according to claim 8, the expression formula of the loss function is any in following formula:

Wherein, H indicates the projection matrix; u indicates the object vector; ‖·‖_F indicates the regularization norm of the regular term of the loss function; λ and γ indicate regularization coefficients; b_i indicates the feature vector generated from the data features of the M dimensions of the i-th data sample in the objective matrix; y_i indicates the label corresponding to the feature vector.

10. The device according to claim 9, wherein the loss function is a logarithmic loss function; wherein,

loss(b_i H u, y_i) = log(1 + exp(-y_i b_i H u)).

11. a kind of machine learning model training method, which comprises

Receive the scrambled matrix of data providing server-side transmission；Wherein, the scrambled matrix is the data providing service End group is trained preset machine learning model in the objective matrix that N*M is tieed up, the target projection matrix of obtained M*K dimension； The K value is less than the M value；

Using the scrambled matrix as training sample training machine learning model.

12. according to the method for claim 11, described learn mould for the scrambled matrix as training sample training machine Type, comprising:

It using the scrambled matrix as training sample, is merged with local training sample, and is based on fused trained sample This training machine learning model.

13. a kind of machine learning model training device, described device include:

Receiving module receives the scrambled matrix of data providing server-side transmission；Wherein, the scrambled matrix is that the data mention Supplier's server-side is trained preset machine learning model based on the objective matrix that N*M is tieed up, and the target of obtained M*K dimension is thrown Shadow matrix；The K value is less than the M value；

14. according to the method for claim 13, second training module further,

15. a kind of machine learning model training system, the system comprises:

Data providing server-side, based on N data sample and M dimension for corresponding respectively to the N data sample Data characteristics generates the objective matrix of N*M dimension；Objective matrix based on N*M dimension instructs preset machine learning model Practice, obtains the target projection matrix of M*K dimension；Wherein, the K value is less than the M value；By the N*M objective matrix tieed up and institute The target projection matrix multiple for stating M*K dimension obtains the scrambled matrix of encrypted N*K dimension；

16. system according to claim 15, the modeling service end further,

17. a kind of electronic equipment, comprising:

Processor；

For storing the memory of machine-executable instruction；

Wherein, referred to by reading and executing the machine corresponding with the control logic of data encryption of the memory storage and can be performed It enables, the processor is prompted to:

18. a kind of electronic equipment, comprising:

Processor；

For storing the memory of machine-executable instruction；

Wherein, the machine corresponding with the control logic of machine learning model training stored by reading and executing the memory Executable instruction, the processor are prompted to:

Using the scrambled matrix as training sample training machine learning model.