CN113515770A - Method and device for determining target business model based on privacy protection - Google Patents

Method and device for determining target business model based on privacy protection

Info

Publication number: CN113515770A
Application number: CN202010626329.4A
Authority: CN (China)
Prior art keywords: model, sub, business, initial, determining
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 熊涛
Current Assignee: Alipay Hangzhou Information Technology Co Ltd
Original Assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. a local or distributed file system or database
    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification provide a method and an apparatus for determining a target business model based on privacy protection. For the plurality of sub-models obtained, the sampling probability of each sub-model is determined from its model index through the exponential mechanism of differential privacy, and the sub-models are sampled according to these probabilities so as to select the target business model. In this way, a compressed model with privacy protection can be obtained by means of differential privacy, providing privacy protection for the model on the basis of model compression.

Description

Method and device for determining target business model based on privacy protection
The present application is a divisional application of the Chinese invention patent application No. 202010276685.8, entitled "Method and apparatus for determining target business model based on privacy protection", filed on April 10, 2020.
Technical Field
One or more embodiments of the present specification relate to the field of computer technology, and more particularly, to a method and apparatus for determining a target business model based on privacy protection by a computer.
Background
With the development of machine learning technology, deep neural networks (DNNs) are favored by those skilled in the art because they mimic the way the human brain thinks and perform better than simple linear models. A deep neural network is a neural network with at least one hidden layer; it can model complex nonlinear systems and improves model capability.
Deep neural networks are also very large, with complex network structures and systems of features and model parameters. For example, a deep neural network may include millions of parameters. It is therefore desirable to find ways to compress models, reducing their data size and complexity. To this end, the conventional technique is to fit the millions of parameters of a deep neural network using training samples and then delete, or "prune", unnecessary weights to reduce the network structure to a more manageable size. Reducing the size of a model helps to minimize its memory, inference, and computation requirements. In some business scenarios, as many as 99% of the weights in a neural network can sometimes be cut away, resulting in a smaller, sparser network.
However, pruning only after training is completed incurs a high computational cost and performs a large number of "wasted" computations. It is then natural to envisage training sub-networks of the original neural network to find one that meets the requirements as far as possible. Meanwhile, with conventional techniques, raw data is easier to recover from a simpler neural network. A method is therefore needed that can both protect data privacy and compress the model size for real-time computation and end-to-end deployment, improving model performance in several respects.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for determining a target business model based on privacy protection to solve one or more of the problems mentioned in the background.
According to a first aspect, a method for determining a target business model based on privacy protection is provided, the target business model being configured to process given business data to obtain a corresponding business prediction result. The method comprises: determining initial values for the model parameters of a selected business model in a predetermined manner, thereby initializing the selected business model; training the initialized business model using a plurality of training samples until the model parameters converge, to obtain an initial business model; determining a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model corresponds to model parameters and a model index determined by retraining in the following manner: resetting the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model, then inputting the plurality of training samples into the pruned business model in turn and adjusting the model parameters based on comparison of the corresponding sample labels with the outputs of the pruned business model; and selecting a target business model from the sub-models using a first mode of differential privacy, based on the model indexes corresponding to the sub-models.
In one embodiment, the determining a plurality of sub-models of the initial business model based on the pruning of the initial business model comprises: pruning the initial business model according to the model parameters of the initial business model to obtain a first pruning model; taking the first pruning model corresponding to the model parameters obtained through retraining as a first sub-model; and iteratively trimming the first submodel to obtain a subsequent submodel until an end condition is met.
In one embodiment, the end condition includes at least one of the number of iterations reaching a predetermined number, the number of submodels reaching a predetermined number, and the size of the last submodel being less than a set size threshold.
In one embodiment, the model is pruned, in order of model parameter magnitude from small to large, in one of the following manners: pruning away a preset proportion of the model parameters, pruning away a preset number of model parameters, or pruning until the model does not exceed a predetermined size.
In an embodiment, the first mode of differential privacy is the exponential mechanism, and selecting the target business model from the sub-models using the first mode of differential privacy, based on the model indexes corresponding to the sub-models, includes: determining an availability coefficient for each sub-model according to its model index; determining a sampling probability for each sub-model from the availability coefficients using the exponential mechanism; and sampling among the plurality of sub-models according to the sampling probabilities, taking the sampled sub-model as the target business model.
In one embodiment, the method further comprises: training the target business model based on a second mode of differential privacy using the plurality of training samples, so that the trained target business model performs data-privacy-protecting business prediction on given business data.
In one embodiment, the plurality of training samples includes a first batch of samples, where each sample i in the first batch corresponds to a loss obtained after processing by the target business model, and training the target business model based on the second mode of differential privacy using the training samples includes: determining the original gradient of the loss corresponding to sample i; adding noise to the original gradient using the second mode of differential privacy to obtain a noise-containing gradient; and adjusting the model parameters of the target business model using the noise-containing gradient, with the goal of minimizing the loss corresponding to sample i.
In one embodiment, the second mode of differential privacy is adding Gaussian noise, and adding noise to the original gradient using the second mode of differential privacy to obtain a noise-containing gradient includes: clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient; determining Gaussian noise for differential privacy using a Gaussian distribution determined from the clipping threshold, the variance of the Gaussian distribution being positively correlated with the square of the clipping threshold; and superimposing the Gaussian noise on the clipped gradient to obtain the noise-containing gradient.
In one embodiment, the service data comprises at least one of pictures, audio, characters.
According to a second aspect, there is provided an apparatus for determining a target service model based on privacy protection, the target service model being configured to process given service data to obtain a corresponding service prediction result; the device comprises:
an initialization unit configured to determine initial values for the model parameters of the selected business model in a predetermined manner, thereby initializing the selected business model;
an initial training unit configured to train the initialized business model using a plurality of training samples until the model parameters converge, to obtain an initial business model;
a pruning unit configured to determine a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model corresponds to model parameters and a model index determined through retraining by the initialization unit and the initial training unit as follows: the initialization unit resets the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model; and the initial training unit inputs the plurality of training samples into the pruned business model in turn, adjusting the model parameters based on comparison of the corresponding sample labels with the outputs of the pruned business model;
and a determining unit configured to select a target business model from the sub-models using a first mode of differential privacy, based on the model indexes corresponding to the sub-models.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.
With the method and the apparatus provided in the embodiments of this specification, the selected complex business model is first trained to obtain an initial business model; the initial business model is then pruned, and the pruned business model is trained with its parameters reset to the initialization state, so as to test whether the pruned-away model parameters were unimportant from the beginning. A target business model is then selected from the resulting sub-models in a differential-privacy manner. A compressed model with privacy protection can thus be obtained, providing privacy protection for the model on the basis of model compression.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed for describing the embodiments are briefly introduced below. It is apparent that the drawings described below are only some embodiments of the present invention, and that other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a schematic diagram illustrating an implementation architecture for determining a target business model based on privacy protection in the technical concepts of the present specification;
FIG. 2 illustrates a flow of determining a plurality of sub-networks based on pruning of an initial neural network in a specific example;
FIG. 3 illustrates a flow diagram of a method for determining a target business model based on privacy protection, according to one embodiment;
FIG. 4 illustrates a schematic diagram of neural network pruning in a specific example;
FIG. 5 illustrates a schematic block diagram of an apparatus for determining a target business model based on privacy protection, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of an implementation architecture according to the technical concept of this specification. Under this technical concept, the business model may be a machine learning model that performs various kinds of business processing, such as classification or scoring, on business data. The business model shown in Fig. 1 is implemented as a neural network; in practice it can also be implemented by other means, such as a decision tree or linear regression. The business data may be in at least one of various forms, such as characters, audio, images, or animation, determined by the specific business scenario, which is not limited here.
For example, the business model can be a machine learning model with which a lending platform assists in evaluating the lending risk of a user; the business data addressed can be the individual user's historical lending behavior data, default data, user profile, and the like, and the business prediction result is the user's risk score. For another example, the business model may be a model (such as a convolutional neural network) for classifying objects in pictures; the business data addressed may be various pictures, and the business prediction result may be, for example, a first object (such as a car), a second object (a bicycle), other categories, and so on.
In particular, the present specification implementation architecture is particularly applicable to the case where the business model is a more complex non-linear model. The process of determining the target business model based on the privacy protection can be a process of determining a simplified sub-model with model indexes meeting requirements from a complex initial business model.
Taking the business model as a neural network as an example, as shown in fig. 1, the initial neural network may be a more complex neural network, and the neural network may include more features, weight parameters, other parameters (such as constant parameters, auxiliary matrices), and the like. The model parameters of the initial neural network may be initialized in a predetermined manner, e.g., randomly, set to predetermined values, etc. Under the implementation architecture, the initial neural network is trained through a plurality of training samples until the model parameters (or loss functions) of the initial neural network converge. And then, pruning the initial neural network to obtain a plurality of sub-networks. In pruning the neural network, the pruning may be performed according to a predetermined parameter ratio (e.g., 20%), a predetermined parameter number (e.g., 1000), a predetermined size (e.g., at least 20 megabytes), and so on.
Conventionally, the pruned sub-network of the initial neural network continues to be trained, is pruned again on that basis, and is trained further; that is, the initial neural network is compressed step by step. Under the concept of the embodiments of this specification, after the initial neural network is pruned, the parameters of the pruned sub-network are reset (restored to the initialized state), and the pruned network with reset parameters is trained. The purpose is to verify whether the pruned-away neural network structure was unneeded from the beginning. Whether it was unnecessary from the beginning can be expressed through the model's evaluation indexes, such as accuracy, recall, and convergence.
It is worth mentioning that pruning a neural network may include removing some of the neurons in the network and/or removing some of the connections between neurons. In an alternative implementation, which neurons to discard may be decided with reference to the weight parameters corresponding to each neuron. A weight parameter describes the importance of a neuron. Taking a fully connected neural network as an example, the weights mapping a neuron to the neurons of the next layer may be averaged, or their maximum taken, to obtain a reference weight; neurons are then discarded (pruned) in order of reference weight from small to large.
Fig. 2 shows the sub-network pruning flow of one specific example under the implementation architecture of this specification. In Fig. 2, for the part of the neural network remaining after pruning, the model parameters are reset to the initialization state and the network is retrained with the training samples, yielding the first sub-network. Meanwhile, the network structure, evaluation indexes, and so on of the first sub-network may be recorded. The flow then re-enters the pruning step, starting a loop as indicated by the left arrow: the first sub-network is pruned according to its trained model parameters, the pruned network's parameters are reset to the initialization state, and it is retrained with the training samples to serve as the second sub-network. The loop continues along the left arrow, and so on, until an N-th sub-network satisfying an end condition is obtained. The end condition here may be, for example, at least one of: the number of iterations reaching a predetermined number (e.g., a preset number N), the number of sub-models reaching a predetermined number (e.g., a preset number N), or the size of the last sub-model being smaller than a set size threshold (e.g., 100 megabytes).
In this way, multiple sub-networks of the initial neural network may be obtained. In some alternative embodiments, the arrow on the left of Fig. 2 may instead return to the top: after the first sub-network is obtained, the original neural network is reinitialized, and the reinitialized network is trained and pruned; the pruned sub-network is then trained as the second sub-network, and so on until the N-th sub-network is obtained. The sub-networks may then be of different sizes, e.g., the first sub-network being 80% of the original neural network, the second sub-network 60%, and so on. Under this approach, some randomization may be applied each time the neural network is initialized: features or neurons are randomly sampled each time, and a small portion (e.g., 1%) of features and initialization parameters are discarded to perturb the initial neural network slightly, so that each initialization is consistent with the initial neural network up to small differences, testing the effect of different neurons.
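To make the loop of Fig. 2 concrete, the following minimal Python sketch implements prune, rewind, retrain over a single weight matrix. It is an illustration under stated assumptions only: train_to_convergence, evaluate, PRUNE_RATIO, and N_ROUNDS are hypothetical stand-ins, not names from this specification.

```python
# Minimal sketch of the Fig. 2 loop: train, prune smallest weights, rewind
# survivors to their recorded initial values, retrain, record the sub-model.
# train_to_convergence and evaluate are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

def train_to_convergence(weights, mask):
    # Placeholder for gradient training; pruned positions (mask == 0) stay zero.
    return (weights + 0.01 * rng.standard_normal(weights.shape)) * mask

def evaluate(weights):
    # Placeholder model index, e.g. accuracy on a held-out test set.
    return float(1.0 / (1.0 + np.abs(weights).mean()))

PRUNE_RATIO = 0.2            # prune 20% of the remaining weights per round
N_ROUNDS = 3                 # end condition: a preset number of sub-models

init_weights = rng.standard_normal((64, 32))   # step 301: record initial values
mask = np.ones_like(init_weights)

trained = train_to_convergence(init_weights, mask)   # step 302: initial model
sub_models = []

for _ in range(N_ROUNDS):
    # Prune the smallest-magnitude weights that are still alive.
    alive = np.abs(trained[mask == 1])
    threshold = np.quantile(alive, PRUNE_RATIO)
    mask = mask * (np.abs(trained) > threshold)

    # Rewind surviving weights to their initial values, then retrain (step 303).
    trained = train_to_convergence(init_weights * mask, mask)

    sub_models.append({"mask": mask.copy(),
                       "index": evaluate(trained),
                       "compression": 1.0 - mask.mean()})
```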
Continuing with Fig. 1: from the sub-networks obtained, one can be selected as the target neural network. According to one embodiment, in order to protect data privacy, the pruned sub-networks can be regarded as a set of sub-networks of the initial neural network, and one sub-network is randomly selected as the target neural network based on the differential privacy principle. Determining the target business model in this privacy-protecting, differential-privacy manner better protects the privacy of the business model and/or the business data, and improves the practicality of the target neural network.
It is understood that the implementation architecture shown in fig. 1 is exemplified by the business model being a neural network, and when the business model is other machine learning models, the neurons in the above description may be replaced by other model elements, for example, when the business model is a decision tree, the neurons may be replaced by tree nodes in the decision tree, and so on.
The target neural network is used for carrying out service prediction on the service data to obtain a corresponding service prediction result. For example, a business prediction result of the identified target category is obtained according to the picture data, a business prediction result of the financial loan risk of the user is obtained according to the user behavior data, and the like.
The specific process of determining the target business model based on privacy protection is described in detail below.
FIG. 3 illustrates a flow of determining a target business model based on privacy protection, according to one embodiment. The business model here may be a model for conducting business processing such as classification, scoring, etc. for given business data. The service data may be various types of data such as text, image, voice, video, animation, etc. The execution subject of the flow may be a system, device, apparatus, platform, or server with certain computing capabilities.
As shown in Fig. 3, a method for determining a target business model based on privacy protection may include the following steps. Step 301: determine initial values for the model parameters of the selected business model in a predetermined manner, thereby initializing the selected business model. Step 302: train the initialized business model using a plurality of training samples until the model parameters converge, to obtain an initial business model. Step 303: determine a plurality of sub-models of the initial business model based on pruning of the initial business model, where each sub-model corresponds to model parameters and a model index determined by retraining as follows: reset the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model, then input the plurality of training samples into the pruned business model in turn, adjusting the model parameters based on comparison of the corresponding sample labels with the outputs of the pruned business model. Step 304: select a target business model from the sub-models using a first mode of differential privacy, based on the model indexes corresponding to the sub-models.
First, in step 301, initial values are determined for the model parameters of the selected business model in a predetermined manner, so as to initialize the selected business model.
It will be appreciated that, for a selected business model, the model parameters need to be initialized before the model can be trained; that is, initial values are determined for the respective model parameters. When the selected business model is a neural network, the model parameters may be, for example, at least one of the weights of individual neurons, constant parameters, auxiliary matrices, and the like. When the selected business model is a decision tree, the model parameters are, for example, the weight parameters of each node, the connection relationships between nodes, and the connection weights. Where the selected business model is another form of machine learning model, the model parameters may be other parameters, which are not enumerated here.
The initial values of these model parameters may be determined in a predetermined manner, such as a completely random value, a random value within a predetermined interval, a set value, and so on. With these initial values, when receiving the service data or extracting the relevant features according to the service data, the service model can give the corresponding service prediction results, such as classification results, scoring results, etc.
Next, in step 302, the initialized business model is trained using a plurality of training samples until the model parameters converge, resulting in an initial business model.
After the model parameters are initialized in step 301, the selected business model can, upon receiving business data, operate according to its logic and give a corresponding business prediction result, so the initialized business model can be trained with training samples. Each training sample may correspond to sample business data and a corresponding sample label. The training process for the initialized business model may, for example, be: input the business data of each sample in turn into the initialized business model, and adjust the model parameters according to comparison of the business prediction result output by the model with the corresponding sample label.
As the model is adjusted over a certain number of training samples, each model parameter of the business model changes less and less until it approaches a fixed value; that is, the model parameters converge. Convergence of the model parameters can be described by the fluctuation values of the parameters themselves, or by the loss function, because the loss function is typically a function of the model parameters: when it converges, the model parameters have converged. Convergence may be determined, for example, when the maximum variation of the loss function, or the fluctuation of the model parameters, is less than a predetermined threshold. The selected business model has then completed the current stage of training, and the resulting model may be called the initial business model.
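As a small sketch of this convergence test (our own illustration; train_one_epoch and the tolerance are hypothetical placeholders), training can stop once the change in the loss falls below a threshold:

```python
# Stop training once the loss fluctuation drops below a tolerance; the
# parameter-fluctuation criterion described in the text could be substituted.
def train_until_converged(params, train_one_epoch, tol=1e-4, max_epochs=1000):
    prev_loss = float("inf")
    for _ in range(max_epochs):
        params, loss = train_one_epoch(params)
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return params
```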
The initial business model training process may be performed in any suitable manner, and will not be described herein.
Then, at step 303, a plurality of sub-models of the initial business model are determined based on the pruning of the initial business model. It can be understood that, in order to obtain a sub-model that can replace the initial business model from the initial business model, the initial business model can be pruned according to business requirements, so as to obtain a plurality of sub-models of the initial model. These sub-models may also be referred to as candidate models.
It should be noted that pruning of the initial business model may be performed multiple times, each starting from the initial business model itself, or may be applied cumulatively to already-pruned sub-models, as in the example of Fig. 2 described above, which is not repeated here.
The model is pruned, in order of model parameter magnitude from small to large, in one of the following manners: pruning away a predetermined proportion (e.g., 20%) of the model parameters, pruning away a predetermined number (e.g., 1000) of model parameters, pruning until the model does not exceed a predetermined size (e.g., 1000 megabytes), and so on.
It will be appreciated that there is typically at least a portion of the model parameters, such as weight parameters, that may represent to some extent the importance of the model elements (e.g., neurons, tree nodes, etc.). When the business model is pruned, in order to reduce the number of parameters, the model units may be pruned, or the connection relationship between the model units may be pruned. Referring to fig. 4, the following description will be made by taking a business model as a neural network and model units as neurons as an example.
One embodiment may prune the model by removing a predetermined number or proportion of model units, for example trimming 100 neurons, or 10% of the neurons, from each hidden layer of the neural network. Referring to Fig. 4, since the importance of a neuron is described by the weights on the connections (the connecting lines in Fig. 4) between neurons of adjacent hidden layers, the values of the weight parameters can be used to decide which neurons to delete. Fig. 4 shows a schematic of part of the hidden layers in a neural network. In the i-th hidden layer of Fig. 4, if the weight parameters on all the connections between the neuron drawn with dashed lines and the neurons of the previous layer or the next hidden layer are small, then that neuron is of low importance and can be pruned.
Another embodiment may achieve pruning of the model by reducing a predetermined number or proportion of connected edges. Still referring to fig. 4, for each connecting edge in the neural network (e.g., the connecting edge between the neuron X1 and the neuron represented by the dotted line of the i-th hidden layer), if the corresponding weight parameter is smaller, it indicates that the importance degree of the previous neuron to the next neuron is lower, and the corresponding connecting edge may be deleted. Such a network structure is no longer an original fully-connected structure, but each neuron of a previous hidden layer only acts on a neuron of a next hidden layer which is relatively important, and each neuron of the next hidden layer only concerns the neuron of the previous hidden layer which is relatively important. Thus, the size of the business model also becomes smaller.
In other embodiments, the model may be pruned by reducing the number of connecting edges and model units at the same time, which is not described herein again. The model pruning unit and the pruning connection relation are all specific means for model pruning, and the specification does not limit the specific means. By such a pruning means, it is possible to prune a predetermined proportion of model parameters, prune a predetermined number of model parameters, prune a model having a size not exceeding a predetermined size, and the like.
How large a part of the business model is pruned can be determined according to a predetermined pruning rule or a scale requirement on the sub-model. The pruning rule may be, for example: the size of the sub-model is a predetermined number of bytes (e.g., 1000 megabytes); the size of the sub-model is a predetermined proportion (e.g., 70%) of the initial business model; the size of the sub-model after pruning is a predetermined proportion (e.g., 90%) of the model before pruning; connecting edges with weights below a predetermined weight threshold are pruned; and so on. In short, the pruned model should discard the model units or connecting edges of low importance and keep those of high importance.
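As a concrete illustration of the two pruning primitives described above, the sketch below (our own construction, not from this specification) operates on the weight matrix W of a single fully connected layer with shape (n_in, n_out): edge pruning zeroes the smallest-magnitude connections, while neuron pruning drops whole rows according to a reference weight, here taken as the mean absolute outgoing weight.

```python
# Hypothetical pruning primitives for one fully connected layer; W has shape
# (n_in, n_out), so row i holds the outgoing weights of input neuron i.
import numpy as np

def prune_edges(W, ratio):
    # Remove the `ratio` fraction of connecting edges with smallest |weight|.
    threshold = np.quantile(np.abs(W), ratio)
    return W * (np.abs(W) > threshold)

def prune_neurons(W, ratio):
    # Reference weight per neuron: mean |outgoing weight| (the text also
    # allows taking the maximum). Drop the neurons with the smallest scores.
    reference = np.abs(W).mean(axis=1)
    k = int(len(reference) * ratio)
    drop = np.argsort(reference)[:k]
    W = W.copy()
    W[drop, :] = 0.0          # silence every outgoing edge of a dropped neuron
    return W

W = np.random.default_rng(1).standard_normal((8, 4))
print(prune_edges(W, 0.25))
print(prune_neurons(W, 0.25))
```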
In acquiring the sub-models, on one hand, the parameters of the initial business model with a part cut away need further adjustment, so the pruned model must be trained further. On the other hand, it is necessary to verify whether the cut-away portion of the initial business model was unnecessary from the beginning, so the model parameters of the pruned model can be reset to the initialization state and training performed with the plurality of training samples. The trained model is recorded as a sub-model of the initial business model.
It can be understood that, since training of the initial business model stops at convergence, pruning away part of the business model may delete important model units by mistake, causing problems such as degraded model performance. The performance of a sub-model obtained by training a pruned model is therefore uncertain: if an important model unit has been deleted by mistake, the model parameters (or the loss function) may fail to converge, the convergence rate may fall, or the model accuracy may drop. Accordingly, the performance indexes of each trained sub-model, such as accuracy, model size, and convergence, can also be recorded.
In this step 303, assume N sub-models are obtained, where N is a positive integer. N may be a preset number of iterations, a preset number of sub-models, or whatever number is reached under a set pruning condition. For example, when pruning is applied cumulatively to already-pruned sub-models, each successive sub-model is smaller, and the pruning condition may be that the size of the latest sub-model falls below a predetermined size threshold (e.g., 100 megabytes); pruning then ends when the sub-model size falls below that threshold, and N is the number of sub-models actually obtained.
Next, in step 304, a target business model is selected from the sub-models using the first mode of differential privacy, based on the model indexes corresponding to the sub-models.
Differential privacy is a means in cryptography that aims to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying individual records. Given a random algorithm M, let PM be the set of all possible outputs of M. For any two neighboring data sets D and D' and any subset SM of PM, the algorithm M is said to provide ε-differential privacy protection if it satisfies: Pr[M(D) ∈ SM] ≤ e^ε × Pr[M(D') ∈ SM], where the parameter ε is called the privacy protection budget, which balances the degree of privacy protection against accuracy. ε may generally be predetermined. The closer ε is to 0, the closer e^ε is to 1, the closer the algorithm's processing results on the two neighboring data sets D and D', and the stronger the privacy protection.
In this step 304, the task amounts to a balance between the compression ratio and the model index. Classical implementations of differential privacy include, for example, the Laplace mechanism and the exponential mechanism. Generally, the Laplace mechanism is used to add noise perturbations to numerical values; for cases where numerical perturbation is not meaningful, the exponential mechanism is more suitable. Selecting one sub-model from the plurality of sub-models as the target business model is such a case (a sub-model is selected as a whole, and its internal structure is not processed), so it may preferably be performed with the exponential mechanism.
As a specific example, the following describes in detail how a target business model is selected from the sub-models using the first mode of differential privacy, where the first mode is the exponential mechanism.
The N sub-models determined in step 303 may be viewed as N entity objects, each corresponding to a value r_i, where i ranges from 1 to N; the values r_i together constitute the output range R of the query function. The goal here is to select one r_i from R and take the sub-model it corresponds to as the target business model. Let the given data set (understood here as the training sample set) be denoted D. Under the exponential mechanism, the function q(D, r_i) is called the availability function of the output value r_i.
For each sub-model, availability is closely tied to the model index. For example, where the model index includes the compression ratio relative to the initial business model and the accuracy on a test sample set (a larger compression ratio means a smaller sub-model, which is more desirable), the availability function may, in one specific example, be positively correlated with both the compression ratio s_i and the accuracy z_i of the corresponding sub-model i. The function value of the availability function for each sub-model can be taken as that sub-model's availability coefficient, for example:
q(D, r_i) = s_i × z_i
In other specific examples, the model indexes may include recall, F1 score, and so on, and the availability function may take other reasonable forms according to the actual model indexes, which are not enumerated here.
Under the exponential mechanism of ε-differential privacy, given a privacy budget ε (a preset value, e.g., 0.1), a data set D, and an availability function q(D, r), the privacy protection mechanism A(D, q) satisfies ε-differential privacy if and only if:
Pr[A(D, q) = r_i] ∝ exp( ε × q(D, r_i) / (2Δq) )
where ∝ denotes "proportional to", and Δq is the sensitivity, i.e., the maximum change in the availability function caused by a change in a single datum (a single training sample in the example above). Here, since accuracy and compression ratio both take values between 0 and 1, the maximum change in q when a single datum changes is 1, that is, Δq = 1. In other embodiments, with a different expression for q, Δq may be determined in other ways, which is not limited here.
In a specific example, the privacy protection mechanism A may sample according to sampling probabilities, the sampling probability of sub-model i being denoted A(D, q_i). For example, the sampling probability of the i-th sub-model may be:
A(D, q_i) = exp( ε × q(D, r_i) / (2Δq) ) / Σ_j exp( ε × q(D, r_j) / (2Δq) )
where j ranges over all the sub-models. In this way, the exponential mechanism of differential privacy is introduced into the sampling probabilities for sampling the sub-models, and sampling can be performed over the range R (i.e., over the sub-models) according to the sampling probability of each sub-model.
In sampling, according to one specific example, the interval between 0 and 1 may be divided into sub-intervals, one per value in the range R (i.e., one per sub-model), with each sub-interval's length equal to the corresponding sampling probability. A random number between 0 and 1 is generated with a preselected random algorithm, and the value of R corresponding to the sub-interval in which the random number falls is taken as the sampled target value; the sub-model corresponding to that target value can be used as the target business model. According to another specific example, if the range R is a continuous interval of values, it can be divided into sub-intervals whose lengths are positively correlated with the sampling probabilities of the corresponding sub-models; a value is then drawn at random directly on R, and the sub-model whose sub-interval contains the drawn value is used as the target business model.
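The exponential-mechanism sampling just described can be sketched as follows. The availability coefficients use the q_i = s_i × z_i example from above with Δq = 1 as derived in the text; epsilon and the concrete numbers are illustrative assumptions.

```python
# Sketch of step 304: sample one sub-model with probability proportional to
# exp(epsilon * q_i / (2 * sensitivity)), i.e. the exponential mechanism.
import numpy as np

def exponential_mechanism(availability, epsilon, sensitivity=1.0, rng=None):
    rng = rng or np.random.default_rng()
    scores = np.asarray(availability, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    weights = np.exp(logits - logits.max())   # subtract max for stability;
    probs = weights / weights.sum()           # normalization is unaffected
    return int(rng.choice(len(scores), p=probs)), probs

compression = [0.2, 0.6, 0.9]    # s_i: larger means a smaller sub-model
accuracy = [0.95, 0.92, 0.85]    # z_i: accuracy on a test sample set
q = [s * z for s, z in zip(compression, accuracy)]

chosen, probs = exponential_mechanism(q, epsilon=0.1)
print("sampling probabilities:", probs, "chosen sub-model:", chosen)
```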
It can be appreciated that sampling the sub-models according to these probabilities, via the exponential mechanism of differential privacy, adds randomness to the selection of the target business model. The specific structure of the selected sub-model is therefore difficult to infer from the initial business model, making the target business model hard to guess and achieving privacy protection for the target business model and the business data.
It can be understood that, in determining the target business model, each sub-model receives only preliminary training, from which a suitable sub-model is selected as the final one; this avoids the heavy computation of fully training a huge initial business model and then deleting a large number of its parameters. The selected target business model can therefore be trained further, so that it is better suited for business prediction on given business data, yielding business prediction results (such as scoring results or classification results).
One training process for the target business model is, for example: input each training sample into the selected target business model, and adjust the model parameters according to comparison of the output result with the sample label.
In general, the comparison of the output result and the sample label may measure the loss by means such as a difference, an absolute value of the difference, or the like in the case where the output result is a numerical value, or by means such as a variance, a euclidean distance, or the like in the case where the output result is a vector or a plurality of numerical values. After the loss is found, the model parameters may be adjusted with the goal of minimizing the loss. Some optimization algorithms can be adopted in the process to accelerate the convergence speed of the model parameters (or the loss functions). For example, an optimization algorithm such as a gradient descent method is used.
According to one possible design, in order to further protect data privacy, a differential privacy method can be introduced by adding interference noise in the loss gradient, and model parameters are adjusted to train a target business model based on privacy protection. At this time, the flow shown in fig. 3 may further include the following steps:
step 305, training the target business model based on the second mode of differential privacy by using a plurality of training samples, so that the trained target business model is used for business prediction aiming at given business data. There are many implementation manners of differential privacy, and the differential privacy is introduced here to add noise to data, for example, the differential privacy can be implemented by gaussian noise, laplacian noise, and the like, and is not limited herein.
In one embodiment, for a first batch of samples input into the target business model, the model parameters may be adjusted as follows: first, determine the original gradient of the loss corresponding to the first batch; then add noise for differential privacy to the original gradient, obtaining a noise-containing gradient; and then adjust the model parameters of the target business model using the noise-containing gradient. The first batch may contain one training sample or several; where it contains several, the loss corresponding to the batch may be the combined loss of those samples, their average loss, or the like.
As an example, assume that for the first batch of samples described above, the original gradient obtained is:
g_t(x_i) = ∇_θt L(θ_t, x_i)
where t denotes the current (t-th) round of iterative training, x_i denotes the i-th sample in the first batch, g_t(x_i) denotes the gradient of the loss for the i-th sample in round t, θ_t denotes the model parameters at the start of the t-th round of training, and L(θ_t, x_i) denotes the loss function corresponding to the i-th sample.
As described above, adding noise for differential privacy to the original gradient may be implemented with, for example, Laplacian noise or Gaussian noise.
In an embodiment, taking Gaussian noise as the second mode of differential privacy, the original gradient may first be clipped based on a preset clipping threshold to obtain a clipped gradient; Gaussian noise for differential privacy is then determined based on the clipping threshold and a predetermined noise scaling coefficient (a preset hyperparameter); finally, the clipped gradient and the Gaussian noise are fused (e.g., summed) to obtain the noise-containing gradient. In this second mode, the original gradient is clipped on one hand and noise is superimposed on the clipped gradient on the other, so that the loss gradient undergoes differential-privacy processing with Gaussian noise.
For example, the original gradient is clipped to:
ḡ_t(x_i) = g_t(x_i) / max(1, ‖g_t(x_i)‖_2 / C)
where ḡ_t(x_i) denotes the clipped gradient of the i-th sample in round t, C denotes the clipping threshold, and ‖g_t(x_i)‖_2 denotes the second-order norm of g_t(x_i). That is, when the gradient norm is less than or equal to the clipping threshold C, the original gradient is retained; when it is greater than C, the original gradient is scaled down in proportion to the amount by which it exceeds C.
Gaussian noise is added to the clipped gradients to obtain a noise-containing gradient, for example:
g̃_t = (1/N) × ( Σ_i ḡ_t(x_i) + N(0, σ²C²·I) )
where N denotes the number of samples in the first batch; g̃_t denotes the noise-containing gradient corresponding to the N samples in round t; N(0, σ²C²·I) denotes Gaussian noise whose probability density follows a Gaussian distribution with mean 0 and variance σ²C²; σ denotes the noise scaling coefficient, a preset hyperparameter that can be set as required; C is the clipping threshold; and I denotes an indicator function taking 0 or 1 (for example, even rounds of a multi-round training may take 1 and odd rounds 0). In the above formula, when the first batch contains multiple training samples, the noise-containing gradient is the average of the clipped per-sample gradients with Gaussian noise superimposed. When the first batch contains only one training sample, the noise-containing gradient is that sample's clipped original gradient with Gaussian noise added.
Then, using the gradient with Gaussian noise added, still with the goal of minimizing the loss corresponding to sample i, the model parameters can be adjusted as follows:
θ_{t+1} = θ_t − η_t × g̃_t
where η_t is the learning step size (learning rate) of round t, a preset hyperparameter such as 0.5 or 0.3, and θ_{t+1} denotes the adjusted model parameters obtained from the t-th round of training (on the first batch of samples). Since the Gaussian-noise-added gradient satisfies differential privacy, the adjustment of the model parameters satisfies differential privacy.
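A single DP-SGD update matching the three formulas above (per-sample clipping at threshold C, Gaussian noise with standard deviation σC, averaging over the batch, then a gradient step) can be sketched as below. The per-sample gradients and all constants are illustrative, and for simplicity the noise is added on every round rather than gated by the indicator I.

```python
# Sketch of one differentially private parameter update (step 305).
import numpy as np

def dp_sgd_step(theta, per_sample_grads, C, sigma, eta, rng=None):
    rng = rng or np.random.default_rng()
    clipped = [g / max(1.0, np.linalg.norm(g) / C)   # keep if ||g||_2 <= C,
               for g in per_sample_grads]            # otherwise scale down to C
    noise = rng.normal(0.0, sigma * C, size=theta.shape)  # N(0, sigma^2 C^2 I)
    noisy_grad = (np.sum(clipped, axis=0) + noise) / len(clipped)
    return theta - eta * noisy_grad      # theta_{t+1} = theta_t - eta_t * g~_t

rng = np.random.default_rng(2)
theta = rng.standard_normal(10)
grads = [rng.standard_normal(10) for _ in range(32)]  # one gradient per sample
theta = dp_sgd_step(theta, grads, C=1.0, sigma=1.1, eta=0.3, rng=rng)
```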
Accordingly, after multiple rounds of iterative training, a target business model based on differential privacy is obtained. Since Gaussian noise is added during model training, it is difficult to infer the model structure, or the business data, from what the target business model exposes, further improving the protection of private data.
The trained target business model can be used to make corresponding business predictions for given business data, that is, business data of the same type as the training samples. For example, given a user's finance-related data, the target business model can predict the user's loan risk.
Reviewing the above flow: the method for determining a target business model based on privacy protection provided in the embodiments of this specification first trains the selected complex business model to obtain an initial business model, then prunes the initial business model and trains the pruned model with its parameters reset to the initialization state, to check whether the pruned-away parameters were unimportant from the beginning. A target business model is then selected from the resulting sub-models in a differential-privacy manner. A compressed model with privacy protection is thus obtained, providing privacy protection for the model on the basis of model compression.
According to an embodiment of another aspect, an apparatus for determining a target business model based on privacy protection is also provided. The business model here may be a model for conducting business processing such as classification, scoring, etc. for given business data. The service data may be various types of data such as text, image, voice, video, animation, etc. The apparatus may be provided in a system, device, apparatus, platform, or server having some computing power.
FIG. 5 illustrates a schematic block diagram of an apparatus for determining a target business model based on privacy protection, according to one embodiment. As shown in fig. 5, the apparatus 500 includes:
an initialization unit 51 configured to determine initial values corresponding to respective model parameters for the selected service model according to a predetermined manner, thereby initializing the selected service model;
an initial training unit 52 configured to train the initialized selected service model using a plurality of training samples until the model parameters converge, so as to obtain an initial service model;
a pruning unit 53 configured to determine a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model corresponds to model parameters and a model index determined through retraining by the initialization unit 51 and the initial training unit 52 as follows: the initialization unit 51 resets the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model; and the initial training unit 52 inputs the plurality of training samples into the pruned business model in turn, adjusting the model parameters based on comparison of the corresponding sample labels with the outputs of the pruned business model;
the determining unit 54 is configured to select a target business model from each sub-model by using a first mode of differential privacy based on the model index corresponding to each sub-model.
According to an embodiment, the pruning unit 53 may be further configured to:
pruning the initial business model according to the model parameters of the initial business model to obtain a first pruning model;
taking the first pruning model corresponding to the model parameters obtained through retraining as a first sub-model;
and iteratively trimming the first submodel to obtain a subsequent submodel until the end condition is met.
In one embodiment, the ending condition may include at least one of the number of iterations reaching a predetermined number, the number of submodels reaching a predetermined number, the size of the last submodel being less than a set size threshold, and the like.
In an alternative implementation, the pruning unit 53 prunes the model, in order of model parameter magnitude from small to large, in one of the following manners: pruning away a predetermined proportion of the model parameters, pruning away a predetermined number of model parameters, pruning until the model does not exceed a predetermined size, and so on.
According to one possible design, the first way of differential privacy is an exponential mechanism, and the determining unit 54 may be further configured to:
determining each availability coefficient corresponding to each submodel according to the model index corresponding to each submodel;
determining the sampling probability of each sub-model from the availability coefficients by using the exponential mechanism;
and sampling in a plurality of sub models according to each sampling probability, and taking the sampled sub models as target business models.
In one embodiment, the apparatus 500 may further comprise a privacy training unit 55 configured to:
and training the target business model based on a second mode of differential privacy by using a plurality of training samples, so that the trained target business model is used for carrying out business prediction for protecting data privacy aiming at given business data.
In a further embodiment, the plurality of training samples includes a first batch of samples, where each sample i in the first batch corresponds to a loss obtained after processing by the target business model, and the privacy training unit 55 is further configured to:
determining an original gradient of the loss corresponding to the sample i;
adding noise on the original gradient by using a second mode of differential privacy to obtain a gradient containing the noise;
and adjusting the model parameters of the target business model by using the gradient containing the noise and taking the loss corresponding to the minimized sample i as a target.
In a further embodiment, the second mode of differential privacy adds Gaussian noise, and the privacy training unit 55 may be further configured to:
clip the original gradient based on a preset clipping threshold to obtain a clipped gradient;
determine Gaussian noise for realizing differential privacy using a Gaussian distribution determined from the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold;
and superimpose the Gaussian noise on the clipped gradient to obtain the gradient containing the noise.
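A minimal sketch of this Gaussian variant, in the spirit of DP-SGD: the clipping threshold clip_c and noise multiplier sigma are assumed hyperparameters, and the noise standard deviation sigma * clip_c makes the noise variance proportional to the square of the clipping threshold, as required above.

```python
import torch

def gaussian_noisy_gradient(grad, clip_c=1.0, sigma=1.1):
    """Clip the original gradient to L2 norm clip_c, then superimpose
    Gaussian noise with variance (sigma * clip_c) ** 2."""
    scale = max(1.0, (grad.norm(2) / clip_c).item())
    clipped = grad / scale                             # clipped gradient
    noise = torch.randn_like(grad) * (sigma * clip_c)  # N(0, (sigma*clip_c)^2)
    return clipped + noise                             # gradient containing noise
```

A function of this shape can be passed as the `add_noise` argument of the dp_step sketch above; clipping bounds each sample's influence, which is what lets the added Gaussian noise yield a differential privacy guarantee.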
It should be noted that the apparatus 500 shown in fig. 5 corresponds to the method embodiment shown in fig. 3; the description given for that method embodiment applies equally to the apparatus 500 and is not repeated here.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 3.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 3.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments describe the purpose, technical solutions, and advantages of the present specification in further detail. It should be understood that they are merely specific embodiments of the technical idea of the present specification and are not intended to limit its scope; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of these embodiments shall fall within the scope of the present specification.

Claims (13)

1. A method for determining a target business model based on privacy protection, wherein the target business model is used for processing given business data to obtain a corresponding business prediction result; the method comprises the following steps:
training the selected business model by using a plurality of training samples until the model parameters converge, to obtain an initial business model;
determining a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model respectively corresponds to model parameters and a model index determined through retraining, and the model index is used for evaluating model performance once the corresponding sub-model has converged;
determining each sampling probability respectively corresponding to each sub-model based on the model index corresponding to each sub-model, by using an exponential mechanism of differential privacy;
and sampling from the sub-models according to the sampling probabilities so as to select a target business model.
2. The method of claim 1, wherein the determining a plurality of sub-models of the initial business model based on the pruning of the initial business model comprises:
pruning the initial business model according to the model parameters of the initial business model to obtain a first pruning model;
taking the first pruning model corresponding to the model parameters obtained through retraining as a first sub-model;
and iteratively pruning the first sub-model to obtain subsequent sub-models until an end condition is met.
3. The method of claim 2, wherein the end condition comprises at least one of the number of iterations reaching a predetermined number, the number of submodels reaching a predetermined number, and the size of the last submodel being less than a set size threshold.
4. The method according to claim 1 or 2, wherein the pruning of the initial business model is performed in order of increasing model parameter magnitude, based on one of: pruning away a preset proportion of model parameters, pruning away a preset number of model parameters, and pruning to obtain a model whose scale does not exceed a preset size.
5. The method of claim 1, wherein the determining, by using an exponential mechanism of differential privacy, respective sampling probabilities respectively corresponding to the respective submodels based on the respective model indicators corresponding to the respective submodels comprises:
determining each availability coefficient corresponding to each submodel according to the model index corresponding to each submodel;
and determining each sampling probability corresponding to each sub-model by using the exponential mechanism according to each availability coefficient.
6. The method of claim 5, wherein the model index includes a compression ratio relative to the initial business model and at least one of: accuracy, recall, and F1 score; the availability coefficient is the product of the compression ratio and the other terms contained in the model index.
7. The method of claim 1, wherein the method further comprises:
and training the target business model based on a second mode of differential privacy by using the plurality of training samples, so that the trained target business model can perform business prediction on given business data while protecting data privacy.
8. The method of claim 7, wherein the plurality of training samples comprise a first batch of samples, wherein a sample i in the first batch corresponds to a loss obtained after processing by the target business model, and wherein training the target business model based on the second mode of differential privacy using the plurality of training samples comprises:
determining an original gradient of a loss corresponding to the sample i;
adding noise to the original gradient by utilizing the second mode of differential privacy to obtain a gradient containing the noise;
and adjusting the model parameters of the target business model by using the gradient containing the noise, with minimization of the loss corresponding to sample i as the objective.
9. The method of claim 8, wherein the second mode of differential privacy adds Gaussian noise, and wherein adding noise to the original gradient by the second mode of differential privacy to obtain a gradient containing noise comprises:
clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient;
determining Gaussian noise for realizing differential privacy by utilizing a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold;
and superimposing the Gaussian noise on the clipped gradient to obtain the gradient containing the noise.
10. The method of claim 1, wherein the business data comprises at least one of pictures, audio, and text.
11. A device for determining a target business model based on privacy protection, wherein the target business model is used for processing given business data to obtain a corresponding business prediction result; the device comprises:
an initial training unit configured to train the initialized selected business model using a plurality of training samples until the model parameters converge, to obtain an initial business model;
a pruning unit configured to determine a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model respectively corresponds to model parameters and a model index determined through retraining by the initial training unit, and the model index is used for describing the model performance of the corresponding sub-model;
and a determining unit configured to determine each sampling probability corresponding to each sub-model based on the model index corresponding to each sub-model, by using an exponential mechanism of differential privacy;
and to sample from the sub-models according to the sampling probabilities so as to select a target business model.
12. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-10.
13. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-10.
CN202010626329.4A 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection Pending CN113515770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010626329.4A CN113515770A (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010276685.8A CN111177792B (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection
CN202010626329.4A CN113515770A (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202010276685.8A Division CN111177792B (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Publications (1)

Publication Number Publication Date
CN113515770A (en) 2021-10-19

Family

ID=70655223

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010626329.4A Pending CN113515770A (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection
CN202010276685.8A Active CN111177792B (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202010276685.8A Active CN111177792B (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Country Status (3)

Country Link
CN (2) CN113515770A (en)
TW (1) TWI769754B (en)
WO (1) WO2021204272A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081024A (en) * 2022-08-16 2022-09-20 杭州金智塔科技有限公司 Decentralized business model training method and device based on privacy protection
CN117056979A (en) * 2023-10-11 2023-11-14 杭州金智塔科技有限公司 Service processing model updating method and device based on user privacy data

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515770A (en) * 2020-04-10 2021-10-19 支付宝(杭州)信息技术有限公司 Method and device for determining target business model based on privacy protection
CN111368337B (en) * 2020-05-27 2020-09-08 支付宝(杭州)信息技术有限公司 Sample generation model construction and simulation sample generation method and device for protecting privacy
CN111475852B (en) * 2020-06-19 2020-09-15 支付宝(杭州)信息技术有限公司 Method and device for preprocessing data aiming at business model based on privacy protection
CN112214791B (en) * 2020-09-24 2023-04-18 广州大学 Privacy policy optimization method and system based on reinforcement learning and readable storage medium
CN114936650A (en) * 2020-12-06 2022-08-23 支付宝(杭州)信息技术有限公司 Method and device for jointly training business model based on privacy protection
CN112561076B (en) * 2020-12-10 2022-09-20 支付宝(杭州)信息技术有限公司 Model processing method and device
CN112632607B (en) * 2020-12-22 2024-04-26 中国建设银行股份有限公司 Data processing method, device and equipment
CN112926090B (en) * 2021-03-25 2023-10-27 支付宝(杭州)信息技术有限公司 Business analysis method and device based on differential privacy
US20220318412A1 (en) * 2021-04-06 2022-10-06 Qualcomm Incorporated Privacy-aware pruning in machine learning
CN113221717B (en) * 2021-05-06 2023-07-18 支付宝(杭州)信息技术有限公司 Model construction method, device and equipment based on privacy protection
CN113420322B (en) * 2021-05-24 2023-09-01 阿里巴巴新加坡控股有限公司 Model training and desensitizing method and device, electronic equipment and storage medium
CN113268772B (en) * 2021-06-08 2022-12-20 北京邮电大学 Joint learning security aggregation method and device based on differential privacy
CN113486402A (en) * 2021-07-27 2021-10-08 平安国际智慧城市科技股份有限公司 Numerical data query method, device, equipment and storage medium
CN113923476B (en) * 2021-09-30 2024-03-26 支付宝(杭州)信息技术有限公司 Video compression method and device based on privacy protection
CN114185619B (en) * 2021-12-14 2024-04-05 平安付科技服务有限公司 Breakpoint compensation method, device, equipment and medium based on distributed operation
CN114338552B (en) * 2021-12-31 2023-07-07 河南信大网御科技有限公司 System for determining delay mimicry
CN114780999B (en) * 2022-06-21 2022-09-27 广州中平智能科技有限公司 Deep learning data privacy protection method, system, equipment and medium
CN116432039B (en) * 2023-06-13 2023-09-05 支付宝(杭州)信息技术有限公司 Collaborative training method and device, business prediction method and device
CN116805082B (en) * 2023-08-23 2023-11-03 南京大学 Splitting learning method for protecting private data of client

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657498A (en) * 2018-12-28 2019-04-19 广西师范大学 The difference method for secret protection that top-k Symbiotic Model excavates in a plurality of stream
US20190138743A1 (en) * 2015-11-02 2019-05-09 LeapYear Technologies, Inc. Differentially Private Processing and Database Storage
CN110874488A (en) * 2019-11-15 2020-03-10 哈尔滨工业大学(深圳) Stream data frequency counting method, device and system based on mixed differential privacy and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368752B (en) * 2017-07-25 2019-06-28 北京工商大学 A kind of depth difference method for secret protection based on production confrontation network
US11068605B2 (en) * 2018-06-11 2021-07-20 Grey Market Labs, PBC Systems and methods for controlling data exposure using artificial-intelligence-based periodic modeling
US11341281B2 (en) * 2018-09-14 2022-05-24 International Business Machines Corporation Providing differential privacy in an untrusted environment
US11556846B2 (en) * 2018-10-03 2023-01-17 Cerebri AI Inc. Collaborative multi-parties/multi-sources machine learning for affinity assessment, performance scoring, and recommendation making
CN110084365B (en) * 2019-03-13 2023-08-11 西安电子科技大学 Service providing system and method based on deep learning
CN110719158B (en) * 2019-09-11 2021-11-23 南京航空航天大学 Edge calculation privacy protection system and method based on joint learning
CN113515770A (en) * 2020-04-10 2021-10-19 支付宝(杭州)信息技术有限公司 Method and device for determining target business model based on privacy protection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190138743A1 (en) * 2015-11-02 2019-05-09 LeapYear Technologies, Inc. Differentially Private Processing and Database Storage
CN109657498A (en) * 2018-12-28 2019-04-19 广西师范大学 The difference method for secret protection that top-k Symbiotic Model excavates in a plurality of stream
CN110874488A (en) * 2019-11-15 2020-03-10 哈尔滨工业大学(深圳) Stream data frequency counting method, device and system based on mixed differential privacy and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Feng Dengguo; Zhang Min; Ye Yutong: "Research on location trajectory publication techniques based on the differential privacy model", Journal of Electronics & Information Technology, no. 01, 15 January 2020 (2020-01-15) *
Zhang Wei; Cang Jiyun; Wang Xuran; Chen Yunfang: "Differentially private data publication for social networks based on hierarchical random graphs", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), no. 03, 29 June 2016 (2016-06-29) *
Xu Jiahui: "Research on neural network compression techniques based on model pruning", Information & Communications, no. 12, 15 December 2019 (2019-12-15) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081024A (en) * 2022-08-16 2022-09-20 杭州金智塔科技有限公司 Decentralized business model training method and device based on privacy protection
CN117056979A (en) * 2023-10-11 2023-11-14 杭州金智塔科技有限公司 Service processing model updating method and device based on user privacy data
CN117056979B (en) * 2023-10-11 2024-03-29 杭州金智塔科技有限公司 Service processing model updating method and device based on user privacy data

Also Published As

Publication number Publication date
CN111177792A (en) 2020-05-19
WO2021204272A1 (en) 2021-10-14
CN111177792B (en) 2020-06-30
TWI769754B (en) 2022-07-01
TW202139045A (en) 2021-10-16

Similar Documents

Publication Publication Date Title
CN111177792B (en) Method and device for determining target business model based on privacy protection
CN111124840B (en) Method and device for predicting alarm in business operation and maintenance and electronic equipment
EP3340129B1 (en) Artificial neural network class-based pruning
US6397200B1 (en) Data reduction system for improving classifier performance
CN108154237B (en) Data processing system and method
CN113536383B (en) Method and device for training graph neural network based on privacy protection
CN113220886A (en) Text classification method, text classification model training method and related equipment
CN114417427A (en) Deep learning-oriented data sensitivity attribute desensitization system and method
CN113822315A (en) Attribute graph processing method and device, electronic equipment and readable storage medium
CN115062779A (en) Event prediction method and device based on dynamic knowledge graph
CN113591924A (en) Phishing number detection method, system, storage medium and terminal equipment
CN112380919A (en) Vehicle category statistical method
CN116010832A (en) Federal clustering method, federal clustering device, central server, federal clustering system and electronic equipment
CN115907775A (en) Personal credit assessment rating method based on deep learning and application thereof
CN115564155A (en) Distributed wind turbine generator power prediction method and related equipment
CN115131646A (en) Deep network model compression method based on discrete coefficient
Yoo et al. Unpriortized autoencoder for image generation
CN115461740A (en) Behavior control method and device and storage medium
CN117058493B (en) Image recognition security defense method and device and computer equipment
CN117236900B (en) Individual tax data processing method and system based on flow automation
CN117807237B (en) Paper classification method, device, equipment and medium based on multivariate data fusion
Kutschenreiter-Praszkiewicz Decision Rule Induction Based on the Graph Theory
Häggström Latent Data-Structures for Complex State Representation: A Steppingstone to Generating Synthetic 5G RAN data using Deep Learning
CN117077813A (en) Training method and training system for machine learning model
CN117115608A (en) Decision method, device, equipment and medium based on knowledge embedding reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40062502)