WO2021204272A1 - Determining a target business model based on privacy protection
- Publication number
- WO2021204272A1 (application PCT/CN2021/086275)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- sub
- business
- business model
- initial
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Definitions
- One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and device for determining a target business model based on privacy protection through a computer.
- Deep Neural Networks are favored by those skilled in the art because they mimic the way of thinking of the human brain and have better effects than simple linear models.
- A deep neural network is a neural network with at least one hidden layer, which can model complex nonlinear systems and improve model capabilities.
- A deep neural network can include up to millions of parameters, so a model compression method is desirable to reduce the data volume and complexity of the model. To this end, conventional techniques usually use training samples to adjust the millions of parameters in a deep neural network, and then delete or "prune" unnecessary weights to reduce the network structure to a more manageable size. Reducing the size of the model helps minimize its memory, inference, and computing requirements. In some business scenarios, the number of weights in a neural network can sometimes be reduced by as much as 99%, resulting in a smaller and sparser network.
- One or more embodiments of this specification describe a method and device for determining a target business model based on privacy protection, so as to solve one or more problems mentioned in the background art.
- a method for determining a target business model based on privacy protection is provided.
- the target business model is used to process given business data to obtain corresponding business prediction results; the method includes:
- determining, in a predetermined manner, an initial value for each model parameter of the selected business model, thereby initializing the selected business model; training the initialized business model with a plurality of training samples until the model parameters converge, to obtain the initial business model;
- determining multiple sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model corresponds to model parameters and model indicators determined by retraining in the following manner: the model parameters of the pruned business model are reset to the initial values of the corresponding model parameters in the initialized business model; multiple training samples are input in sequence to the pruned business model, and the model parameters are adjusted based on a comparison of the corresponding sample labels with the output results of the pruned business model;
- selecting, based on the model indicators corresponding to each sub-model, the target business model from the sub-models using a first method of differential privacy.
- determining the multiple sub-models of the initial business model based on the pruning of the initial business model includes: pruning the initial business model according to the model parameters of the initial business model to obtain a first pruning model; taking the first pruning model, with the model parameters obtained through retraining, as the first sub-model; and iteratively pruning the first sub-model to obtain subsequent sub-models until the end condition is satisfied.
- the end condition includes at least one of: the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, and the scale of the last sub-model being less than a set scale threshold.
- the pruning of the model is based on one of the following methods, applied in ascending order of model parameters: pruning a predetermined proportion of model parameters, pruning a predetermined number of model parameters, and pruning until the model scale does not exceed a predetermined size.
- the first method of differential privacy is an exponential mechanism
- using the first method of differential privacy to select a target business model from the sub-models based on the model indicators corresponding to each sub-model includes: determining the availability coefficient of each sub-model according to its model indicators; determining the sampling probability of each sub-model from the availability coefficients using the exponential mechanism; and sampling among the multiple sub-models according to the sampling probabilities, using the sampled sub-model as the target business model.
- the method further includes: using a plurality of training samples to train the target business model based on the second method of differential privacy, so that the trained target business model is used to make business predictions that protect data privacy for the given business data.
- the multiple training samples include a first batch of samples, sample i in the first batch corresponds to a loss obtained after processing by the target business model, and training the target business model with the multiple training samples based on the second method of differential privacy includes: determining the original gradient of the loss corresponding to sample i; adding noise to the original gradient using the second method of differential privacy to obtain a noise-containing gradient; and adjusting the model parameters of the target business model using the noise-containing gradient, with the goal of minimizing the loss corresponding to sample i.
- the second method of differential privacy is to add Gaussian noise
- adding noise to the original gradient using the second method of differential privacy to obtain a noise-containing gradient includes: clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient; determining the Gaussian noise used to achieve differential privacy from a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superimposing the Gaussian noise on the clipped gradient to obtain the noise-containing gradient.
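The clip-then-add-noise step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes L2-norm clipping and a noise standard deviation of `noise_multiplier * clip_threshold` (so the variance scales with the square of the clipping threshold). The names `clip_gradient`, `noisy_gradient`, and `noise_multiplier` are hypothetical.

```python
import math
import random

def clip_gradient(grad, clip_threshold):
    """Rescale grad so that its L2 norm is at most clip_threshold (assumed clipping rule)."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = max(1.0, norm / clip_threshold)
    return [g / scale for g in grad]

def noisy_gradient(grad, clip_threshold, noise_multiplier, rng):
    """Clip the gradient, then superimpose zero-mean Gaussian noise on each component."""
    clipped = clip_gradient(grad, clip_threshold)
    # Assumption: std = noise_multiplier * clip_threshold, so variance grows with clip_threshold**2.
    sigma = noise_multiplier * clip_threshold
    return [g + rng.gauss(0.0, sigma) for g in clipped]

rng = random.Random(0)
original = [3.0, 4.0]                       # L2 norm is 5
print(clip_gradient(original, 1.0))         # same direction, norm reduced to the threshold
print(noisy_gradient(original, 1.0, 0.5, rng))
```

A larger clipping threshold keeps more of the original gradient but, under this assumption, also requires proportionally larger noise.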
- the service data includes at least one of pictures, audio, and characters.
- a device for determining a target business model based on privacy protection is provided; the target business model is used to process given business data to obtain corresponding business prediction results;
- the device includes: an initialization unit configured to determine, in a predetermined manner, the initial value corresponding to each model parameter for the selected business model, thereby initializing the selected business model;
- the initial training unit is configured to use a plurality of training samples to train the initialized selected business model until the model parameters converge, to obtain the initial business model;
- the pruning unit is configured to determine a plurality of sub-models of the initial business model based on the pruning of the initial business model, wherein each sub-model corresponds to the initialization unit
- a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
- a computing device including a memory and a processor, characterized in that executable code is stored in the memory, and when the processor executes the executable code, the method of the first aspect is implemented.
- the selected complex business model is first trained to obtain the initial business model; the initial business model is then pruned, and the parameters of the pruned business model are reset to the initialization state and trained, to test whether the trimmed model parameters were unimportant from the beginning.
- the target business model is selected through differential privacy. In this way, a compression model for privacy protection can be obtained, and on the basis of implementing model compression, privacy protection is provided for the model.
- Figure 1 shows a schematic diagram of the implementation architecture of the target business model based on privacy protection in the technical concept of this specification
- Figure 2 shows a specific example of the process of determining multiple sub-networks based on the pruning of the initial neural network
- Fig. 3 shows a flowchart of a method for determining a target business model based on privacy protection according to an embodiment
- Figure 4 shows a schematic diagram of a specific example of pruning a neural network
- Fig. 5 shows a schematic block diagram of an apparatus for determining a target service model based on privacy protection according to an embodiment.
- Fig. 1 shows a schematic diagram of an implementation architecture according to the technical concept of this specification.
- the business model can be a machine learning model used to perform various business processing such as classification and scoring on business data.
- the business model shown in Figure 1 is implemented through a neural network. In practice, it can also be implemented in other ways, such as decision trees, linear regression, and so on.
- the business data can be at least one of multiple methods such as characters, audio, images, and animations, and is determined according to specific business scenarios, which is not limited here.
- the business model can be a machine learning model that is used by the lending platform to assist in evaluating the risk of a user’s lending business.
- the targeted business data can be a single user’s historical lending behavior data, default data, user portraits, etc.
- the business prediction result is the user’s Risk score.
- the business model can also be a model (such as a convolutional neural network) used to classify the target on the picture, the targeted business data can be various pictures, and the business prediction result can be, for example, the first target (such as a car ), the second target (bicycle), other categories, etc.
- the process of determining the target business model based on privacy protection may be a process of determining a simplified sub-model whose model indicators meet the requirements from a complex initial business model.
- the initial neural network can be a more complex neural network, which can include more features, weight parameters, and other parameters (such as constant parameters and auxiliary matrices), and so on.
- the model parameters of the initial neural network can be initialized in a predetermined manner, such as random initialization, set to predetermined values, and so on.
- the initial neural network is first trained through multiple training samples until the model parameters (or loss function) of the initial neural network converge. After that, the initial neural network is pruned to obtain multiple sub-networks. In the process of pruning the neural network, it can be performed according to a predetermined parameter ratio (such as 20%), a predetermined parameter number (such as 1000), a predetermined scale (such as at least 20 megabytes), and so on.
- the sub-network obtained by pruning the initial neural network usually continues to be trained, is pruned again on that basis, and is trained further. In other words, the initial neural network is compressed step by step.
- the pruned sub-network has its parameters reset (restored to the initialization state), and the pruned network with reset parameters is then trained. The purpose is to check whether the pruned-away network structure was unnecessary from the beginning. Whether it was unnecessary from the beginning can be reflected by the model's evaluation indicators, such as accuracy, recall, and convergence.
- the pruning of the neural network may include a process of removing part of the neurons in the neural network and/or removing part of the connections of the neurons.
- which neurons are to be discarded may be based on the weight parameters corresponding to the neurons.
- the weight parameter describes the importance of the neuron. Taking a fully connected neural network as an example, the weights with which each neuron maps to the neurons of the next layer can be averaged, or their maximum can be taken, to obtain a reference weight. Neurons are then discarded (pruned) in ascending order of reference weight.
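As a rough sketch of the reference-weight idea, the following computes a per-neuron reference weight as the mean or maximum over its outgoing weights and returns the neurons with the smallest values. Using absolute values is an assumption made here so that large negative weights also count as important; the function names are hypothetical.

```python
def reference_weights(outgoing, reduce="mean"):
    """outgoing[k] holds the weights from neuron k to each neuron of the next layer."""
    if reduce == "mean":
        return [sum(abs(w) for w in ws) / len(ws) for ws in outgoing]
    if reduce == "max":
        return [max(abs(w) for w in ws) for ws in outgoing]
    raise ValueError("reduce must be 'mean' or 'max'")

def neurons_to_prune(outgoing, count, reduce="mean"):
    """Indices of the `count` neurons with the smallest reference weight."""
    ref = reference_weights(outgoing, reduce)
    order = sorted(range(len(ref)), key=lambda k: ref[k])  # ascending importance
    return sorted(order[:count])

layer = [[0.9, -0.8], [0.05, 0.1], [0.4, -0.02]]
print(neurons_to_prune(layer, 1))          # the neuron with the weakest outgoing weights
print(neurons_to_prune(layer, 2, "max"))
```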
- the sub-network pruning process of a specific example under the implementation framework of this specification is given.
- the model parameters are reset to the initialization state, and the training samples are used to retrain it to obtain the first sub-network.
- the network structure and evaluation indicators of the first sub-network can be recorded.
- the pruning loop then begins.
- according to the trained model parameters in the first sub-network, the first sub-network is pruned, the model parameters of the pruned neural network are reset to the initial state, and the training samples are used to retrain it to obtain the second sub-network.
- the end condition here may be, for example, at least one of: the number of iterations reaches a predetermined number (such as the preset number N), the number of sub-models reaches a predetermined number (such as the preset number N), or the scale of the last sub-model is less than the set scale threshold (such as 100 megabytes).
- each sub-network may have a different scale, for example, the first sub-network is 80% of the initial neural network, the second sub-network is 60% of the initial neural network, and so on.
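The prune, reset, and retrain loop of this example can be sketched on a toy model. The sketch below is only illustrative: it stands in a masked linear model trained by stochastic gradient descent for the neural network, uses mean squared error as the recorded indicator, and ends after a fixed number of rounds. All function names, the data, and the hyperparameters are assumptions.

```python
import random

def predict(w, mask, x):
    return sum(wi * mi * xi for wi, mi, xi in zip(w, mask, x))

def train(init_w, mask, samples, lr=0.1, epochs=200):
    """Train from the given initial values; masked (pruned) weights are never updated."""
    w = list(init_w)
    for _ in range(epochs):
        for x, y in samples:
            err = predict(w, mask, x) - y
            w = [wi - lr * err * xi * mi for wi, mi, xi in zip(w, mask, x)]
    return w

def mse(w, mask, samples):
    return sum((predict(w, mask, x) - y) ** 2 for x, y in samples) / len(samples)

def prune_mask(w, mask, fraction):
    """Disable a fraction of the still-active weights, smallest magnitude first."""
    active = sorted((k for k, m in enumerate(mask) if m), key=lambda k: abs(w[k]))
    dropped = set(active[:max(1, int(len(active) * fraction))])
    return [0 if k in dropped else m for k, m in enumerate(mask)]

def iterative_pruning(init_w, samples, fraction=0.25, max_rounds=3):
    mask = [1] * len(init_w)
    trained = train(init_w, mask, samples)            # stands in for the initial network
    sub_models = []
    for _ in range(max_rounds):                       # end condition: number of iterations
        mask = prune_mask(trained, mask, fraction)    # prune using the trained parameters
        trained = train(init_w, mask, samples)        # reset to initial values, then retrain
        sub_models.append({"mask": mask, "weights": trained,
                           "mse": mse(trained, mask, samples)})
    return sub_models

rng = random.Random(0)
xs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(30)]
samples = [(x, 2.0 * x[0] - 3.0 * x[1]) for x in xs]  # only two inputs matter
init_w = [rng.uniform(-0.5, 0.5) for _ in range(4)]
sub_models = iterative_pruning(init_w, samples)
for m in sub_models:
    print(sum(m["mask"]), "active weights, mse =", m["mse"])
```

Each entry records the structure (mask) and an evaluation indicator, which is the information the later selection step needs.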
- each pruned sub-network can be regarded as a sub-network set of the initial neural network, and based on the principle of differential privacy, a sub-network is randomly selected as the target neural network.
- the target business model is determined based on privacy protection, which can better protect the privacy of the business model and/or business data, and improve the practicability of the target neural network.
- the implementation architecture shown in Figure 1 takes the business model as an example of a neural network.
- the neurons described above can also be replaced with other model elements; for example, when the business model is a decision tree, neurons can be replaced with tree nodes in the decision tree, and so on.
- the target neural network is used to make business predictions based on business data and obtain corresponding business prediction results. For example, according to the picture data, the business prediction result of the identified target category is obtained, and the business prediction result of the user's financial loan risk is obtained according to the user behavior data, and so on.
- Fig. 3 shows a process of determining a target business model based on privacy protection according to an embodiment.
- the business model here may be a model used for business processing such as classification and scoring for given business data.
- the business data here can be various types of data such as text, image, voice, video, and animation.
- the subject of execution of this process can be a system, equipment, device, platform or server with certain computing capabilities.
- the method for determining a target business model based on privacy protection may include the following steps: Step 301, determine the initial value of each model parameter for the selected business model in a predetermined manner, thereby initializing the selected business model; Step 302, use multiple training samples to train the initialized business model until the model parameters converge, to obtain the initial business model; Step 303, based on pruning of the initial business model, determine multiple sub-models of the initial business model, where each sub-model corresponds to model parameters and model indicators determined by retraining in the following way: reset the model parameters of the pruned business model to the initial values of the corresponding model parameters in the initialized business model, input multiple training samples in sequence to the pruned business model, and adjust the model parameters based on the comparison between the corresponding sample labels and the output results of the pruned business model; Step 304, based on the model indicators corresponding to each sub-model, use the first method of differential privacy to select the target business model from the sub-models.
- step 301 initial values corresponding to each model parameter are determined for the selected business model in a predetermined manner, so as to initialize the selected business model.
- the model parameters need to be initialized first. That is, initial values are determined for each model parameter.
- the model parameters may be, for example, at least one of the weight of each neuron, a constant parameter, an auxiliary matrix, and the like.
- the model parameters are, for example, the weight parameters of each node, the connection relationship between the nodes, and the connection weight.
- the model parameters can also be other parameters, and we will not list them one by one here.
- the initial values of these model parameters can be determined in a predetermined manner, for example, a completely random value, a random value within a preset interval, a set value, and so on.
- the business model can give corresponding business prediction results, such as classification results, scoring results, and so on.
- step 302 a plurality of training samples are used to train the initialized business model until the model parameters converge to obtain the initial business model.
- the selected business model can run according to the corresponding logic and give the corresponding business prediction results, so that the initialized business model can be trained using the training samples.
- Each training sample may correspond to sample business data and corresponding sample labels.
- the training process of the initialized business model may be, for example, inputting each piece of sample business data into the initialized business model in turn, and adjusting the model parameters according to the comparison between the business prediction result output by the business model and the corresponding business label.
- the model parameter convergence can be described by the fluctuation value of each model parameter, or by the loss function. This is because the loss function is usually a function of the model parameters. When the loss function converges, it represents the convergence of the model parameters. For example, when the maximum change value of the loss function or the fluctuation of the model parameter is less than a predetermined threshold, it can be determined that the model parameter converges.
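One way to express the threshold test described above is to look at the most recent changes in the loss. The window size and threshold below are hypothetical choices, and `has_converged` is an illustrative helper rather than anything defined in the source.

```python
def has_converged(loss_history, threshold=1e-4, window=3):
    """True when each of the last `window` changes in loss is smaller than `threshold`."""
    if len(loss_history) <= window:
        return False                      # not enough history to judge
    recent = loss_history[-(window + 1):]
    largest_change = max(abs(b - a) for a, b in zip(recent, recent[1:]))
    return largest_change < threshold

print(has_converged([1.0, 0.5, 0.3, 0.2]))
print(has_converged([1.0, 0.5, 0.3, 0.2999, 0.29989, 0.29988], threshold=1e-3))
```

The same test could be applied to the largest fluctuation among the model parameters instead of the loss.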
- the selected business model completes the current stage of training, and the obtained business model can be called the initial business model.
- the initial business model training process here can be performed in any suitable manner, and will not be repeated here.
- in step 303, based on the pruning of the initial business model, multiple sub-models of the initial business model are determined. It can be understood that, in order to obtain sub-models that can replace the initial business model, the initial business model can be pruned according to business requirements, yielding multiple sub-models of the initial business model. These sub-models can also be called candidate models.
- the pruning can be performed multiple times on the basis of the initial business model, or each pruning can be applied on top of an already pruned sub-model, as described for the example shown in Figure 2; the description is not repeated here.
- the pruning of the model is based on one of the following methods, applied in ascending order of model parameters: pruning a predetermined proportion (such as 20%) of model parameters, pruning a predetermined number (such as 1000) of model parameters, and pruning until the model scale does not exceed a predetermined size (e.g. 1000 megabytes), etc.
- the model parameters here are those that reflect, to a certain extent, the importance of model units (such as neurons, tree nodes, etc.), such as weight parameters.
- the model units can be pruned, and the connection relationship between the model units can also be pruned.
- the business model is a neural network and the model unit is a neuron as an example.
- An embodiment may implement pruning of the model by reducing a predetermined number or a predetermined proportion of model units. For example, 100 neurons or 10% of the neurons are pruned in each hidden layer of the neural network. As shown in Figure 4, since the importance of neurons is described by the weights corresponding to the connections between neurons in different hidden layers (the connecting lines in Figure 4), the values of the weight parameters can be used to determine which neurons to delete.
- Figure 4 shows a schematic diagram of some hidden layers in a neural network.
- Another embodiment can implement pruning of the model by reducing a predetermined number or a predetermined proportion of connecting edges.
- when the weight parameter corresponding to a connection edge in the neural network (such as the dashed connection edge between neuron X1 and the i-th hidden layer) is small, it indicates that the previous neuron is of low importance to the next neuron, and the corresponding connecting edge can be deleted.
- Such a network structure is no longer the original fully connected structure: each neuron in the previous hidden layer acts only on the relatively important neurons in the latter hidden layer, and each neuron in the latter hidden layer attends only to the neurons in the previous hidden layer that are more important to it. In this way, the scale of the business model also becomes smaller.
- the pruning of the model can also be achieved by reducing the connecting edges and model units at the same time, which will not be repeated here. Pruning model units and pruning connection relations are specific means of model pruning, and this specification does not limit the specific means. Through this pruning method, it is possible to trim off a predetermined proportion of model parameters, trim off a predetermined number of model parameters, trim a model whose scale does not exceed a predetermined size, and so on.
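Pruning connecting edges by a predetermined proportion can be sketched with a 0/1 mask over a weight matrix. The sketch assumes importance is measured by weight magnitude and that the proportion is applied across one layer's weights; `prune_edges` is a hypothetical name.

```python
def prune_edges(weights, proportion):
    """Return a 0/1 mask removing the given proportion of edges, smallest |weight| first."""
    edges = [(abs(w), r, c) for r, row in enumerate(weights) for c, w in enumerate(row)]
    edges.sort()                                   # ascending magnitude
    n_drop = int(len(edges) * proportion)
    mask = [[1] * len(row) for row in weights]
    for _, r, c in edges[:n_drop]:
        mask[r][c] = 0                             # this connection is deleted
    return mask

layer_weights = [[0.9, -0.05, 0.3],
                 [0.01, -0.7, 0.2]]
print(prune_edges(layer_weights, 0.5))
```

Keeping the mask separate from the weights makes it straightforward to reset the surviving weights to their initial values before retraining.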
- the pruning rule can be, for example: the size of the sub-model is a predetermined number of bytes (such as 1000 megabytes); the size of the sub-model is a predetermined proportion of the initial business model (such as 70%); the ratio of the sub-model size after trimming to the model size before trimming is a predetermined proportion (such as 90%); connecting edges whose weight is less than a predetermined weight threshold are trimmed; and so on.
- the trimmed model can abandon the model units or connecting edges with low importance, and retain the model units and connecting edges with high importance.
- the parameters of the initial business model after a part of the cut need to be further adjusted, therefore, further training of the cut model is required.
- the trained model is recorded as a sub-model of the initial business model.
- because the initial business model stops training once it has converged, trimming a part of it may delete important model units by mistake, causing problems such as degraded model performance. Therefore, the performance of the sub-model obtained by training the pruned model is uncertain. For example, if important model units are mistakenly deleted, the model parameters (or loss function) may fail to converge, convergence may slow down, or model accuracy may decrease. It is therefore also possible to record the performance indicators of each sub-model after training, such as accuracy, model size, convergence, and so on.
- N sub-models can be obtained.
- N is a positive integer, which can be a preset number of iterations, a preset number of sub-models, or a number reached according to a set trimming condition. For example, when pruning is applied on top of already pruned sub-models, each later sub-model is smaller, and the pruning condition may be that the size of the finally obtained sub-model is smaller than a predetermined size threshold (for example, 100 megabytes).
- the pruning can be ended when the size of the sub-model is smaller than the predetermined size, and the number of sub-models obtained N is the number of sub-models actually obtained.
- step 304 based on the model indicators corresponding to each sub-model, the first method of differential privacy is used to select the target business model from each sub-model.
- Differential privacy is a means in cryptography, which aims to provide a way to maximize the accuracy of data query when querying from a statistical database, while minimizing the chance of identifying its records.
- Let M be a randomized algorithm, and let $P_M$ be the set of all possible outputs of M.
- For any two adjacent data sets D and D' and any subset $S_M$ of $P_M$, if $\Pr[M(D) \in S_M] \le e^{\varepsilon} \times \Pr[M(D') \in S_M]$,
- then the algorithm M provides ε-differential privacy protection, where the parameter ε is called the privacy protection budget and balances the degree of privacy protection against accuracy.
- ε can usually be set in advance. The closer ε is to 0, the closer $e^{\varepsilon}$ is to 1, the closer the outputs of the randomized algorithm on the two adjacent data sets D and D' become, and the stronger the privacy protection.
- in step 304, this is equivalent to striking a balance between the compression ratio and the model indicators.
- Classical implementations of differential privacy usually include the Laplace mechanism, the exponential mechanism, and so on.
- the Laplace mechanism can be used to add noise perturbation to numerical values, while the exponential mechanism is more suitable where numerical perturbation is meaningless.
- a sub-model is selected from multiple sub-models as the target business model. Since this is a selection among sub-models rather than processing of a sub-model's internal structure, numerical perturbation is meaningless here, and the exponential mechanism is preferred.
- the following describes in detail the process of how to use the first method of differential privacy to select the target business model from the sub-models when the first method of differential privacy is the exponential mechanism.
- the N sub-models determined in step 303 can be regarded as N entity objects, each corresponding to a value $r_i$, where i ranges, for example, from 1 to N; the values $r_i$ together constitute the output range R of the query function.
- the purpose here is to select an $r_i$ from the range R and use its corresponding entity object, that is, the sub-model, as the target business model.
- D is used to represent a given data set (which can be understood here as the training sample set)
- the function $q(D, r_i)$ is called the availability function of the output value $r_i$.
- the availability function can be positively correlated with the compression ratio $s_i$ and accuracy $z_i$ of the corresponding sub-model i.
- the function value of the availability function corresponding to each sub-model can be recorded as the availability coefficient of the corresponding sub-model, for example:
- model indicators may include recall rate, F1 score, etc.
- usability function may also have other reasonable expressions based on the actual model indicators, which will not be repeated here.
- ∝ means proportional to.
- Δq may be a sensitivity factor, which represents the maximum change in the availability function caused by a change in a single data item (a single training sample in the above example).
- Δq takes the value 1.
- when the expression of q is different, Δq can be determined in other ways, which is not limited here.
- the privacy protection mechanism A may be a mechanism for sampling according to the sampling probability, and the sampling probability corresponding to sub-model i may be denoted as $A(D, q_i)$.
- the sampling probability of the i-th submodel can be:
- j represents any sub-model.
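The formulas themselves did not survive in this text. For orientation only, the commonly used textbook form of the exponential mechanism, written with the symbols defined above (ε, Δq, and the availability function q), is shown below; whether the source uses exactly this normalization is an assumption.

```latex
A(D, q_i) \;\propto\; \exp\!\left(\frac{\varepsilon\, q(D, r_i)}{2\,\Delta q}\right),
\qquad
\Pr[r_i] \;=\; \frac{\exp\!\left(\dfrac{\varepsilon\, q(D, r_i)}{2\,\Delta q}\right)}
                    {\sum_{j=1}^{N} \exp\!\left(\dfrac{\varepsilon\, q(D, r_j)}{2\,\Delta q}\right)}
```

With this form, a sub-model with a larger availability coefficient is sampled with higher probability, and a smaller ε flattens the probabilities toward uniform.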
- an exponential mechanism of differential privacy is introduced into the sampling probability of each sub-model.
- sampling can be performed in the range R (ie, sampling in each sub-model).
- the number between 0-1 can be divided into sub-intervals consistent with the number of values (the number of sub-models) in the range R, and the length of each sub-interval corresponds to the aforementioned sampling probability.
- a preselected random algorithm is used to generate a random number between 0-1, a certain value (corresponding to a sub-model) in the range R corresponding to the interval of the random number is used as the sampled target value.
- the sub-model corresponding to the target value can be used as the target business model.
- the value range R is a continuous numerical interval, which can be divided into sub-intervals whose length is positively related to the sampling probability of the corresponding sub-model according to the sampling probability.
- the sub-model corresponding to the interval can be used as the target business model.
- the exponential mechanism in differential privacy is used to complete the sampling of the sub-models according to the sampling probability, which adds randomness to the selection of the target business model from the sub-models.
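The interval-based sampling described above can be sketched as follows. The sketch assumes the textbook exponential-mechanism weights exp(ε·q / (2Δq)) and takes the availability coefficients as given inputs; the example coefficients and the function names are hypothetical.

```python
import math
import random

def sampling_probabilities(availability, epsilon, sensitivity=1.0):
    """Exponential-mechanism probabilities from availability coefficients (assumed form)."""
    scores = [epsilon * q / (2.0 * sensitivity) for q in availability]
    top = max(scores)                               # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def sample_index(probabilities, u):
    """Split [0, 1) into sub-intervals of length p_i and return the one containing u."""
    cumulative = 0.0
    for i, p in enumerate(probabilities):
        cumulative += p
        if u < cumulative:
            return i
    return len(probabilities) - 1                   # guard against rounding at the upper end

availability = [0.30, 0.55, 0.48]                   # hypothetical coefficients for 3 sub-models
probs = sampling_probabilities(availability, epsilon=1.0)
chosen = sample_index(probs, random.Random(0).random())
print(probs, chosen)
```

The sub-model at the returned index would then be used as the target business model.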
- each sub-model undergoes only preliminary training, which serves to select an appropriate sub-model as the final sub-model; this avoids the heavy computation of fully training the huge initial business model and then deleting a large number of model parameters. Therefore, the selected target business model can be further trained so that it is better suited to making business predictions for the given business data and obtaining business prediction results (such as scoring results, classification results, etc.).
- a training process for the target business model is, for example, inputting each training sample to the selected target business model, and adjusting the model parameters according to the comparison between the output result and the sample label.
- when the output result is compared with the sample label and the output result is a single numerical value, the loss can be measured by methods such as the difference or the absolute value of the difference.
- when the output result is a vector or multiple values, the loss can be measured by methods such as the variance or the Euclidean distance.
- the model parameters can be adjusted with the goal of minimizing the loss.
- some optimization algorithms, such as gradient descent, can also be used in this process to speed up the convergence of the model parameters (or loss function).
- differential privacy can be introduced by adding perturbation noise to the loss gradient when adjusting the model parameters, so that the target business model is trained with privacy protection.
- the process shown in FIG. 3 may further include the following steps:
- Step 305: use a plurality of training samples to train the target business model based on the second method of differential privacy, so that the trained target business model can be used for business prediction on the given business data.
- there are many ways to implement differential privacy.
- the purpose of introducing differential privacy here is to add noise to the data.
- this can be implemented by means of Gaussian noise, Laplacian noise, etc., which are not limited here.
- the model parameters can be adjusted through the following steps: first, determine the original gradient of the loss corresponding to the first batch of samples; then add noise for achieving differential privacy to the original gradient to obtain the gradient containing noise; then use the gradient containing noise to adjust the model parameters of the target business model.
- the first batch of samples here can be one training sample or multiple training samples.
- the loss corresponding to the first batch of samples may be the sum of the losses corresponding to the multiple training samples, the average loss, and so on.
- the first original gradient obtained is: $g_t(x_i) = \nabla_{\theta_t} L(\theta_t, x_i)$
- $t$ indicates that the current iteration is the t-th round of training
- $x_i$ represents the i-th sample in the first batch of samples
- $g_t(x_i)$ represents the loss gradient of the i-th sample in the t-th round
- $\theta_t$ represents the model parameters at the beginning of the t-th round of training
- $L(\theta_t, x_i)$ represents the loss function corresponding to the i-th sample.
- adding noise to the original gradient to achieve differential privacy can be achieved by means such as Laplacian noise, Gaussian noise, and the like.
- the original gradient may be clipped based on a preset clipping threshold to obtain the clipped gradient; Gaussian noise for achieving differential privacy is then determined based on the clipping threshold and a predetermined noise scaling coefficient (a preset hyperparameter), and the clipped gradient is fused with the Gaussian noise (for example, by summation) to obtain the gradient containing noise.
- the second method on the one hand clips the original gradient and on the other hand superimposes Gaussian noise on the clipped gradient, so that the loss gradient undergoes Gaussian-noise differential privacy processing.
- the original gradient is clipped as: $\bar{g}_t(x_i) = g_t(x_i) \,/\, \max\left(1, \frac{\lVert g_t(x_i) \rVert_2}{C}\right)$
- $C$ represents the clipping threshold
- $\lVert g_t(x_i) \rVert_2$ represents the L2 norm of $g_t(x_i)$. That is to say, when the gradient norm is less than or equal to the clipping threshold $C$, the original gradient is retained, and when it is greater than $C$, the original gradient is scaled down by the ratio by which its norm exceeds $C$.
- Gaussian noise is added to the clipped gradient to obtain the gradient containing noise, for example: $\tilde{g}_t = \frac{1}{N}\left(\sum_{i=1}^{N} \bar{g}_t(x_i) + \mathcal{N}(0, \sigma^2 C^2 I)\right)$
- $N$ represents the number of samples in the first batch of samples; $\tilde{g}_t$ represents the gradient containing noise for the $N$ samples in the t-th round; $\mathcal{N}(0, \sigma^2 C^2 I)$ represents Gaussian noise whose probability density follows a Gaussian distribution with mean 0 and variance $\sigma^2 C^2 I$; $\sigma$ represents the above-mentioned noise scaling coefficient, a preset hyperparameter that can be set as needed; $C$ is the above-mentioned clipping threshold; $I$ represents an indicator function that takes the value 0 or 1. For example, it can be set to 1 for even-numbered rounds and 0 for odd-numbered rounds in multi-round training.
- when the first batch contains multiple training samples, the gradient containing noise is the average of the clipped original gradients of those samples with Gaussian noise superimposed.
- when the first batch contains only one training sample, the gradient containing noise in the above formula is that sample's clipped original gradient with Gaussian noise superimposed.
- the model parameters can be adjusted as follows: $\theta_{t+1} = \theta_t - \eta_t\, \tilde{g}_t$
- $\eta_t$ represents the learning step size of the t-th round, or the learning rate, which is a preset hyperparameter, such as 0.5, 0.3, etc.; $\theta_{t+1}$ represents the adjusted model parameters obtained after the t-th round of training (which includes the first batch of samples).
- when adding Gaussian noise to the gradient satisfies differential privacy, the adjustment of the model parameters satisfies differential privacy.
- a target business model based on differential privacy can be obtained. Since Gaussian noise is added in the model training process, it is difficult to infer the model structure or reverse the business data from the data presented by the target business model. In this way, the effectiveness of privacy data protection can be further improved.
- the trained target business model can be used to make corresponding business predictions for the given business data.
- the business data here is of the same type as the training samples, such as a user's finance-related data, from which the target business model can predict the user's loan risk.
- the method of determining the target business model based on privacy protection provided by the embodiments of this specification first conducts initial training on the selected complex business model to obtain the initial business model, then prunes the initial business model, and trains the pruned business model with its parameters reset to the initialization state, to test whether the pruned model parameters were unimportant from the beginning.
- the target business model is selected through differential privacy. In this way, a compression model for privacy protection can be obtained, and on the basis of implementing model compression, privacy protection is provided for the model.
- an apparatus for determining a target business model based on privacy protection is also provided.
- the business model here may be a model used for business processing such as classification and scoring for given business data.
- the business data here can be various types of data such as text, image, voice, video, and animation.
- the device can be installed in a system, equipment, device, platform or server with certain computing capabilities.
- Fig. 5 shows a schematic block diagram of an apparatus for determining a target business model based on privacy protection according to an embodiment.
- the device 500 includes:
- the initialization unit 51 is configured to determine the respective initial values of each model parameter for the selected business model in a predetermined manner, so as to initialize the selected business model;
- the initial training unit 52 is configured to use a plurality of training samples to train the initialized selected business model until the model parameters converge to obtain the initial business model;
- the pruning unit 53 is configured to determine multiple sub-models of the initial business model based on the pruning of the initial business model, where each sub-model has model parameters and model indicators determined through retraining by the initialization unit 51 and the initial training unit 52, as follows:
- the initialization unit 51 resets the model parameters of the pruned business model to the initial values of the corresponding model parameters in the initialized business model;
- the initial training unit 52 sequentially inputs multiple training samples into the pruned business model, and adjusts the model parameters based on a comparison between the corresponding sample labels and the output results of the pruned business model;
- the determining unit 54 is configured to select a target business model from the sub-models by using the first method of differential privacy, based on the model indicators corresponding to each sub-model.
- the pruning unit 53 may be further configured to: prune the initial business model according to the model parameters of the initial business model to obtain the first pruned model; take the first pruned model, with the model parameters obtained after retraining, as the first sub-model; and iteratively prune the first sub-model to obtain subsequent sub-models until the end condition is met.
- the foregoing end condition may include at least one of the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, and the scale of the last sub-model being smaller than a set scale threshold, and so on.
- the pruning unit 53 prunes the model in one of the following ways, in ascending order of model parameters: pruning a predetermined proportion of the model parameters, pruning a predetermined number of model parameters, pruning to obtain a model whose scale does not exceed a predetermined size, and so on.
- the first method of differential privacy is an exponential mechanism
- the determining unit 54 may be further configured to: determine the availability coefficient of each sub-model according to the model indicators corresponding to that sub-model; determine the sampling probability of each sub-model from the availability coefficients by using the exponential mechanism; and sample among the multiple sub-models according to the sampling probabilities, taking the sampled sub-model as the target business model.
- the device 500 may further include a privacy training unit 55, configured to use multiple training samples to train the target business model based on the second method of differential privacy, so that the trained target business model can be used to make business predictions on given business data while protecting data privacy.
- the multiple training samples include the first batch of samples, and sample i in the first batch of samples has a corresponding loss obtained after processing by the target business model; the privacy training unit 55 is further configured to: determine the original gradient of the loss corresponding to sample i; use the second method of differential privacy to add noise to the original gradient to obtain a gradient containing noise; and use the gradient containing noise to adjust the model parameters of the target business model with the goal of minimizing the loss corresponding to sample i.
- the second method of differential privacy is to add Gaussian noise
- the privacy training unit 55 may also be configured to: clip the original gradient based on a preset clipping threshold to obtain the clipped gradient; use a Gaussian distribution determined from the clipping threshold to determine the Gaussian noise used to achieve differential privacy, where the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superimpose the Gaussian noise on the clipped gradient to obtain a gradient containing noise.
- the apparatus 500 shown in FIG. 5 is an apparatus embodiment corresponding to the method embodiment shown in FIG. 3, and the corresponding description in the method embodiment shown in FIG. 3 also applies to the apparatus 500 and is not repeated here.
- a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 3.
- a computing device including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, it implements the method described in conjunction with FIG. 3.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioethics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
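Embodiments of this specification provide a method and apparatus for determining a target business model based on privacy protection. A selected complex business model is first given initial training to obtain an initial business model. The initial business model is then pruned, and the pruned business model is trained with its parameters reset to their initialization state, to test whether the pruned model parameters were unimportant from the start. From the multiple sub-models thus obtained, a target business model is selected by means of differential privacy. In this way, a privacy-protected compressed model can be obtained, providing privacy protection for the model on top of model compression.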
Description
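One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and apparatus for determining, by computer, a target business model based on privacy protection.
With the development of machine learning technology, deep neural networks (DNNs) are favored by those skilled in the art because they mimic the way the human brain thinks and perform better than simple linear models. A deep neural network is a neural network with at least one hidden layer; it can model complex nonlinear systems and improve model capability.
Because of their complex network structure, deep neural networks also have very large feature and model-parameter systems. For example, a deep neural network may include up to millions of parameters. It is therefore desirable to find a model compression method that reduces the data volume and complexity of the model. To this end, conventional techniques usually use training samples to adjust the millions of parameters in a deep neural network and then delete, or "prune", unnecessary weights to reduce the network structure to a more manageable size. Reducing the size of the model helps minimize its memory, inference, and computing requirements. In some business scenarios, the weights in a neural network can sometimes be cut by as much as 99%, yielding a smaller and sparser network.
However, pruning after training is complete incurs a high computational cost and performs a large amount of "wasted" computation. The idea is therefore to look, among the sub-networks of the original neural network, for one that meets the requirements as far as possible and to train that. At the same time, in conventional techniques, the original data is easier to obtain from a simpler neural network. A method is therefore needed that can both protect data privacy and compress the model to enable real-time computation and on-device deployment, improving the model's performance in several respects.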
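Summary of the Invention
One or more embodiments of this specification describe a method and apparatus for determining a target business model based on privacy protection, to solve one or more of the problems mentioned in the background.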
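According to a first aspect, a method for determining a target business model based on privacy protection is provided, the target business model being used to process given business data to obtain a corresponding business prediction result. The method includes: determining, in a predetermined manner, an initial value for each model parameter of a selected business model, thereby initializing the selected business model; training the initialized selected business model with a plurality of training samples until the model parameters converge, to obtain an initial business model; determining a plurality of sub-models of the initial business model based on pruning of the initial business model, where each sub-model has model parameters and model indicators determined through retraining as follows: the model parameters of the pruned business model are reset to the initial values of the corresponding model parameters in the initialized business model, the plurality of training samples are input in turn into the pruned business model, and the model parameters are adjusted based on a comparison between the corresponding sample labels and the output results of the pruned business model; and selecting the target business model from the sub-models using a first method of differential privacy, based on the model indicators corresponding to each sub-model.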
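In one embodiment, determining the plurality of sub-models of the initial business model based on pruning of the initial business model includes: pruning the initial business model according to its model parameters to obtain a first pruned model; taking the first pruned model, with the model parameters obtained through retraining, as a first sub-model; and iteratively pruning the first sub-model to obtain subsequent sub-models until an end condition is met.
In one embodiment, the end condition includes at least one of: the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, and the scale of the last sub-model being smaller than a set scale threshold.
In one embodiment, the model is pruned in one of the following ways, in ascending order of model parameters: pruning a predetermined proportion of the model parameters, pruning a predetermined number of model parameters, or pruning to obtain a model whose scale does not exceed a predetermined size.
In one embodiment, the first method of differential privacy is the exponential mechanism, and selecting the target business model from the sub-models using the first method of differential privacy, based on the model indicators corresponding to each sub-model, includes: determining the availability coefficient of each sub-model according to the model indicators corresponding to that sub-model; determining the sampling probability of each sub-model from the availability coefficients using the exponential mechanism; and sampling among the plurality of sub-models according to the sampling probabilities, taking the sampled sub-model as the target business model.
In one embodiment, the method further includes: training the target business model with a plurality of training samples based on a second method of differential privacy, so that the trained target business model can be used to make business predictions on given business data while protecting data privacy.
In one embodiment, the plurality of training samples includes a first batch of samples, and sample i in the first batch has a corresponding loss obtained after processing by the target business model. Training the target business model with the plurality of training samples based on the second method of differential privacy includes: determining the original gradient of the loss corresponding to sample i; adding noise to the original gradient using the second method of differential privacy to obtain a gradient containing noise; and using the gradient containing noise to adjust the model parameters of the target business model with the goal of minimizing the loss corresponding to sample i.
In one embodiment, the second method of differential privacy is adding Gaussian noise, and adding noise to the original gradient using the second method of differential privacy to obtain the gradient containing noise includes: clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient; determining the Gaussian noise for achieving differential privacy using a Gaussian distribution determined from the clipping threshold, where the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superimposing the Gaussian noise on the clipped gradient to obtain the gradient containing noise.
In one embodiment, the business data includes at least one of pictures, audio, and characters.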
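According to a second aspect, an apparatus for determining a target business model based on privacy protection is provided, the target business model being used to process given business data to obtain a corresponding business prediction result. The apparatus includes: an initialization unit, configured to determine, in a predetermined manner, an initial value for each model parameter of a selected business model, thereby initializing the selected business model; an initial training unit, configured to train the initialized selected business model with a plurality of training samples until the model parameters converge, to obtain an initial business model; a pruning unit, configured to determine a plurality of sub-models of the initial business model based on pruning of the initial business model, where each sub-model has model parameters and model indicators determined through retraining by the initialization unit and the initial training unit as follows: the initialization unit resets the model parameters of the pruned business model to the initial values of the corresponding model parameters in the initialized business model, and the initial training unit inputs the plurality of training samples in turn into the pruned business model and adjusts the model parameters based on a comparison between the corresponding sample labels and the output results of the pruned business model; and a determining unit, configured to select the target business model from the sub-models using a first method of differential privacy, based on the model indicators corresponding to each sub-model.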
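According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method of the first aspect.
According to a fourth aspect, a computing device is provided, including a memory and a processor, characterized in that the memory stores executable code and the processor, when executing the executable code, implements the method of the first aspect.
With the method and apparatus provided in the embodiments of this specification, a selected complex business model is first given initial training to obtain an initial business model. The initial business model is then pruned, and the pruned business model is trained with its parameters reset to their initialization state, to test whether the pruned model parameters were unimportant from the start. From the multiple sub-models thus obtained, a target business model is selected by means of differential privacy. In this way, a privacy-protected compressed model can be obtained, providing privacy protection for the model on top of model compression.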
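To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation architecture for determining a target business model based on privacy protection under the technical concept of this specification;
FIG. 2 shows, in a specific example, the flow of determining multiple sub-networks based on pruning of an initial neural network;
FIG. 3 is a flowchart of a method for determining a target business model based on privacy protection according to an embodiment;
FIG. 4 is a schematic diagram of pruning a neural network in a specific example;
FIG. 5 is a schematic block diagram of an apparatus for determining a target business model based on privacy protection according to an embodiment.
The solution provided in this specification is described below with reference to the drawings.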
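FIG. 1 shows a schematic implementation architecture according to the technical concept of this specification. Under this concept, a business model may be a machine learning model used for various kinds of business processing of business data, such as classification and scoring. The business model shown in FIG. 1 is implemented by a neural network; in practice it may also be implemented in other ways, such as a decision tree or linear regression. The business data may be at least one of characters, audio, images, animation, and the like, depending on the specific business scenario, which is not limited here.
For example, the business model may be a machine learning model that helps a lending platform assess the risk of a user's lending business; the business data may be an individual user's historical lending behavior data, default data, user profile, and so on, and the business prediction result is the user's risk score. As another example, the business model may be a model (such as a convolutional neural network) for classifying targets in pictures; the business data may be various pictures, and the business prediction result may be, for example, a first target (such as a car), a second target (such as a bicycle), or another category.
In particular, the implementation architecture of this specification is especially suitable when the business model is a relatively complex nonlinear model. Determining the target business model based on privacy protection may be a process of determining, from a complex initial business model, a streamlined sub-model whose model indicators meet the requirements.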
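Taking a neural network as the business model, as shown in FIG. 1, the initial neural network may be a relatively complex neural network that includes many features, weight parameters, and other parameters (such as constant parameters and auxiliary matrices). The model parameters of the initial neural network may be initialized in a predetermined manner, for example by random initialization or by setting them to predetermined values. Under this implementation architecture, the initial neural network is first trained with multiple training samples until its model parameters (or loss function) converge. The initial neural network is then pruned to obtain multiple sub-networks. Pruning may proceed according to a predetermined proportion of parameters (such as 20%), a predetermined number of parameters (such as 1000), a predetermined scale (such as at least 20 megabytes), and so on.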
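In conventional techniques, a sub-network obtained by pruning the initial neural network is usually trained further, pruned again on that basis, trained further, and so on. That is, the initial neural network is compressed step by step. Under the concept of the embodiments of this specification, by contrast, after the initial neural network is pruned, the parameters of the resulting sub-network are reset (restored to their initialization state), and the pruned network with reset parameters is trained. The purpose is to check whether the pruned part of the network structure was unnecessary from the start. Whether it was can be reflected in the model's evaluation indicators, such as accuracy, recall, and convergence.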
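It is worth noting that pruning a neural network may include removing some of the neurons in the network and/or removing some of the connections between neurons. In an optional implementation, the weight parameters of a neuron can serve as the reference for deciding which neurons to discard. Weight parameters describe the importance of a neuron. Taking a fully connected neural network as an example, the weights from a neuron to each neuron of the next layer can be averaged, or their maximum taken, to obtain a reference weight. Neurons are then discarded (pruned) in ascending order of their reference weights.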
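As a rough illustration of the reference-weight idea above, the following Python sketch scores each neuron by the mean or the maximum of its outgoing weights and lists the lowest-scoring neurons as pruning candidates. The function names, the nested-list layout, and the use of absolute values are assumptions made for this example; the specification does not prescribe them.

```python
def reference_weights(outgoing, mode="mean"):
    """Score each neuron from its outgoing weights.

    outgoing[k] holds the weights from neuron k to every neuron of the
    next layer. Absolute values are used here (an assumption) so that a
    large negative weight also counts as important.
    """
    scores = []
    for weights in outgoing:
        magnitudes = [abs(w) for w in weights]
        if mode == "mean":
            scores.append(sum(magnitudes) / len(magnitudes))
        elif mode == "max":
            scores.append(max(magnitudes))
        else:
            raise ValueError("mode must be 'mean' or 'max'")
    return scores


def neurons_to_prune(outgoing, count, mode="mean"):
    """Return the indices of the `count` neurons with the smallest scores."""
    scores = reference_weights(outgoing, mode)
    ascending = sorted(range(len(scores)), key=lambda k: scores[k])
    return sorted(ascending[:count])
```

For a layer of four neurons with two outgoing weights each, `neurons_to_prune(outgoing, 2)` would name the two neurons whose connections carry the least weight.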
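FIG. 2 shows the sub-network pruning flow of a specific example under the implementation architecture of this specification. In FIG. 2, for the part of the neural network that remains after pruning, the model parameters are reset to their initialization state and the network is retrained with the training samples to obtain the first sub-network. The network structure, evaluation indicators, and so on of the first sub-network can be recorded. Then, as indicated by the arrow on the left, the flow returns to the model-pruning step and loops. The first sub-network is pruned according to the model parameters of the trained first sub-network; the model parameters of the pruned network are reset to their initialization state, and the network is retrained with the training samples to become the second sub-network. The loop continues along the left arrow, and so on, until an N-th sub-network that satisfies the end condition is obtained. The end condition may be, for example, at least one of: the number of iterations reaching a predetermined number (such as a preset number N), the number of sub-models reaching a predetermined number (such as a preset number N), the scale of the last sub-model being smaller than a set scale threshold (such as 100 megabytes), and so on.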
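In this way, multiple sub-networks of the initial neural network are obtained. In some optional implementations, the arrow on the left of FIG. 2 may return to the very top. That is, after the first sub-network is obtained, the original neural network is re-initialized, the re-initialized network is trained and pruned, and the pruned sub-network is trained to become the second sub-network, and so on, until the N-th sub-network is obtained. The sub-networks may have different scales; for example, the first sub-network is 80% of the initial neural network, the second is 60%, and so on. In this approach, some randomization can also be applied each time the neural network is initialized: features or neurons are randomly sampled and a small fraction (such as 1%) of the features and initialization parameters is discarded, to perturb the initial neural network slightly. Each re-initialized network thus stays essentially consistent with the original network while differing slightly, so that the roles of different neurons can be tested.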
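The prune, reset, and retrain loop of FIG. 2 (the stacked variant, where each round prunes the previous sub-network) could be organized roughly as follows. This is a minimal sketch under stated assumptions: `train` and `evaluate` are hypothetical callables supplied by the caller, the parameters are a flat list, pruning removes the smallest-magnitude surviving weights, and the only end condition is a fixed number of rounds.

```python
def prune_smallest(weights, mask, fraction):
    """Return a new mask with the smallest-magnitude surviving weights removed."""
    alive = [k for k, keep in enumerate(mask) if keep]
    n_drop = int(len(alive) * fraction)
    drop = set(sorted(alive, key=lambda k: abs(weights[k]))[:n_drop])
    return [keep and k not in drop for k, keep in enumerate(mask)]


def find_sub_models(initial_values, train, evaluate, fraction=0.2, max_rounds=3):
    """Iteratively prune, reset to the initial values, and retrain.

    train(weights, mask) -> trained weights   (hypothetical, caller-supplied)
    evaluate(weights, mask) -> model metric   (hypothetical, caller-supplied)
    """
    mask = [True] * len(initial_values)
    trained = train(list(initial_values), mask)  # the initial business model
    sub_models = []
    for _ in range(max_rounds):
        # Prune according to the parameters of the most recently trained model.
        mask = prune_smallest(trained, mask, fraction)
        # Reset surviving parameters to their original initial values.
        reset = [v if keep else 0.0 for v, keep in zip(initial_values, mask)]
        trained = train(reset, mask)
        sub_models.append(
            {"mask": mask, "weights": trained, "metric": evaluate(trained, mask)}
        )
    return sub_models
```

Recording the mask and the metric for every round keeps the information the later selection step needs, without committing to any particular training routine.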
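Continuing with FIG. 1, one of the sub-networks can be selected as the target neural network. According to one embodiment, to protect data privacy, the sub-networks obtained by pruning can be regarded as a set of sub-networks of the initial neural network, from which one sub-network is randomly selected as the target neural network based on the principle of differential privacy. Determining the target business model by means of differential privacy in this way better protects the privacy of the business model and/or business data and improves the practicality of the target neural network.
It is understandable that the implementation architecture of FIG. 1 takes a neural network as an example of the business model. When the business model is another machine learning model, the neurons in the description above can be replaced by other model elements; for example, when the business model is a decision tree, neurons can be replaced by the tree nodes of the decision tree.
The target neural network is used to make business predictions on business data and obtain the corresponding business prediction results: for example, the recognized target category from picture data, or the user's financial lending risk from user behavior data.
The specific flow of determining a target business model based on privacy protection is described in detail below.
FIG. 3 shows the flow of determining a target business model based on privacy protection according to an embodiment. The business model here may be a model used for business processing, such as classification and scoring, of given business data. The business data may be of various types, such as text, images, speech, video, and animation. The flow may be executed by any system, device, apparatus, platform, or server with a certain computing capability.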
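As shown in FIG. 3, the method for determining a target business model based on privacy protection may include the following steps. Step 301: determine, in a predetermined manner, an initial value for each model parameter of the selected business model, thereby initializing the selected business model. Step 302: train the initialized business model with multiple training samples until the model parameters converge, to obtain the initial business model. Step 303: determine multiple sub-models of the initial business model based on pruning of the initial business model, where each sub-model has model parameters and model indicators determined through retraining as follows: the model parameters of the pruned business model are reset to the initial values of the corresponding model parameters in the initialized business model, the multiple training samples are input in turn into the pruned business model, and the model parameters are adjusted based on a comparison between the corresponding sample labels and the output results of the pruned business model. Step 304: select the target business model from the sub-models using the first method of differential privacy, based on the model indicators corresponding to each sub-model.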
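First, in step 301, an initial value is determined in a predetermined manner for each model parameter of the selected business model, thereby initializing the selected business model.
It is understandable that, for the selected business model to be trainable, its model parameters must first be initialized, that is, an initial value must be determined for each model parameter. When the selected business model is a neural network, the model parameters may be, for example, at least one of the weights of the neurons, constant parameters, auxiliary matrices, and so on. When the selected business model is a decision tree, the model parameters are, for example, the weight parameters of the nodes, the connections between nodes, the connection weights, and so on. When the selected business model is another form of machine learning model, the model parameters may be other parameters, which are not enumerated here.
The initial values of these model parameters may be determined in a predetermined manner, such as fully random values, random values within a preset interval, or assigned set values. With these initial values, when the business model receives business data, or relevant features extracted from business data, it can give a corresponding business prediction result, such as a classification result or a scoring result.
Next, in step 302, the initialized business model is trained with multiple training samples until the model parameters converge, to obtain the initial business model.
After the model parameters are initialized in step 301, the selected business model can run according to its logic and give a business prediction result as soon as it receives business data, so the initialized business model can be trained with training samples. Each training sample may consist of sample business data and a corresponding sample label. Training the initialized business model may, for example, proceed as follows: each piece of sample business data is input in turn into the initialized business model, and the model parameters are adjusted according to a comparison between the business prediction result output by the business model and the corresponding sample label.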
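After adjustment with a certain number of training samples, the change in each model parameter becomes smaller and smaller until the parameter approaches a certain value; that is, the model parameters converge. Convergence of the model parameters can be described by the fluctuation of each model parameter or by the loss function, because the loss function is usually a function of the model parameters, so convergence of the loss function indicates convergence of the model parameters. For example, the model parameters can be considered converged when the maximum change of the loss function, or the fluctuation of the model parameters, is smaller than a predetermined threshold. The selected business model has then completed the current training stage, and the resulting business model may be called the initial business model.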
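The convergence test described above can be sketched as a comparison of successive parameter vectors. The helper name `has_converged` and the default threshold are illustrative assumptions; the text only requires that the fluctuation fall below some predetermined threshold.

```python
def has_converged(previous, current, threshold=1e-4):
    """True when no parameter moved by `threshold` or more between two checks."""
    largest_change = max(abs(c - p) for p, c in zip(previous, current))
    return largest_change < threshold
```

The same check could be applied to successive loss values instead of parameters by passing one-element lists.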
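The initial business model may be trained in any suitable manner, which is not detailed here.
Then, in step 303, multiple sub-models of the initial business model are determined based on pruning of the initial business model. It is understandable that, to obtain from the initial business model a sub-model that can replace it, the initial business model can be pruned according to business requirements, yielding multiple sub-models of the initial model. These sub-models may also be called candidate models.
It is worth noting that the initial business model may be pruned several times starting from the initial business model itself, or pruning may be stacked on top of already pruned sub-models, as described above for the example shown in FIG. 2, which is not repeated here.
The model is pruned in one of the following ways, in ascending order of model parameters: pruning a predetermined proportion (such as 20%) of the model parameters, pruning a predetermined number (such as 1000) of the model parameters, pruning to obtain a model whose scale does not exceed a predetermined size (such as 1000 megabytes), and so on.
It is understandable that at least some model parameters, such as weight parameters, usually reflect to some extent the importance of model units (such as neurons or tree nodes). When pruning the business model to reduce the number of parameters, either model units or the connections between model units can be pruned. The following description refers to FIG. 4 and takes a neural network as the business model and neurons as the model units.
In one embodiment, the model is pruned by removing a predetermined number or proportion of model units, for example 100 neurons or 10% of the neurons in each hidden layer of the neural network. Referring to FIG. 4, the importance of a neuron is described by the weights of the relations (the connecting lines in FIG. 4) between neurons of different hidden layers, so the values of the weight parameters can be used to decide which neurons to delete. FIG. 4 shows some of the hidden layers of a neural network. In FIG. 4, suppose that in the i-th hidden layer the weight parameters of all the connecting lines that join the neuron drawn with a dashed line to neurons of the previous layer or of the next hidden layer are small. That neuron is then of low importance and can be pruned.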
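In another embodiment, the model is pruned by removing a predetermined number or proportion of connecting edges. Still referring to FIG. 4, for each connecting edge in the neural network (such as the edge between neuron X1 and the dashed-line neuron of the i-th hidden layer), a small weight parameter indicates that the preceding neuron is of low importance to the following neuron, and the edge can be deleted. The resulting network structure is no longer the original fully connected structure: each neuron of a hidden layer acts only on the relatively important neurons of the next hidden layer, and each neuron of the next hidden layer attends only to the neurons of the preceding hidden layer that matter most to it. The scale of the business model also becomes smaller.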
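Edge pruning of this kind might look like the sketch below, which zeroes out a given proportion of the smallest-magnitude connection weights between two layers. Representing the connections as a nested list and marking a deleted edge with `0.0` are assumptions made for the example.

```python
def prune_edges(weight_matrix, proportion):
    """Zero out the given proportion of edges with the smallest |weight|.

    weight_matrix[r][c] is the weight of the edge from neuron r of one
    layer to neuron c of the next layer.
    """
    edges = [
        (abs(w), r, c)
        for r, row in enumerate(weight_matrix)
        for c, w in enumerate(row)
    ]
    n_drop = int(len(edges) * proportion)
    dropped = {(r, c) for _, r, c in sorted(edges)[:n_drop]}
    return [
        [0.0 if (r, c) in dropped else w for c, w in enumerate(row)]
        for r, row in enumerate(weight_matrix)
    ]
```

A threshold-based rule (delete every edge below a fixed weight) would be an equally plausible reading of the text; the proportion-based form is shown only because it maps directly onto "a predetermined proportion of connecting edges".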
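In other embodiments, the model may be pruned by removing connecting edges and model units at the same time, which is not detailed here. Pruning model units and pruning connections are both specific means of model pruning, and this specification does not limit the specific means. Such means make it possible to prune a predetermined proportion of the model parameters, prune a predetermined number of model parameters, prune to obtain a model whose scale does not exceed a predetermined size, and so on.
How much of the business model is pruned can be determined by a predetermined pruning rule or by the scale required of the sub-model. Pruning rules may be, for example: the sub-model has a predetermined number of bytes (such as 1000 megabytes); the sub-model is a predetermined proportion (such as 70%) of the initial business model; the scale of the sub-model after pruning is a predetermined proportion (such as 90%) of the scale before pruning; connecting edges whose weight is smaller than a predetermined weight threshold are pruned; and so on. In short, the pruned model discards model units or connecting edges of low importance and retains those of high importance.
In obtaining the sub-models, on the one hand, the parameters of the initial business model that remain after part of it is cut away need further adjustment, so the pruned model needs further training. On the other hand, it must be verified whether the part cut from the initial business model was unnecessary from the start, so the model parameters of the pruned model can be reset to their initialization state and the model trained with multiple training samples. The trained model is recorded as a sub-model of the initial business model.
It is understandable that, because training of the initial business model stops at convergence, pruning part of it may delete important model units by mistake and cause problems such as degraded model performance. The performance of a sub-model obtained by training a pruned model is therefore uncertain. For example, if important model units were deleted by mistake, the model parameters (or loss function) of the pruned business model may fail to converge or converge more slowly, or the model's accuracy may drop. The performance indicators of each sub-model after training, such as accuracy, model size, and convergence, can therefore also be recorded.
In step 303, suppose that N sub-models are obtained, where N is a positive integer. N may be a preset number of iterations (the predetermined number of times), a preset number of sub-models (the predetermined number), or the number reached under a set pruning condition. For example, when pruning is stacked on top of pruned sub-models, later sub-models are smaller, and the pruning condition may be that the last sub-model obtained is smaller than a predetermined scale threshold (such as 100 megabytes). Pruning then ends once a sub-model is smaller than the predetermined scale, and N is the number of sub-models actually obtained.
Next, in step 304, the target business model is selected from the sub-models using the first method of differential privacy, based on the model indicators corresponding to each sub-model.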
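Differential privacy is a technique in cryptography that aims, when querying a statistical database, to maximize the accuracy of the query results while minimizing the chance of identifying individual records. Let $M$ be a randomized algorithm and $P_M$ the set of all possible outputs of $M$. If, for any two neighboring datasets $D$ and $D'$ and any subset $S_M$ of $P_M$, the algorithm $M$ satisfies $\Pr[M(D) \in S_M] \le e^{\varepsilon} \times \Pr[M(D') \in S_M]$, then $M$ is said to provide ε-differential privacy. The parameter $\varepsilon$ is called the privacy budget and balances the degree of privacy protection against accuracy; it can usually be set in advance. The closer $\varepsilon$ is to 0, the closer $e^{\varepsilon}$ is to 1, the closer the results of the randomized algorithm on the two neighboring datasets $D$ and $D'$, and the stronger the privacy protection.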
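Step 304 amounts to striking a balance between compression ratio and model indicators. Classic implementations of differential privacy include the Laplace mechanism and the exponential mechanism. In general, the Laplace mechanism is used to add noise perturbation to numerical values, while the exponential mechanism is better suited to cases where perturbing a numerical value is meaningless. Here, one sub-model is selected from multiple sub-models as the target business model. Because this is a selection among sub-models rather than processing of their internal structure, it can be regarded as a case where numerical perturbation is meaningless, and the exponential mechanism is preferred.
As a specific example, the following describes in detail how the target business model is selected from the sub-models when the first method of differential privacy is the exponential mechanism.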
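The N sub-models determined in step 303 can be regarded as N entity objects, each corresponding to a value $r_i$, where $i$ ranges, for example, from 1 to N. The values $r_i$ constitute the output range $R$ of the query function. The aim is to select one $r_i$ from the range $R$ and take the corresponding entity object, that is, sub-model, as the target business model. Let $D$ denote the given dataset (which can be understood here as the training sample set). Under the exponential mechanism, the function $q(D, r_i)$ is called the availability function of the output value $r_i$.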
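For the sub-models, availability is closely related to the model indicators. For example, suppose the model indicators include the compression ratio relative to the initial business model and the accuracy on a test sample set. A larger compression ratio means a smaller sub-model, and a higher accuracy means a better sub-model, so in a specific example the availability function may be positively correlated with the compression ratio $s_i$ and the accuracy $z_i$ of sub-model $i$. The value of the availability function for each sub-model may be recorded as that sub-model's availability coefficient, for example:
$$q(D, r_i) = s_i \times z_i$$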
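In other specific examples, the model indicators may include recall, F1 score, and so on, and the availability function may take other reasonable forms depending on the actual model indicators, which are not detailed here.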
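In ε-differential privacy under the exponential mechanism, for a given privacy cost $\varepsilon$ (a preset value, such as 0.1), a given dataset $D$, and an availability function $q(D, r)$, the privacy protection mechanism $A(D, q)$ satisfies ε-differential privacy if and only if the following expression holds:
$$A(D, q) = \left\{ r : \Pr[r \in R] \propto \exp\left(\frac{\varepsilon\, q(D, r)}{2\Delta q}\right) \right\}$$
where $\propto$ means "is proportional to". $\Delta q$ is the sensitivity factor, which represents the maximum change in the availability function caused by a change in a single data item (a single training sample in the example above). Because accuracy and compression ratio both take values between 0 and 1, the maximum change in $q$ when a single data item changes is 1, that is, $\Delta q$ is 1. In other embodiments, where $q$ is expressed differently, $\Delta q$ may be determined in other ways, which are not limited here.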
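In a specific example, the privacy protection mechanism $A$ may be a mechanism that samples according to sampling probabilities, and the sampling probability of sub-model $i$ may be denoted $A(D, q_i)$. For example, the sampling probability of the i-th sub-model may be:
$$A(D, q_i) = \frac{\exp\left(\frac{\varepsilon\, q(D, r_i)}{2\Delta q}\right)}{\sum_{j} \exp\left(\frac{\varepsilon\, q(D, r_j)}{2\Delta q}\right)}$$
where $j$ denotes any sub-model. The exponential mechanism of differential privacy is thus introduced into the sampling probabilities of the sub-models, and sampling can be performed in the range $R$ (that is, among the sub-models) according to each sub-model's sampling probability.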
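The sampling probabilities can be computed as in the sketch below, which assumes the standard exponential-mechanism weighting (each weight proportional to exp(ε·q / (2·Δq))) and the example availability coefficient q = compression ratio × accuracy. The function name, argument names, and default values are illustrative, not taken from the specification.

```python
import math


def sampling_probabilities(compression, accuracy, epsilon=0.1, sensitivity=1.0):
    """Exponential-mechanism probabilities for a list of sub-models.

    compression[i] and accuracy[i] are the model indicators of sub-model i;
    their product is used as the availability coefficient (one example form).
    """
    scores = [s * z for s, z in zip(compression, accuracy)]
    exponents = [epsilon * q / (2.0 * sensitivity) for q in scores]
    # Subtracting the maximum avoids overflow and leaves the ratios unchanged.
    top = max(exponents)
    weights = [math.exp(v - top) for v in exponents]
    total = sum(weights)
    return [w / total for w in weights]
```

With a small privacy budget the probabilities stay close to uniform, which is the intended trade-off: better sub-models are favored only mildly, so the choice reveals little about the training data.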
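For sampling, according to one specific example, the interval from 0 to 1 can be divided into as many sub-intervals as there are values in the range R (the number of sub-models), with the length of each sub-interval corresponding to the sampling probability above. When a preselected random algorithm generates a random number between 0 and 1, the value in the range R (corresponding to one sub-model) whose sub-interval contains the random number is taken as the sampled target value, and the sub-model corresponding to that target value can serve as the target business model. According to another specific example, the range R is a continuous numerical interval that can be divided, according to the sampling probabilities, into sub-intervals whose lengths are positively correlated with the sampling probabilities of the corresponding sub-models. A value is then drawn at random directly on the range R, and the sub-model corresponding to the sub-interval into which the value falls can serve as the target business model.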
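The interval-based sampling just described can be sketched as follows: the unit interval is cut into consecutive sub-intervals whose lengths are the sampling probabilities, and the sub-interval containing a uniform random number decides the sub-model. The helper name `sample_index` and the optional `rng` argument (any object with a `random()` method) are assumptions for the example.

```python
import random
from itertools import accumulate


def sample_index(probabilities, rng=random):
    """Pick an index by locating a uniform draw among cumulative sub-intervals."""
    u = rng.random()  # uniform in [0, 1)
    for index, upper_bound in enumerate(accumulate(probabilities)):
        if u < upper_bound:
            return index
    # Only reached if rounding leaves the cumulative sum slightly below 1.
    return len(probabilities) - 1
```

Passing a seeded `random.Random` instance as `rng` makes a selection reproducible for testing, while production use would keep the draw unpredictable.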
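It is understandable that sampling the sub-models according to the sampling probabilities through the exponential mechanism of differential privacy adds randomness to the selection of the target business model. It is therefore difficult to infer the specific structure of the sub-model from the initial business model, which makes the target business model hard to guess and protects the privacy of the target business model and of the business data.
It is understandable that, in determining the target business model, each sub-model undergoes only preliminary training so that a suitable sub-model can be picked out as the final one, avoiding the heavy computation of fully training the huge initial business model and then deleting a large number of model parameters. The selected target business model can therefore be trained further, so that it better serves to make business predictions on given business data and obtain business prediction results (such as scoring results or classification results).
One training process for the target business model is, for example: each training sample is input into the selected target business model, and the model parameters are adjusted according to a comparison between the output result and the sample label.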
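In general, when comparing the output result with the sample label, the loss can be measured by, for example, the difference or the absolute value of the difference if the output result is a single value, and by, for example, the variance or the Euclidean distance if the output result is a vector or multiple values. Once the loss is obtained, the model parameters can be adjusted with the goal of minimizing the loss. Optimization algorithms, such as gradient descent, can also be used in this process to speed up the convergence of the model parameters (or loss function).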
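Two of the loss measures mentioned above are shown below as a small sketch: the absolute difference for a single-valued output and the Euclidean distance for a vector output. The function names are illustrative only.

```python
import math


def scalar_loss(output, label):
    """Absolute value of the difference between a scalar output and its label."""
    return abs(output - label)


def vector_loss(output, label):
    """Euclidean distance between an output vector and a label vector."""
    return math.sqrt(sum((o - l) ** 2 for o, l in zip(output, label)))
```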
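According to one possible design, to further protect data privacy, differential privacy can also be introduced by adding perturbation noise to the loss gradient when adjusting the model parameters, so that the target business model is trained with privacy protection. In this case, the flow shown in FIG. 3 may further include the following step:
Step 305: train the target business model with multiple training samples based on the second method of differential privacy, so that the trained target business model can be used to make business predictions on given business data. Differential privacy can be implemented in many ways; its purpose here is to add noise to the data, which can be done with, for example, Gaussian noise or Laplacian noise, and is not limited here.
In one implementation, for a first batch of samples input into the target business model, the model parameters can be adjusted through the following steps: first, determine the original gradient of the loss corresponding to the first batch of samples; next, add to the original gradient noise for achieving differential privacy, to obtain a gradient containing noise; then, use the gradient containing noise to adjust the model parameters of the target business model. It is understandable that the first batch of samples may be one training sample or multiple training samples. When it contains multiple training samples, the loss corresponding to the first batch may be the sum of the losses of those samples, their average loss, and so on.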
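As an example, suppose that for the first batch of samples the first original gradient obtained is:
$$g_t(x_i) = \nabla_{\theta_t} L(\theta_t, x_i)$$
where $t$ indicates that the current iteration is the t-th round of training, $x_i$ denotes the i-th sample in the first batch of samples, $g_t(x_i)$ denotes the loss gradient of the i-th sample in the t-th round, $\theta_t$ denotes the model parameters at the start of the t-th round of training, and $L(\theta_t, x_i)$ denotes the loss function corresponding to the i-th sample.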
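As mentioned above, noise for achieving differential privacy can be added to the original gradient by means such as Laplacian noise or Gaussian noise.
In one embodiment, taking Gaussian noise as the second method of differential privacy, the original gradient may be clipped based on a preset clipping threshold to obtain a clipped gradient. Gaussian noise for achieving differential privacy is then determined based on the clipping threshold and a predetermined noise scaling coefficient (a preset hyperparameter), and the clipped gradient is fused with the Gaussian noise (for example, by summation) to obtain the gradient containing noise. It is understandable that the second method on the one hand clips the original gradient and on the other hand superimposes Gaussian noise on the clipped gradient, so that the loss gradient undergoes Gaussian-noise differential privacy processing.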
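For example, the original gradient is clipped as:
$$\bar{g}_t(x_i) = \frac{g_t(x_i)}{\max\left(1, \frac{\lVert g_t(x_i) \rVert_2}{C}\right)}$$
where $\bar{g}_t(x_i)$ denotes the clipped gradient of the i-th sample in the t-th round, $C$ denotes the clipping threshold, and $\lVert g_t(x_i) \rVert_2$ denotes the L2 norm of $g_t(x_i)$. That is, when the gradient norm is less than or equal to the clipping threshold $C$, the original gradient is kept; when it is greater than $C$, the original gradient is scaled down by the ratio by which its norm exceeds $C$.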
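The clipping rule described above (keep the gradient when its L2 norm is at most C, otherwise shrink it so that its norm equals C) can be sketched in a few lines. Gradients are plain Python lists here, and the name `clip_gradient` is an assumption for the example.

```python
import math


def clip_gradient(gradient, clip_threshold):
    """Scale the gradient so that its L2 norm does not exceed clip_threshold."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = max(1.0, norm / clip_threshold)
    return [g / scale for g in gradient]
```

Clipping bounds how much any single sample can move the parameters, which is what lets the noise added afterwards be calibrated to the threshold.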
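Gaussian noise is added to the clipped gradient to obtain the gradient containing noise, for example:
$$\tilde{g}_t = \frac{1}{N}\left(\sum_{i=1}^{N} \bar{g}_t(x_i) + \mathcal{N}(0, \sigma^2 C^2 I)\right)$$
where $N$ denotes the number of samples in the first batch of samples; $\tilde{g}_t$ denotes the gradient containing noise for the $N$ samples in the t-th round; $\mathcal{N}(0, \sigma^2 C^2 I)$ denotes Gaussian noise whose probability density follows a Gaussian distribution with mean 0 and variance $\sigma^2 C^2 I$; $\sigma$ denotes the noise scaling coefficient mentioned above, a preset hyperparameter that can be set as needed; $C$ is the clipping threshold mentioned above; and $I$ denotes an indicator function that takes the value 0 or 1, for example 1 in even-numbered rounds and 0 in odd-numbered rounds of multi-round training. In this formula, when the first batch of samples contains multiple training samples, the gradient containing noise is the average of the clipped original gradients of those samples with Gaussian noise superimposed. When the first batch contains only one training sample, the gradient containing noise is that sample's clipped original gradient with Gaussian noise superimposed.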
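A minimal sketch of the noisy-gradient computation follows. It reflects one reading of the description: per-sample gradients are clipped, summed, perturbed with zero-mean Gaussian noise of standard deviation σ·C in each coordinate, and divided by the batch size. The round-dependent factor I mentioned in the text is not modeled (it is effectively taken as 1), and all names are illustrative assumptions.

```python
import math
import random


def noisy_gradient(per_sample_gradients, clip_threshold, noise_scale, rng=random):
    """Average of clipped per-sample gradients with Gaussian noise added.

    noise_scale plays the role of sigma; the noise standard deviation per
    coordinate is noise_scale * clip_threshold (an assumed reading).
    """
    batch_size = len(per_sample_gradients)
    dim = len(per_sample_gradients[0])
    total = [0.0] * dim
    for gradient in per_sample_gradients:
        norm = math.sqrt(sum(g * g for g in gradient))
        scale = max(1.0, norm / clip_threshold)  # clip to the threshold
        for k, g in enumerate(gradient):
            total[k] += g / scale
    std = noise_scale * clip_threshold
    return [(total[k] + rng.gauss(0.0, std)) / batch_size for k in range(dim)]
```

Setting `noise_scale` to zero recovers the plain average of the clipped gradients, which is a convenient way to check the clipping and averaging separately from the noise.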
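Then, using the gradient with Gaussian noise added, and still with the goal of minimizing the loss corresponding to sample i, the model parameters may be adjusted as:
$$\theta_{t+1} = \theta_t - \eta_t\, \tilde{g}_t$$
where $\eta_t$ denotes the learning step size, or learning rate, of the t-th round, a preset hyperparameter such as 0.5 or 0.3, and $\theta_{t+1}$ denotes the adjusted model parameters obtained after the t-th round of training (which includes the first batch of samples). When adding Gaussian noise to the gradient satisfies differential privacy, the adjustment of the model parameters also satisfies differential privacy.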
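The parameter adjustment described here is an ordinary gradient-descent step applied to the noisy gradient. The sketch below assumes parameters and gradient are flat lists of equal length; the name `update_parameters` is illustrative.

```python
def update_parameters(parameters, noisy_grad, learning_rate):
    """One descent step: move each parameter against its noisy gradient."""
    return [p - learning_rate * g for p, g in zip(parameters, noisy_grad)]
```

Because the step only ever sees the already-perturbed gradient, no additional noise is needed at this point.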
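Accordingly, after multiple rounds of iterative training, a target business model based on differential privacy can be obtained. Because Gaussian noise is added during model training, it is difficult to infer the model structure or to work back to the business data from the data presented by the target business model, which further improves the effectiveness of private data protection.
The trained target business model can be used to make the corresponding business predictions on given business data. The business data here is of the same type as the training samples, for example a user's finance-related data, from which the target business model can predict the user's lending risk.
Reviewing the process above, the method for determining a target business model based on privacy protection provided in the embodiments of this specification first gives a selected complex business model initial training to obtain an initial business model, then prunes the initial business model, and trains the pruned business model with its parameters reset to their initialization state, to test whether the pruned model parameters were unimportant from the start. From the multiple sub-models thus obtained, a target business model is selected by means of differential privacy. In this way, a privacy-protected compressed model can be obtained, providing privacy protection for the model on top of model compression.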
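According to an embodiment of another aspect, an apparatus for determining a target business model based on privacy protection is also provided. The business model here may be a model used for business processing, such as classification and scoring, of given business data. The business data may be of various types, such as text, images, speech, video, and animation. The apparatus may be installed in any system, device, apparatus, platform, or server with a certain computing capability.
FIG. 5 is a schematic block diagram of an apparatus for determining a target business model based on privacy protection according to an embodiment. As shown in FIG. 5, the apparatus 500 includes:
an initialization unit 51, configured to determine, in a predetermined manner, an initial value for each model parameter of the selected business model, thereby initializing the selected business model;
an initial training unit 52, configured to train the initialized selected business model with multiple training samples until the model parameters converge, to obtain the initial business model;
a pruning unit 53, configured to determine multiple sub-models of the initial business model based on pruning of the initial business model, where each sub-model has model parameters and model indicators determined through retraining by the initialization unit 51 and the initial training unit 52 as follows: the initialization unit 51 resets the model parameters of the pruned business model to the initial values of the corresponding model parameters in the initialized business model, and the initial training unit 52 inputs the multiple training samples in turn into the pruned business model and adjusts the model parameters based on a comparison between the corresponding sample labels and the output results of the pruned business model;
a determining unit 54, configured to select the target business model from the sub-models using the first method of differential privacy, based on the model indicators corresponding to each sub-model.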
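According to one implementation, the pruning unit 53 may be further configured to: prune the initial business model according to its model parameters to obtain a first pruned model; take the first pruned model, with the model parameters obtained through retraining, as a first sub-model; and iteratively prune the first sub-model to obtain subsequent sub-models until the end condition is met.
In one embodiment, the end condition may include at least one of: the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, the scale of the last sub-model being smaller than a set scale threshold, and so on.
In an optional implementation, the pruning unit 53 prunes the model in one of the following ways, in ascending order of model parameters: pruning a predetermined proportion of the model parameters, pruning a predetermined number of model parameters, pruning to obtain a model whose scale does not exceed a predetermined size, and so on.
According to one possible design, the first method of differential privacy is the exponential mechanism, and the determining unit 54 may be further configured to: determine the availability coefficient of each sub-model according to the model indicators corresponding to that sub-model; determine the sampling probability of each sub-model from the availability coefficients using the exponential mechanism; and sample among the multiple sub-models according to the sampling probabilities, taking the sampled sub-model as the target business model.
In one implementation, the apparatus 500 may further include a privacy training unit 55, configured to train the target business model with multiple training samples based on the second method of differential privacy, so that the trained target business model can be used to make business predictions on given business data while protecting data privacy.
In a further embodiment, the multiple training samples include a first batch of samples, and sample i in the first batch has a corresponding loss obtained after processing by the target business model. The privacy training unit 55 is further configured to: determine the original gradient of the loss corresponding to sample i; add noise to the original gradient using the second method of differential privacy to obtain a gradient containing noise; and use the gradient containing noise to adjust the model parameters of the target business model with the goal of minimizing the loss corresponding to sample i.
In a still further embodiment, the second method of differential privacy is adding Gaussian noise, and the privacy training unit 55 may also be configured to: clip the original gradient based on a preset clipping threshold to obtain a clipped gradient; determine the Gaussian noise for achieving differential privacy using a Gaussian distribution determined from the clipping threshold, where the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superimpose the Gaussian noise on the clipped gradient to obtain the gradient containing noise.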
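It is worth noting that the apparatus 500 shown in FIG. 5 is the apparatus embodiment corresponding to the method embodiment shown in FIG. 3, and the corresponding description of the method embodiment shown in FIG. 3 also applies to the apparatus 500 and is not repeated here.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described in conjunction with FIG. 3.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the method described in conjunction with FIG. 3.
Those skilled in the art should appreciate that, in one or more of the examples above, the functions described in the embodiments of this specification can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific implementations described above further explain in detail the purpose, technical solutions, and beneficial effects of the technical concept of this specification. It should be understood that they are only specific implementations of that technical concept and do not limit its scope of protection. Any modification, equivalent replacement, improvement, and the like made on the basis of the technical solutions of the embodiments of this specification shall fall within the scope of protection of the technical concept of this specification.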
Claims (19)
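- A method for determining a target business model based on privacy protection, the target business model being used to process given business data to obtain a corresponding business prediction result; the method comprising: determining, in a predetermined manner, an initial value for each model parameter of a selected business model, thereby initializing the selected business model; training the initialized selected business model with a plurality of training samples until the model parameters converge, to obtain an initial business model; determining a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model has model parameters and model indicators determined through retraining as follows: resetting the model parameters of the pruned business model to the initial values of the corresponding model parameters in the initialized business model; inputting the plurality of training samples in turn into the pruned business model, and adjusting the model parameters based on a comparison between the corresponding sample labels and the output results of the pruned business model; and selecting the target business model from the sub-models using a first method of differential privacy, based on the model indicators corresponding to each sub-model.
- The method according to claim 1, wherein determining the plurality of sub-models of the initial business model based on pruning of the initial business model comprises: pruning the initial business model according to the model parameters of the initial business model to obtain a first pruned model; taking the first pruned model, with the model parameters obtained through retraining, as a first sub-model; and iteratively pruning the first sub-model to obtain subsequent sub-models until an end condition is met.
- The method according to claim 2, wherein the end condition comprises at least one of: the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, and the scale of the last sub-model being smaller than a set scale threshold.
- The method according to claim 1 or 2, wherein the model is pruned in one of the following ways, in ascending order of model parameters: pruning a predetermined proportion of the model parameters, pruning a predetermined number of model parameters, or pruning to obtain a model whose scale does not exceed a predetermined size.
- The method according to claim 1, wherein the first method of differential privacy is the exponential mechanism, and selecting the target business model from the sub-models using the first method of differential privacy, based on the model indicators corresponding to each sub-model, comprises: determining the availability coefficient of each sub-model according to the model indicators corresponding to that sub-model; determining the sampling probability of each sub-model from the availability coefficients using the exponential mechanism; and sampling among the plurality of sub-models according to the sampling probabilities, taking the sampled sub-model as the target business model.
- The method according to claim 1, further comprising: training the target business model with a plurality of training samples based on a second method of differential privacy, so that the trained target business model is used to make business predictions on given business data while protecting data privacy.
- The method according to claim 6, wherein the plurality of training samples comprises a first batch of samples, sample i in the first batch of samples has a corresponding loss obtained after processing by the target business model, and training the target business model with the plurality of training samples based on the second method of differential privacy comprises: determining the original gradient of the loss corresponding to the sample i; adding noise to the original gradient using the second method of differential privacy to obtain a gradient containing noise; and using the gradient containing noise to adjust the model parameters of the target business model with the goal of minimizing the loss corresponding to the sample i.
- The method according to claim 7, wherein the second method of differential privacy is adding Gaussian noise, and adding noise to the original gradient using the second method of differential privacy to obtain the gradient containing noise comprises: clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient; determining the Gaussian noise for achieving differential privacy using a Gaussian distribution determined from the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superimposing the Gaussian noise on the clipped gradient to obtain the gradient containing noise.
- The method according to claim 1, wherein the business data comprises at least one of pictures, audio, and characters.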
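- An apparatus for determining a target business model based on privacy protection, the target business model being used to process given business data to obtain a corresponding business prediction result; the apparatus comprising: an initialization unit, configured to determine, in a predetermined manner, an initial value for each model parameter of a selected business model, thereby initializing the selected business model; an initial training unit, configured to train the initialized selected business model with a plurality of training samples until the model parameters converge, to obtain an initial business model; a pruning unit, configured to determine a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model has model parameters and model indicators determined through retraining by the initialization unit and the initial training unit as follows: the initialization unit resets the model parameters of the pruned business model to the initial values of the corresponding model parameters in the initialized business model; the initial training unit inputs the plurality of training samples in turn into the pruned business model and adjusts the model parameters based on a comparison between the corresponding sample labels and the output results of the pruned business model; and a determining unit, configured to select the target business model from the sub-models using a first method of differential privacy, based on the model indicators corresponding to each sub-model.
- The apparatus according to claim 10, wherein the pruning unit is further configured to: prune the initial business model according to the model parameters of the initial business model to obtain a first pruned model; take the first pruned model, with the model parameters obtained through retraining, as a first sub-model; and iteratively prune the first sub-model to obtain subsequent sub-models until an end condition is met.
- The apparatus according to claim 11, wherein the end condition comprises at least one of: the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, and the scale of the last sub-model being smaller than a set scale threshold.
- The apparatus according to claim 10 or 11, wherein the pruning unit prunes the model in one of the following ways, in ascending order of model parameters: pruning a predetermined proportion of the model parameters, pruning a predetermined number of model parameters, or pruning to obtain a model whose scale does not exceed a predetermined size.
- The apparatus according to claim 10, wherein the first method of differential privacy is the exponential mechanism, and the determining unit is further configured to: determine the availability coefficient of each sub-model according to the model indicators corresponding to that sub-model; determine the sampling probability of each sub-model from the availability coefficients using the exponential mechanism; and sample among the plurality of sub-models according to the sampling probabilities, taking the sampled sub-model as the target business model.
- The apparatus according to claim 10, wherein the apparatus further comprises a privacy training unit, configured to: train the target business model with a plurality of training samples based on a second method of differential privacy, so that the trained target business model is used to make business predictions on given business data while protecting data privacy.
- The apparatus according to claim 15, wherein the plurality of training samples comprises a first batch of samples, sample i in the first batch of samples has a corresponding loss obtained after processing by the target business model, and the privacy training unit is further configured to: determine the original gradient of the loss corresponding to the sample i; add noise to the original gradient using the second method of differential privacy to obtain a gradient containing noise; and use the gradient containing noise to adjust the model parameters of the target business model with the goal of minimizing the loss corresponding to the sample i.
- The apparatus according to claim 16, wherein the second method of differential privacy is adding Gaussian noise, and the privacy training unit is further configured to: clip the original gradient based on a preset clipping threshold to obtain a clipped gradient; determine the Gaussian noise for achieving differential privacy using a Gaussian distribution determined from the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superimpose the Gaussian noise on the clipped gradient to obtain the gradient containing noise.
- A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, it causes the computer to perform the method of any one of claims 1-9.
- A computing device, comprising a memory and a processor, characterized in that the memory stores executable code and the processor, when executing the executable code, implements the method of any one of claims 1-9.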
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010276685.8 | 2020-04-10 | ||
CN202010276685.8A CN111177792B (zh) | 2020-04-10 | 2020-04-10 | 基于隐私保护确定目标业务模型的方法及装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021204272A1 true WO2021204272A1 (zh) | 2021-10-14 |
Family
ID=70655223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/086275 WO2021204272A1 (zh) | 2020-04-10 | 2021-04-09 | 基于隐私保护确定目标业务模型 |
Country Status (3)
Country | Link |
---|---|
CN (2) | CN113515770B (zh) |
TW (1) | TWI769754B (zh) |
WO (1) | WO2021204272A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
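CN114185619A (zh) * | 2021-12-14 | 2022-03-15 | 平安付科技服务有限公司 | Breakpoint compensation method, apparatus, device, and medium based on distributed jobs |
CN114338552A (zh) * | 2021-12-31 | 2022-04-12 | 河南信大网御科技有限公司 | A deterministic-latency mimic system |
CN114780999A (zh) * | 2022-06-21 | 2022-07-22 | 广州中平智能科技有限公司 | A deep learning data privacy protection method, system, device, and medium |
CN115766507A (zh) * | 2022-11-16 | 2023-03-07 | 支付宝(杭州)信息技术有限公司 | Business anomaly detection method and apparatus |
CN116432039B (zh) * | 2023-06-13 | 2023-09-05 | 支付宝(杭州)信息技术有限公司 | Collaborative training method and apparatus, and business prediction method and apparatus |
CN116805082A (zh) * | 2023-08-23 | 2023-09-26 | 南京大学 | A split learning method for protecting client private data |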
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
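CN113515770B (zh) * | 2020-04-10 | 2024-06-18 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for determining a target business model based on privacy protection |
CN111368337B (zh) * | 2020-05-27 | 2020-09-08 | 支付宝(杭州)信息技术有限公司 | Privacy-preserving sample generation model construction and simulated sample generation methods and apparatuses |
CN111475852B (zh) * | 2020-06-19 | 2020-09-15 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for data preprocessing for a business model based on privacy protection |
CN112214791B (zh) * | 2020-09-24 | 2023-04-18 | 广州大学 | Reinforcement-learning-based privacy policy optimization method, system, and readable storage medium |
CN114936650A (zh) * | 2020-12-06 | 2022-08-23 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for jointly training a business model based on privacy protection |
CN112561076B (zh) * | 2020-12-10 | 2022-09-20 | 支付宝(杭州)信息技术有限公司 | Model processing method and apparatus |
CN112632607B (zh) * | 2020-12-22 | 2024-04-26 | 中国建设银行股份有限公司 | A data processing method, apparatus, and device |
CN112926090B (zh) * | 2021-03-25 | 2023-10-27 | 支付宝(杭州)信息技术有限公司 | Business analysis method and apparatus based on differential privacy |
US20220318412A1 (en) * | 2021-04-06 | 2022-10-06 | Qualcomm Incorporated | Privacy-aware pruning in machine learning |
CN113221717B (zh) * | 2021-05-06 | 2023-07-18 | 支付宝(杭州)信息技术有限公司 | A privacy-protection-based model construction method, apparatus, and device |
CN113420322B (zh) * | 2021-05-24 | 2023-09-01 | 阿里巴巴新加坡控股有限公司 | Model training and desensitization methods, apparatus, electronic device, and storage medium |
CN113268772B (zh) * | 2021-06-08 | 2022-12-20 | 北京邮电大学 | Secure aggregation method and apparatus for federated learning based on differential privacy |
CN113486402B (zh) * | 2021-07-27 | 2024-06-04 | 平安国际智慧城市科技股份有限公司 | Numerical data query method, apparatus, device, and storage medium |
CN113923476B (zh) * | 2021-09-30 | 2024-03-26 | 支付宝(杭州)信息技术有限公司 | A privacy-protection-based video compression method and apparatus |
CN114429222A (zh) * | 2022-01-19 | 2022-05-03 | 支付宝(杭州)信息技术有限公司 | A model training method, apparatus, and device |
CN115081024B (zh) * | 2022-08-16 | 2023-01-24 | 杭州金智塔科技有限公司 | Decentralized business model training method and apparatus based on privacy protection |
CN117056979B (zh) * | 2023-10-11 | 2024-03-29 | 杭州金智塔科技有限公司 | Method and apparatus for updating a business processing model based on user privacy data |
CN118606634B (zh) * | 2024-08-08 | 2024-10-22 | 齐鲁工业大学(山东省科学院) | Adaptive privacy-preserving distributed learning method and apparatus based on decaying noise perturbation |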
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368752A (zh) * | 2017-07-25 | 2017-11-21 | 北京工商大学 | A deep differential privacy protection method based on generative adversarial networks |
US20200050773A1 (en) * | 2018-06-11 | 2020-02-13 | Grey Market Labs, PBC | Systems and methods for controlling data exposure using artificial-intelligence-based periodic modeling |
CN111177792A (zh) * | 2020-04-10 | 2020-05-19 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for determining a target business model based on privacy protection |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10586068B2 (en) * | 2015-11-02 | 2020-03-10 | LeapYear Technologies, Inc. | Differentially private processing and database storage |
US11341281B2 (en) * | 2018-09-14 | 2022-05-24 | International Business Machines Corporation | Providing differential privacy in an untrusted environment |
US11556846B2 (en) * | 2018-10-03 | 2023-01-17 | Cerebri AI Inc. | Collaborative multi-parties/multi-sources machine learning for affinity assessment, performance scoring, and recommendation making |
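CN109657498B (zh) * | 2018-12-28 | 2021-09-24 | 广西师范大学 | Differential privacy protection method for mining top-k co-occurrence patterns in multiple streams |
CN110084365B (zh) * | 2019-03-13 | 2023-08-11 | 西安电子科技大学 | A deep-learning-based service provision system and method |
CN110719158B (zh) * | 2019-09-11 | 2021-11-23 | 南京航空航天大学 | Edge computing privacy protection system and protection method based on federated learning |
CN110874488A (zh) * | 2019-11-15 | 2020-03-10 | 哈尔滨工业大学(深圳) | A streaming data frequency statistics method, apparatus, system, and storage medium based on hybrid differential privacy |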
- 2020
  - 2020-04-10 CN CN202010626329.4A patent/CN113515770B/zh Active
  - 2020-04-10 CN CN202010276685.8A patent/CN111177792B/zh Active
- 2021
  - 2021-03-24 TW TW110110604A patent/TWI769754B/zh Active
  - 2021-04-09 WO PCT/CN2021/086275 patent/WO2021204272A1/zh Application Filing
Non-Patent Citations (1)
Title |
---|
XU JIAHUI: "Research on Neural Network Compression Techniques: Model Pruning", INFORMATION & COMMUNICATIONS, no. 204, 31 December 2019 (2019-12-31), pages 165 - 167, XP055842096, ISSN: 1673-1131 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
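CN114185619A (zh) * | 2021-12-14 | 2022-03-15 | 平安付科技服务有限公司 | Breakpoint compensation method, apparatus, device, and medium based on distributed jobs |
CN114185619B (zh) * | 2021-12-14 | 2024-04-05 | 平安付科技服务有限公司 | Breakpoint compensation method, apparatus, device, and medium based on distributed jobs |
CN114338552A (zh) * | 2021-12-31 | 2022-04-12 | 河南信大网御科技有限公司 | A deterministic-latency mimic system |
CN114338552B (zh) * | 2021-12-31 | 2023-07-07 | 河南信大网御科技有限公司 | A deterministic-latency mimic system |
CN114780999A (zh) * | 2022-06-21 | 2022-07-22 | 广州中平智能科技有限公司 | A deep learning data privacy protection method, system, device, and medium |
CN114780999B (zh) * | 2022-06-21 | 2022-09-27 | 广州中平智能科技有限公司 | A deep learning data privacy protection method, system, device, and medium |
CN115766507A (zh) * | 2022-11-16 | 2023-03-07 | 支付宝(杭州)信息技术有限公司 | Business anomaly detection method and apparatus |
CN116432039B (zh) * | 2023-06-13 | 2023-09-05 | 支付宝(杭州)信息技术有限公司 | Collaborative training method and apparatus, and business prediction method and apparatus |
CN116805082A (zh) * | 2023-08-23 | 2023-09-26 | 南京大学 | A split learning method for protecting client private data |
CN116805082B (zh) * | 2023-08-23 | 2023-11-03 | 南京大学 | A split learning method for protecting client private data |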
Also Published As
Publication number | Publication date |
---|---|
TW202139045A (zh) | 2021-10-16 |
TWI769754B (zh) | 2022-07-01 |
CN111177792A (zh) | 2020-05-19 |
CN111177792B (zh) | 2020-06-30 |
CN113515770A (zh) | 2021-10-19 |
CN113515770B (zh) | 2024-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
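WO2021204272A1 (zh) | Determining a target business model based on privacy protection | |
CN109460793B (zh) | A node classification method, and a model training method and apparatus | |
WO2021204269A1 (zh) | Classification model training and object classification | |
Nazarenko et al. | Features of application of machine learning methods for classification of network traffic (features, advantages, disadvantages) | |
CN109543112A (zh) | A sequential recommendation method and apparatus based on a recurrent convolutional neural network | |
CN109189889B (zh) | A method, apparatus, server, and medium for building a bullet-comment recognition model | |
CN110659394B (zh) | Recommendation method based on bidirectional proximity | |
CN106411683B (zh) | A method and apparatus for determining key social information | |
CN115409155A (zh) | Information cascade prediction system and method based on a Transformer-enhanced Hawkes process | |
CN117371508A (zh) | Model compression method, apparatus, electronic device, and storage medium | |
WO2022182905A1 (en) | Stochastic noise layers | |
CN116304518A (zh) | Method and system for constructing a heterogeneous graph convolutional neural network model for information recommendation | |
CN114116995B (zh) | Session recommendation method, system, and medium based on an enhanced graph neural network | |
CN118467992B (zh) | A short-term power load forecasting method, system, and storage medium optimized by a metaheuristic algorithm | |
WO2019167240A1 (ja) | Information processing device, control method, and program | |
Dash | DECPNN: A hybrid stock predictor model using Differential Evolution and Chebyshev Polynomial neural network | |
CN107402984B (zh) | A topic-based classification method and apparatus | |
CN110555161A (zh) | A personalized recommendation method based on user trust and convolutional neural networks | |
CN111931035A (zh) | Business recommendation method, apparatus, and device | |
CN116541592A (zh) | Vector generation method, information recommendation method, apparatus, device, and medium | |
CN112085040B (zh) | Object label determination method, apparatus, and computer device | |
CN112036446B (zh) | Method, system, medium, and apparatus for feature fusion in target recognition | |
CN116561371A (zh) | A multi-label video classification method based on multi-instance learning and a label relation graph | |
CN113076450A (zh) | A method and apparatus for determining a target recommendation list | |
CN113792163B (zh) | Multimedia recommendation method, apparatus, electronic device, and storage medium |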
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21784502; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 21784502; Country of ref document: EP; Kind code of ref document: A1 |