CN113515770A - Method and device for determining target business model based on privacy protection - Google Patents

Method and device for determining target business model based on privacy protection

Info

Publication number: CN113515770A
Application number: CN202010626329.4A
Authority: CN (China)
Prior art keywords: model, sub, business, initial, determining
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 熊涛
Current Assignee: Alipay Hangzhou Information Technology Co Ltd
Original Assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. a local or distributed file system or database
    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification provide a method and an apparatus for determining a target business model based on privacy protection. For the plurality of sub-models obtained, the sampling probability of each sub-model is determined from its model index through the exponential mechanism of differential privacy, and the sub-models are sampled according to these probabilities so as to select the target business model. In this way, a compressed model with privacy protection can be obtained by means of differential privacy, providing privacy protection for the model on the basis of model compression.

Description

Method and device for determining target business model based on privacy protection
The present application is a divisional application of the Chinese invention patent application No. 202010276685.8, entitled "Method and apparatus for determining target business model based on privacy protection", filed on April 10, 2020.
Technical Field
One or more embodiments of the present specification relate to the field of computer technology, and more particularly, to a method and apparatus for determining a target business model based on privacy protection by a computer.
Background
With the development of machine learning technology, deep neural networks (DNNs) are favored by those skilled in the art because they mimic the way the human brain thinks and perform better than simple linear models. A deep neural network is a neural network with at least one hidden layer; it can model complex nonlinear systems and improves model capability.
Deep neural networks are also very large, with complex network structures and systems of features and model parameters. For example, a deep neural network may include millions of parameters. It is therefore desirable to find ways to compress models, reducing their data size and complexity. To this end, the conventional technique is to fit the millions of parameters of a deep neural network using training samples and then delete, or "prune", unnecessary weights to reduce the network structure to a more manageable size. Reducing the size of a model helps to minimize its memory, inference, and computation requirements. In some business scenarios, as many as 99% of the weights in a neural network can sometimes be cut away, resulting in a smaller, sparser network.
However, pruning only after training is completed incurs a high computational cost and performs a large number of "wasted" computations. It is then natural to envisage training sub-networks of the original neural network to find one that meets the requirements as far as possible. Meanwhile, with conventional techniques, raw data is easier to recover from a simpler neural network. A method is therefore needed that can both protect data privacy and compress the model size for real-time computation and end-to-end deployment, improving model performance in several respects.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for determining a target business model based on privacy protection to solve one or more of the problems mentioned in the background.
According to a first aspect, a method for determining a target business model based on privacy protection is provided, the target business model being configured to process given business data to obtain a corresponding business prediction result. The method comprises: determining initial values for the model parameters of a selected business model in a predetermined manner, thereby initializing the selected business model; training the initialized business model using a plurality of training samples until the model parameters converge, to obtain an initial business model; determining a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model corresponds to model parameters and a model index determined by retraining in the following manner: resetting the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model, then inputting the plurality of training samples into the pruned business model in turn and adjusting the model parameters based on comparison of the corresponding sample labels with the outputs of the pruned business model; and selecting a target business model from the sub-models using a first mode of differential privacy, based on the model indexes corresponding to the sub-models.
In one embodiment, the determining a plurality of sub-models of the initial business model based on the pruning of the initial business model comprises: pruning the initial business model according to the model parameters of the initial business model to obtain a first pruning model; taking the first pruning model corresponding to the model parameters obtained through retraining as a first sub-model; and iteratively trimming the first submodel to obtain a subsequent submodel until an end condition is met.
In one embodiment, the end condition includes at least one of the number of iterations reaching a predetermined number, the number of submodels reaching a predetermined number, and the size of the last submodel being less than a set size threshold.
In one embodiment, the model is pruned, in order of model parameter magnitude from small to large, in one of the following manners: pruning away a preset proportion of the model parameters, pruning away a preset number of model parameters, or pruning until the model does not exceed a predetermined size.
In an embodiment, the first mode of differential privacy is the exponential mechanism, and selecting the target business model from the sub-models using the first mode of differential privacy, based on the model indexes corresponding to the sub-models, includes: determining an availability coefficient for each sub-model according to its model index; determining a sampling probability for each sub-model from the availability coefficients using the exponential mechanism; and sampling among the plurality of sub-models according to the sampling probabilities, taking the sampled sub-model as the target business model.
In one embodiment, the method further comprises: training the target business model based on a second mode of differential privacy using the plurality of training samples, so that the trained target business model performs data-privacy-protecting business prediction on given business data.
In one embodiment, the plurality of training samples includes a first batch of samples, where each sample i in the first batch corresponds to a loss obtained after processing by the target business model, and training the target business model based on the second mode of differential privacy using the training samples includes: determining the original gradient of the loss corresponding to sample i; adding noise to the original gradient using the second mode of differential privacy to obtain a noise-containing gradient; and adjusting the model parameters of the target business model using the noise-containing gradient, with the goal of minimizing the loss corresponding to sample i.
In one embodiment, the second mode of differential privacy is adding Gaussian noise, and adding noise to the original gradient using the second mode of differential privacy to obtain a noise-containing gradient includes: clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient; determining Gaussian noise for differential privacy using a Gaussian distribution determined from the clipping threshold, the variance of the Gaussian distribution being positively correlated with the square of the clipping threshold; and superimposing the Gaussian noise on the clipped gradient to obtain the noise-containing gradient.
In one embodiment, the service data comprises at least one of pictures, audio, characters.
According to a second aspect, there is provided an apparatus for determining a target service model based on privacy protection, the target service model being configured to process given service data to obtain a corresponding service prediction result; the device comprises:
an initialization unit configured to determine initial values for the model parameters of the selected business model in a predetermined manner, thereby initializing the selected business model;
an initial training unit configured to train the initialized business model using a plurality of training samples until the model parameters converge, to obtain an initial business model;
a pruning unit configured to determine a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model corresponds to model parameters and a model index determined through retraining by the initialization unit and the initial training unit as follows: the initialization unit resets the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model; and the initial training unit inputs the plurality of training samples into the pruned business model in turn, adjusting the model parameters based on comparison of the corresponding sample labels with the outputs of the pruned business model;
and a determining unit configured to select a target business model from the sub-models using a first mode of differential privacy, based on the model indexes corresponding to the sub-models.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.
With the method and the apparatus provided in the embodiments of this specification, the selected complex business model is first trained to obtain an initial business model; the initial business model is then pruned, and the pruned business model is trained with its parameters reset to the initialization state, so as to test whether the pruned-away model parameters were unimportant from the beginning. A target business model is then selected from the resulting sub-models in a differential-privacy manner. A compressed model with privacy protection can thus be obtained, providing privacy protection for the model on the basis of model compression.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed for describing the embodiments are briefly introduced below. It is apparent that the drawings described below are only some embodiments of the present invention, and that other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a schematic diagram illustrating an implementation architecture for determining a target business model based on privacy protection in the technical concepts of the present specification;
FIG. 2 illustrates a flow of determining a plurality of sub-networks based on pruning of an initial neural network in a specific example;
FIG. 3 illustrates a flow diagram of a method for determining a target business model based on privacy protection, according to one embodiment;
FIG. 4 illustrates a schematic diagram of neural network pruning in a specific example;
FIG. 5 illustrates a schematic block diagram of an apparatus for determining a target business model based on privacy protection, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of an implementation architecture according to the technical concept of this specification. Under this technical concept, the business model may be a machine learning model that performs various kinds of business processing, such as classification or scoring, on business data. The business model shown in Fig. 1 is implemented as a neural network; in practice it can also be implemented by other means, such as a decision tree or linear regression. The business data may be in at least one of various forms, such as characters, audio, images, or animation, determined by the specific business scenario, which is not limited here.
For example, the business model can be a machine learning model with which a lending platform assists in evaluating the lending risk of a user; the business data addressed can be the individual user's historical lending behavior data, default data, user profile, and the like, and the business prediction result is the user's risk score. For another example, the business model may be a model (such as a convolutional neural network) for classifying objects in pictures; the business data addressed may be various pictures, and the business prediction result may be, for example, a first object (such as a car), a second object (a bicycle), other categories, and so on.
In particular, the present specification implementation architecture is particularly applicable to the case where the business model is a more complex non-linear model. The process of determining the target business model based on the privacy protection can be a process of determining a simplified sub-model with model indexes meeting requirements from a complex initial business model.
Taking the business model as a neural network as an example, as shown in fig. 1, the initial neural network may be a more complex neural network, and the neural network may include more features, weight parameters, other parameters (such as constant parameters, auxiliary matrices), and the like. The model parameters of the initial neural network may be initialized in a predetermined manner, e.g., randomly, set to predetermined values, etc. Under the implementation architecture, the initial neural network is trained through a plurality of training samples until the model parameters (or loss functions) of the initial neural network converge. And then, pruning the initial neural network to obtain a plurality of sub-networks. In pruning the neural network, the pruning may be performed according to a predetermined parameter ratio (e.g., 20%), a predetermined parameter number (e.g., 1000), a predetermined size (e.g., at least 20 megabytes), and so on.
Conventionally, the pruned sub-network of the initial neural network continues to be trained, is pruned again on that basis, and is trained further; that is, the initial neural network is compressed step by step. Under the concept of the embodiments of this specification, after the initial neural network is pruned, the parameters of the pruned sub-network are reset (restored to the initialized state), and the pruned network with reset parameters is trained. The purpose is to verify whether the pruned-away neural network structure was unneeded from the beginning. Whether it was unnecessary from the beginning can be expressed through the model's evaluation indexes, such as accuracy, recall, and convergence.
It is worth mentioning that pruning a neural network may include removing some of the neurons in the network and/or removing some of the connections between neurons. In an alternative implementation, which neurons to discard may be decided with reference to the weight parameters corresponding to each neuron. A weight parameter describes the importance of a neuron. Taking a fully connected neural network as an example, the weights mapping a neuron to the neurons of the next layer may be averaged, or their maximum taken, to obtain a reference weight; neurons are then discarded (pruned) in order of reference weight from small to large.
Fig. 2 shows the sub-network pruning flow of one specific example under the implementation architecture of this specification. In Fig. 2, for the part of the neural network remaining after pruning, the model parameters are reset to the initialization state and the network is retrained with the training samples, yielding the first sub-network. Meanwhile, the network structure, evaluation indexes, and so on of the first sub-network may be recorded. The flow then re-enters the pruning step, starting a loop as indicated by the left arrow: the first sub-network is pruned according to its trained model parameters, the pruned network's parameters are reset to the initialization state, and it is retrained with the training samples to serve as the second sub-network. The loop continues along the left arrow, and so on, until an N-th sub-network satisfying an end condition is obtained. The end condition here may be, for example, at least one of: the number of iterations reaching a predetermined number (e.g., a preset number N), the number of sub-models reaching a predetermined number (e.g., a preset number N), or the size of the last sub-model being smaller than a set size threshold (e.g., 100 megabytes).
In this way, multiple sub-networks of the initial neural network may be obtained. In some alternative embodiments, the arrow on the left of Fig. 2 may instead return to the top: after the first sub-network is obtained, the original neural network is reinitialized, and the reinitialized network is trained and pruned; the pruned sub-network is then trained as the second sub-network, and so on until the N-th sub-network is obtained. The sub-networks may then be of different sizes, e.g., the first sub-network being 80% of the original neural network, the second sub-network 60%, and so on. Under this approach, some randomization may be applied each time the neural network is initialized: features or neurons are randomly sampled each time, and a small portion (e.g., 1%) of features and initialization parameters are discarded to perturb the initial neural network slightly, so that each initialization is consistent with the initial neural network up to small differences, testing the effect of different neurons.
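To make the loop of Fig. 2 concrete, the following minimal Python sketch implements prune, rewind, retrain over a single weight matrix. It is an illustration under stated assumptions only: train_to_convergence, evaluate, PRUNE_RATIO, and N_ROUNDS are hypothetical stand-ins, not names from this specification.

```python
# Minimal sketch of the Fig. 2 loop: train, prune smallest weights, rewind
# survivors to their recorded initial values, retrain, record the sub-model.
# train_to_convergence and evaluate are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

def train_to_convergence(weights, mask):
    # Placeholder for gradient training; pruned positions (mask == 0) stay zero.
    return (weights + 0.01 * rng.standard_normal(weights.shape)) * mask

def evaluate(weights):
    # Placeholder model index, e.g. accuracy on a held-out test set.
    return float(1.0 / (1.0 + np.abs(weights).mean()))

PRUNE_RATIO = 0.2            # prune 20% of the remaining weights per round
N_ROUNDS = 3                 # end condition: a preset number of sub-models

init_weights = rng.standard_normal((64, 32))   # step 301: record initial values
mask = np.ones_like(init_weights)

trained = train_to_convergence(init_weights, mask)   # step 302: initial model
sub_models = []

for _ in range(N_ROUNDS):
    # Prune the smallest-magnitude weights that are still alive.
    alive = np.abs(trained[mask == 1])
    threshold = np.quantile(alive, PRUNE_RATIO)
    mask = mask * (np.abs(trained) > threshold)

    # Rewind surviving weights to their initial values, then retrain (step 303).
    trained = train_to_convergence(init_weights * mask, mask)

    sub_models.append({"mask": mask.copy(),
                       "index": evaluate(trained),
                       "compression": 1.0 - mask.mean()})
```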
Continuing with Fig. 1: from the sub-networks obtained, one can be selected as the target neural network. According to one embodiment, in order to protect data privacy, the pruned sub-networks can be regarded as a set of sub-networks of the initial neural network, and one sub-network is randomly selected as the target neural network based on the differential privacy principle. Determining the target business model in this privacy-protecting, differential-privacy manner better protects the privacy of the business model and/or the business data, and improves the practicality of the target neural network.
It is understood that the implementation architecture shown in fig. 1 is exemplified by the business model being a neural network, and when the business model is other machine learning models, the neurons in the above description may be replaced by other model elements, for example, when the business model is a decision tree, the neurons may be replaced by tree nodes in the decision tree, and so on.
The target neural network is used for carrying out service prediction on the service data to obtain a corresponding service prediction result. For example, a business prediction result of the identified target category is obtained according to the picture data, a business prediction result of the financial loan risk of the user is obtained according to the user behavior data, and the like.
The specific process of determining the target business model based on privacy protection is described in detail below.
FIG. 3 illustrates a flow of determining a target business model based on privacy protection, according to one embodiment. The business model here may be a model for conducting business processing such as classification, scoring, etc. for given business data. The service data may be various types of data such as text, image, voice, video, animation, etc. The execution subject of the flow may be a system, device, apparatus, platform, or server with certain computing capabilities.
As shown in Fig. 3, a method for determining a target business model based on privacy protection may include the following steps. Step 301: determine initial values for the model parameters of the selected business model in a predetermined manner, thereby initializing the selected business model. Step 302: train the initialized business model using a plurality of training samples until the model parameters converge, to obtain an initial business model. Step 303: determine a plurality of sub-models of the initial business model based on pruning of the initial business model, where each sub-model corresponds to model parameters and a model index determined by retraining as follows: reset the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model, then input the plurality of training samples into the pruned business model in turn, adjusting the model parameters based on comparison of the corresponding sample labels with the outputs of the pruned business model. Step 304: select a target business model from the sub-models using a first mode of differential privacy, based on the model indexes corresponding to the sub-models.
First, in step 301, initial values are determined for the model parameters of the selected business model in a predetermined manner, so as to initialize the selected business model.
It will be appreciated that, for a selected business model, the model parameters need to be initialized before the model can be trained; that is, initial values are determined for the respective model parameters. When the selected business model is a neural network, the model parameters may be, for example, at least one of the weights of individual neurons, constant parameters, auxiliary matrices, and the like. When the selected business model is a decision tree, the model parameters are, for example, the weight parameters of each node, the connection relationships between nodes, and the connection weights. Where the selected business model is another form of machine learning model, the model parameters may be other parameters, which are not enumerated here.
The initial values of these model parameters may be determined in a predetermined manner, such as a completely random value, a random value within a predetermined interval, a set value, and so on. With these initial values, when receiving the service data or extracting the relevant features according to the service data, the service model can give the corresponding service prediction results, such as classification results, scoring results, etc.
Next, in step 302, the initialized business model is trained using a plurality of training samples until the model parameters converge, resulting in an initial business model.
After the model parameters are initialized in step 301, the selected business model can, upon receiving business data, operate according to its logic and give a corresponding business prediction result, so the initialized business model can be trained with training samples. Each training sample may correspond to sample business data and a corresponding sample label. The training process for the initialized business model may, for example, be: input the business data of each sample in turn into the initialized business model, and adjust the model parameters according to comparison of the business prediction result output by the model with the corresponding sample label.
As the model is adjusted over a certain number of training samples, each model parameter of the business model changes less and less until it approaches a fixed value; that is, the model parameters converge. Convergence of the model parameters can be described by the fluctuation values of the parameters themselves, or by the loss function, because the loss function is typically a function of the model parameters: when it converges, the model parameters have converged. Convergence may be determined, for example, when the maximum variation of the loss function, or the fluctuation of the model parameters, is less than a predetermined threshold. The selected business model has then completed the current stage of training, and the resulting model may be called the initial business model.
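As a small sketch of this convergence test (our own illustration; train_one_epoch and the tolerance are hypothetical placeholders), training can stop once the change in the loss falls below a threshold:

```python
# Stop training once the loss fluctuation drops below a tolerance; the
# parameter-fluctuation criterion described in the text could be substituted.
def train_until_converged(params, train_one_epoch, tol=1e-4, max_epochs=1000):
    prev_loss = float("inf")
    for _ in range(max_epochs):
        params, loss = train_one_epoch(params)
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return params
```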
The initial business model training process may be performed in any suitable manner, and will not be described herein.
Then, at step 303, a plurality of sub-models of the initial business model are determined based on the pruning of the initial business model. It can be understood that, in order to obtain a sub-model that can replace the initial business model from the initial business model, the initial business model can be pruned according to business requirements, so as to obtain a plurality of sub-models of the initial model. These sub-models may also be referred to as candidate models.
It should be noted that pruning of the initial business model may be performed multiple times, each starting from the initial business model itself, or may be applied cumulatively to already-pruned sub-models, as in the example of Fig. 2 described above, which is not repeated here.
The model is pruned, in order of model parameter magnitude from small to large, in one of the following manners: pruning away a predetermined proportion (e.g., 20%) of the model parameters, pruning away a predetermined number (e.g., 1000) of model parameters, pruning until the model does not exceed a predetermined size (e.g., 1000 megabytes), and so on.
It will be appreciated that there is typically at least a portion of the model parameters, such as weight parameters, that may represent to some extent the importance of the model elements (e.g., neurons, tree nodes, etc.). When the business model is pruned, in order to reduce the number of parameters, the model units may be pruned, or the connection relationship between the model units may be pruned. Referring to fig. 4, the following description will be made by taking a business model as a neural network and model units as neurons as an example.
One embodiment may prune the model by removing a predetermined number or proportion of model units, for example trimming 100 neurons, or 10% of the neurons, from each hidden layer of the neural network. Referring to Fig. 4, since the importance of a neuron is described by the weights on the connections (the connecting lines in Fig. 4) between neurons of adjacent hidden layers, the values of the weight parameters can be used to decide which neurons to delete. Fig. 4 shows a schematic of part of the hidden layers in a neural network. In the i-th hidden layer of Fig. 4, if the weight parameters on all the connections between the neuron drawn with dashed lines and the neurons of the previous layer or the next hidden layer are small, then that neuron is of low importance and can be pruned.
Another embodiment may achieve pruning of the model by reducing a predetermined number or proportion of connected edges. Still referring to fig. 4, for each connecting edge in the neural network (e.g., the connecting edge between the neuron X1 and the neuron represented by the dotted line of the i-th hidden layer), if the corresponding weight parameter is smaller, it indicates that the importance degree of the previous neuron to the next neuron is lower, and the corresponding connecting edge may be deleted. Such a network structure is no longer an original fully-connected structure, but each neuron of a previous hidden layer only acts on a neuron of a next hidden layer which is relatively important, and each neuron of the next hidden layer only concerns the neuron of the previous hidden layer which is relatively important. Thus, the size of the business model also becomes smaller.
In other embodiments, the model may be pruned by reducing the number of connecting edges and model units at the same time, which is not described herein again. The model pruning unit and the pruning connection relation are all specific means for model pruning, and the specification does not limit the specific means. By such a pruning means, it is possible to prune a predetermined proportion of model parameters, prune a predetermined number of model parameters, prune a model having a size not exceeding a predetermined size, and the like.
How large a part of the business model is pruned can be determined according to a predetermined pruning rule or a scale requirement on the sub-model. The pruning rule may be, for example: the size of the sub-model is a predetermined number of bytes (e.g., 1000 megabytes); the size of the sub-model is a predetermined proportion (e.g., 70%) of the initial business model; the size of the sub-model after pruning is a predetermined proportion (e.g., 90%) of the model before pruning; connecting edges with weights below a predetermined weight threshold are pruned; and so on. In short, the pruned model should discard the model units or connecting edges of low importance and keep those of high importance.
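As a concrete illustration of the two pruning primitives described above, the sketch below (our own construction, not from this specification) operates on the weight matrix W of a single fully connected layer with shape (n_in, n_out): edge pruning zeroes the smallest-magnitude connections, while neuron pruning drops whole rows according to a reference weight, here taken as the mean absolute outgoing weight.

```python
# Hypothetical pruning primitives for one fully connected layer; W has shape
# (n_in, n_out), so row i holds the outgoing weights of input neuron i.
import numpy as np

def prune_edges(W, ratio):
    # Remove the `ratio` fraction of connecting edges with smallest |weight|.
    threshold = np.quantile(np.abs(W), ratio)
    return W * (np.abs(W) > threshold)

def prune_neurons(W, ratio):
    # Reference weight per neuron: mean |outgoing weight| (the text also
    # allows taking the maximum). Drop the neurons with the smallest scores.
    reference = np.abs(W).mean(axis=1)
    k = int(len(reference) * ratio)
    drop = np.argsort(reference)[:k]
    W = W.copy()
    W[drop, :] = 0.0          # silence every outgoing edge of a dropped neuron
    return W

W = np.random.default_rng(1).standard_normal((8, 4))
print(prune_edges(W, 0.25))
print(prune_neurons(W, 0.25))
```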
In acquiring the sub-models, on one hand, the parameters of the initial business model with a part cut away need further adjustment, so the pruned model must be trained further. On the other hand, it is necessary to verify whether the cut-away portion of the initial business model was unnecessary from the beginning, so the model parameters of the pruned model can be reset to the initialization state and training performed with the plurality of training samples. The trained model is recorded as a sub-model of the initial business model.
It can be understood that, since training of the initial business model stops at convergence, pruning away part of the business model may delete important model units by mistake, causing problems such as degraded model performance. The performance of a sub-model obtained by training a pruned model is therefore uncertain: if an important model unit has been deleted by mistake, the model parameters (or the loss function) may fail to converge, the convergence rate may fall, or the model accuracy may drop. Accordingly, the performance indexes of each trained sub-model, such as accuracy, model size, and convergence, can also be recorded.
In this step 303, assume N sub-models are obtained, where N is a positive integer. N may be a preset number of iterations, a preset number of sub-models, or whatever number is reached under a set pruning condition. For example, when pruning is applied cumulatively to already-pruned sub-models, each successive sub-model is smaller, and the pruning condition may be that the size of the latest sub-model falls below a predetermined size threshold (e.g., 100 megabytes); pruning then ends when the sub-model size falls below that threshold, and N is the number of sub-models actually obtained.
Next, in step 304, a target business model is selected from the sub-models using the first mode of differential privacy, based on the model indexes corresponding to the sub-models.
Differential privacy is a means in cryptography that aims to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying individual records. Given a random algorithm M, let PM be the set of all possible outputs of M. For any two neighboring data sets D and D' and any subset SM of PM, the algorithm M is said to provide ε-differential privacy protection if it satisfies: Pr[M(D) ∈ SM] ≤ e^ε × Pr[M(D') ∈ SM], where the parameter ε is called the privacy protection budget, which balances the degree of privacy protection against accuracy. ε may generally be predetermined. The closer ε is to 0, the closer e^ε is to 1, the closer the algorithm's processing results on the two neighboring data sets D and D', and the stronger the privacy protection.
In this step 304, the task amounts to a balance between the compression ratio and the model index. Classical implementations of differential privacy include, for example, the Laplace mechanism and the exponential mechanism. Generally, the Laplace mechanism is used to add noise perturbations to numerical values; for cases where numerical perturbation is not meaningful, the exponential mechanism is more suitable. Selecting one sub-model from the plurality of sub-models as the target business model is such a case (a sub-model is selected as a whole, and its internal structure is not processed), so it may preferably be performed with the exponential mechanism.
As a specific example, the following describes in detail how a target business model is selected from the sub-models using the first mode of differential privacy, where the first mode is the exponential mechanism.
The N sub-models determined in step 303 may be viewed as N entity objects, each corresponding to a value r_i, where i ranges from 1 to N; the values r_i together constitute the output range R of the query function. The goal here is to select one r_i from R and take the sub-model it corresponds to as the target business model. Let the given data set (understood here as the training sample set) be denoted D. Under the exponential mechanism, the function q(D, r_i) is called the availability function of the output value r_i.
For each sub-model, availability is closely tied to the model index. For example, where the model index includes the compression ratio relative to the initial business model and the accuracy on a test sample set (a larger compression ratio means a smaller sub-model, which is more desirable), the availability function may, in one specific example, be positively correlated with both the compression ratio s_i and the accuracy z_i of the corresponding sub-model i. The function value of the availability function for each sub-model can be taken as that sub-model's availability coefficient, for example:
q(D, r_i) = s_i × z_i
In other specific examples, the model indexes may include recall, F1 score, and so on, and the availability function may take other reasonable forms according to the actual model indexes, which are not enumerated here.
Under the exponential mechanism of ε-differential privacy, given a privacy budget ε (a preset value, e.g., 0.1), a data set D, and an availability function q(D, r), the privacy protection mechanism A(D, q) satisfies ε-differential privacy if and only if:
Pr[A(D, q) = r_i] ∝ exp( ε × q(D, r_i) / (2Δq) )
where ∝ denotes "proportional to", and Δq is the sensitivity, i.e., the maximum change in the availability function caused by a change in a single datum (a single training sample in the example above). Here, since accuracy and compression ratio both take values between 0 and 1, the maximum change in q when a single datum changes is 1, that is, Δq = 1. In other embodiments, with a different expression for q, Δq may be determined in other ways, which is not limited here.
In a specific example, the privacy protection mechanism A may sample according to sampling probabilities, the sampling probability of sub-model i being denoted A(D, q_i). For example, the sampling probability of the i-th sub-model may be:
A(D, q_i) = exp( ε × q(D, r_i) / (2Δq) ) / Σ_j exp( ε × q(D, r_j) / (2Δq) )
where j ranges over all the sub-models. In this way, the exponential mechanism of differential privacy is introduced into the sampling probabilities for sampling the sub-models, and sampling can be performed over the range R (i.e., over the sub-models) according to the sampling probability of each sub-model.
In sampling, according to one specific example, the interval between 0 and 1 may be divided into sub-intervals, one per value in the range R (i.e., one per sub-model), with each sub-interval's length equal to the corresponding sampling probability. A random number between 0 and 1 is generated with a preselected random algorithm, and the value of R corresponding to the sub-interval in which the random number falls is taken as the sampled target value; the sub-model corresponding to that target value can be used as the target business model. According to another specific example, if the range R is a continuous interval of values, it can be divided into sub-intervals whose lengths are positively correlated with the sampling probabilities of the corresponding sub-models; a value is then drawn at random directly on R, and the sub-model whose sub-interval contains the drawn value is used as the target business model.
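The exponential-mechanism sampling just described can be sketched as follows. The availability coefficients use the q_i = s_i × z_i example from above with Δq = 1 as derived in the text; epsilon and the concrete numbers are illustrative assumptions.

```python
# Sketch of step 304: sample one sub-model with probability proportional to
# exp(epsilon * q_i / (2 * sensitivity)), i.e. the exponential mechanism.
import numpy as np

def exponential_mechanism(availability, epsilon, sensitivity=1.0, rng=None):
    rng = rng or np.random.default_rng()
    scores = np.asarray(availability, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    weights = np.exp(logits - logits.max())   # subtract max for stability;
    probs = weights / weights.sum()           # normalization is unaffected
    return int(rng.choice(len(scores), p=probs)), probs

compression = [0.2, 0.6, 0.9]    # s_i: larger means a smaller sub-model
accuracy = [0.95, 0.92, 0.85]    # z_i: accuracy on a test sample set
q = [s * z for s, z in zip(compression, accuracy)]

chosen, probs = exponential_mechanism(q, epsilon=0.1)
print("sampling probabilities:", probs, "chosen sub-model:", chosen)
```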
It can be appreciated that sampling the sub-models according to these probabilities, via the exponential mechanism of differential privacy, adds randomness to the selection of the target business model. The specific structure of the selected sub-model is therefore difficult to infer from the initial business model, making the target business model hard to guess and achieving privacy protection for the target business model and the business data.
It can be understood that, in determining the target business model, each sub-model receives only preliminary training, from which a suitable sub-model is selected as the final one; this avoids the heavy computation of fully training a huge initial business model and then deleting a large number of its parameters. The selected target business model can therefore be trained further, so that it is better suited for business prediction on given business data, yielding business prediction results (such as scoring results or classification results).
One training process for the target business model is, for example: input each training sample into the selected target business model, and adjust the model parameters according to comparison of the output result with the sample label.
In general, the comparison of the output result and the sample label may measure the loss by means such as a difference, an absolute value of the difference, or the like in the case where the output result is a numerical value, or by means such as a variance, a euclidean distance, or the like in the case where the output result is a vector or a plurality of numerical values. After the loss is found, the model parameters may be adjusted with the goal of minimizing the loss. Some optimization algorithms can be adopted in the process to accelerate the convergence speed of the model parameters (or the loss functions). For example, an optimization algorithm such as a gradient descent method is used.
According to one possible design, in order to further protect data privacy, a differential privacy method can be introduced by adding interference noise in the loss gradient, and model parameters are adjusted to train a target business model based on privacy protection. At this time, the flow shown in fig. 3 may further include the following steps:
step 305, training the target business model based on the second mode of differential privacy by using a plurality of training samples, so that the trained target business model is used for business prediction aiming at given business data. There are many implementation manners of differential privacy, and the differential privacy is introduced here to add noise to data, for example, the differential privacy can be implemented by gaussian noise, laplacian noise, and the like, and is not limited herein.
In one embodiment, for a first batch of samples input into the target business model, the model parameters may be adjusted as follows: first, determine the original gradient of the loss corresponding to the first batch; then add noise for differential privacy to the original gradient, obtaining a noise-containing gradient; and then adjust the model parameters of the target business model using the noise-containing gradient. The first batch may contain one training sample or several; where it contains several, the loss corresponding to the batch may be the combined loss of those samples, their average loss, or the like.
As an example, assume that for the first batch of samples described above, the original gradient obtained is:
g_t(x_i) = ∇_θt L(θ_t, x_i)
where t denotes the current (t-th) round of iterative training, x_i denotes the i-th sample in the first batch, g_t(x_i) denotes the gradient of the loss for the i-th sample in round t, θ_t denotes the model parameters at the start of the t-th round of training, and L(θ_t, x_i) denotes the loss function corresponding to the i-th sample.
As described above, adding noise for differential privacy to the original gradient may be implemented with, for example, Laplacian noise or Gaussian noise.
In an embodiment, taking Gaussian noise as the second mode of differential privacy, the original gradient may first be clipped based on a preset clipping threshold to obtain a clipped gradient; Gaussian noise for differential privacy is then determined based on the clipping threshold and a predetermined noise scaling coefficient (a preset hyperparameter); finally, the clipped gradient and the Gaussian noise are fused (e.g., summed) to obtain the noise-containing gradient. In this second mode, the original gradient is clipped on one hand and noise is superimposed on the clipped gradient on the other, so that the loss gradient undergoes differential-privacy processing with Gaussian noise.
For example, the original gradient is clipped to:
ḡ_t(x_i) = g_t(x_i) / max(1, ‖g_t(x_i)‖_2 / C)
where ḡ_t(x_i) denotes the clipped gradient of the i-th sample in round t, C denotes the clipping threshold, and ‖g_t(x_i)‖_2 denotes the second-order norm of g_t(x_i). That is, when the gradient norm is less than or equal to the clipping threshold C, the original gradient is retained; when it is greater than C, the original gradient is scaled down in proportion to the amount by which it exceeds C.
Gaussian noise is added to the clipped gradients to obtain a noise-containing gradient, for example:
g̃_t = (1/N) × ( Σ_i ḡ_t(x_i) + N(0, σ²C²·I) )
where N denotes the number of samples in the first batch; g̃_t denotes the noise-containing gradient corresponding to the N samples in round t; N(0, σ²C²·I) denotes Gaussian noise whose probability density follows a Gaussian distribution with mean 0 and variance σ²C²; σ denotes the noise scaling coefficient, a preset hyperparameter that can be set as required; C is the clipping threshold; and I denotes an indicator function taking 0 or 1 (for example, even rounds of a multi-round training may take 1 and odd rounds 0). In the above formula, when the first batch contains multiple training samples, the noise-containing gradient is the average of the clipped per-sample gradients with Gaussian noise superimposed. When the first batch contains only one training sample, the noise-containing gradient is that sample's clipped original gradient with Gaussian noise added.
Then, using the gradient with Gaussian noise added, still with the goal of minimizing the loss corresponding to sample i, the model parameters can be adjusted as follows:
θ_{t+1} = θ_t − η_t × g̃_t
where η_t is the learning step size (learning rate) of round t, a preset hyperparameter such as 0.5 or 0.3, and θ_{t+1} denotes the adjusted model parameters obtained from the t-th round of training (on the first batch of samples). Since the Gaussian-noise-added gradient satisfies differential privacy, the adjustment of the model parameters satisfies differential privacy.
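A single DP-SGD update matching the three formulas above (per-sample clipping at threshold C, Gaussian noise with standard deviation σC, averaging over the batch, then a gradient step) can be sketched as below. The per-sample gradients and all constants are illustrative, and for simplicity the noise is added on every round rather than gated by the indicator I.

```python
# Sketch of one differentially private parameter update (step 305).
import numpy as np

def dp_sgd_step(theta, per_sample_grads, C, sigma, eta, rng=None):
    rng = rng or np.random.default_rng()
    clipped = [g / max(1.0, np.linalg.norm(g) / C)   # keep if ||g||_2 <= C,
               for g in per_sample_grads]            # otherwise scale down to C
    noise = rng.normal(0.0, sigma * C, size=theta.shape)  # N(0, sigma^2 C^2 I)
    noisy_grad = (np.sum(clipped, axis=0) + noise) / len(clipped)
    return theta - eta * noisy_grad      # theta_{t+1} = theta_t - eta_t * g~_t

rng = np.random.default_rng(2)
theta = rng.standard_normal(10)
grads = [rng.standard_normal(10) for _ in range(32)]  # one gradient per sample
theta = dp_sgd_step(theta, grads, C=1.0, sigma=1.1, eta=0.3, rng=rng)
```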
Accordingly, after multiple rounds of iterative training, a target business model based on differential privacy is obtained. Since Gaussian noise is added during model training, it is difficult to infer the model structure, or the business data, from what the target business model exposes, further improving the protection of private data.
The trained target business model can be used to make corresponding business predictions for given business data, that is, business data of the same type as the training samples. For example, given a user's finance-related data, the target business model can predict the user's loan risk.
Reviewing the above flow: the method for determining a target business model based on privacy protection provided in the embodiments of this specification first trains the selected complex business model to obtain an initial business model, then prunes the initial business model and trains the pruned model with its parameters reset to the initialization state, to check whether the pruned-away parameters were unimportant from the beginning. A target business model is then selected from the resulting sub-models in a differential-privacy manner. A compressed model with privacy protection is thus obtained, providing privacy protection for the model on the basis of model compression.
According to an embodiment of another aspect, an apparatus for determining a target business model based on privacy protection is also provided. The business model here may be a model for conducting business processing such as classification, scoring, etc. for given business data. The service data may be various types of data such as text, image, voice, video, animation, etc. The apparatus may be provided in a system, device, apparatus, platform, or server having some computing power.
FIG. 5 illustrates a schematic block diagram of an apparatus for determining a target business model based on privacy protection, according to one embodiment. As shown in fig. 5, the apparatus 500 includes:
an initialization unit 51 configured to determine initial values corresponding to respective model parameters for the selected service model according to a predetermined manner, thereby initializing the selected service model;
an initial training unit 52 configured to train the initialized selected service model using a plurality of training samples until the model parameters converge, so as to obtain an initial service model;
a pruning unit 53 configured to determine a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model corresponds to model parameters and a model index determined through retraining by the initialization unit 51 and the initial training unit 52 as follows: the initialization unit 51 resets the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model; and the initial training unit 52 inputs the plurality of training samples into the pruned business model in turn, adjusting the model parameters based on comparison of the corresponding sample labels with the outputs of the pruned business model;
the determining unit 54 is configured to select a target business model from each sub-model by using a first mode of differential privacy based on the model index corresponding to each sub-model.
According to an embodiment, the pruning unit 53 may be further configured to:
pruning the initial business model according to the model parameters of the initial business model to obtain a first pruning model;
taking the first pruning model corresponding to the model parameters obtained through retraining as a first sub-model;
and iteratively trimming the first submodel to obtain a subsequent submodel until the end condition is met.
In one embodiment, the ending condition may include at least one of the number of iterations reaching a predetermined number, the number of submodels reaching a predetermined number, the size of the last submodel being less than a set size threshold, and the like.
In an alternative implementation, the pruning unit 53 prunes the model, in order of model parameter magnitude from small to large, in one of the following manners: pruning away a predetermined proportion of the model parameters, pruning away a predetermined number of model parameters, pruning until the model does not exceed a predetermined size, and so on.
According to one possible design, the first way of differential privacy is an exponential mechanism, and the determining unit 54 may be further configured to:
determining each availability coefficient corresponding to each submodel according to the model index corresponding to each submodel;
determining the sampling probability of each sub-model from the availability coefficients by using the exponential mechanism;
and sampling in a plurality of sub models according to each sampling probability, and taking the sampled sub models as target business models.
In one embodiment, the apparatus 500 may further comprise a privacy training unit 55 configured to:
and training the target business model based on a second mode of differential privacy by using a plurality of training samples, so that the trained target business model is used for carrying out business prediction for protecting data privacy aiming at given business data.
In a further embodiment, the plurality of training samples includes a first batch of samples, where each sample i in the first batch corresponds to a loss obtained after processing by the target business model, and the privacy training unit 55 is further configured to:
determining an original gradient of the loss corresponding to the sample i;
adding noise on the original gradient by using a second mode of differential privacy to obtain a gradient containing the noise;
and adjusting the model parameters of the target business model by using the gradient containing the noise and taking the loss corresponding to the minimized sample i as a target.
In a further embodiment, the second mode of differential privacy adds Gaussian noise, and the privacy training unit 55 may be further configured to:
clip the original gradient based on a preset clipping threshold to obtain a clipped gradient;
determine Gaussian noise for realizing differential privacy using a Gaussian distribution determined from the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold;
and superimpose the Gaussian noise on the clipped gradient to obtain the gradient containing the noise.
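A minimal sketch of this Gaussian variant, in the spirit of DP-SGD: the clipping threshold clip_c and noise multiplier sigma are assumed hyperparameters, and the noise standard deviation sigma * clip_c makes the noise variance proportional to the square of the clipping threshold, as required above.

```python
import torch

def gaussian_noisy_gradient(grad, clip_c=1.0, sigma=1.1):
    """Clip the original gradient to L2 norm clip_c, then superimpose
    Gaussian noise with variance (sigma * clip_c) ** 2."""
    scale = max(1.0, (grad.norm(2) / clip_c).item())
    clipped = grad / scale                             # clipped gradient
    noise = torch.randn_like(grad) * (sigma * clip_c)  # N(0, (sigma*clip_c)^2)
    return clipped + noise                             # gradient containing noise
```

A function of this shape can be passed as the `add_noise` argument of the dp_step sketch above; clipping bounds each sample's influence, which is what lets the added Gaussian noise yield a differential privacy guarantee.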
It should be noted that the apparatus 500 shown in fig. 5 corresponds to the method embodiment shown in fig. 3; the description given for that method embodiment applies equally to the apparatus 500 and is not repeated here.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 3.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 3.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments describe the purpose, technical solutions, and advantages of the present specification in further detail. It should be understood that they are merely specific embodiments of the technical idea of the present specification and are not intended to limit its scope; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of these embodiments shall fall within the scope of the present specification.

Claims (13)

1. A method for determining a target business model based on privacy protection, wherein the target business model is used for processing given business data to obtain a corresponding business prediction result; the method comprises the following steps:
training the selected business model by using a plurality of training samples until the model parameters converge, to obtain an initial business model;
determining a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model respectively corresponds to model parameters and a model index determined through retraining, and the model index is used for evaluating model performance once the corresponding sub-model has converged;
determining each sampling probability respectively corresponding to each sub-model based on the model index corresponding to each sub-model, by using an exponential mechanism of differential privacy;
and sampling from the sub-models according to the sampling probabilities so as to select a target business model.
2. The method of claim 1, wherein the determining a plurality of sub-models of the initial business model based on the pruning of the initial business model comprises:
pruning the initial business model according to the model parameters of the initial business model to obtain a first pruning model;
taking the first pruning model corresponding to the model parameters obtained through retraining as a first sub-model;
and iteratively pruning the first sub-model to obtain subsequent sub-models until an end condition is met.
3. The method of claim 2, wherein the end condition comprises at least one of the number of iterations reaching a predetermined number, the number of submodels reaching a predetermined number, and the size of the last submodel being less than a set size threshold.
4. The method according to claim 1 or 2, wherein the pruning of the initial business model is performed in order of increasing model parameter magnitude, based on one of: pruning away a preset proportion of model parameters, pruning away a preset number of model parameters, and pruning to obtain a model whose scale does not exceed a preset size.
5. The method of claim 1, wherein the determining, by using an exponential mechanism of differential privacy, respective sampling probabilities respectively corresponding to the respective submodels based on the respective model indicators corresponding to the respective submodels comprises:
determining each availability coefficient corresponding to each submodel according to the model index corresponding to each submodel;
and determining each sampling probability corresponding to each sub-model by using the exponential mechanism according to each availability coefficient.
6. The method of claim 5, wherein the model index includes a compression ratio relative to the initial business model and at least one of: accuracy, recall, and F1 score; the availability coefficient is the product of the compression ratio and the other terms contained in the model index.
7. The method of claim 1, wherein the method further comprises:
and training the target business model based on a second mode of differential privacy by using the plurality of training samples, so that the trained target business model can perform business prediction on given business data while protecting data privacy.
8. The method of claim 7, wherein the plurality of training samples comprise a first batch of samples, wherein a sample i in the first batch corresponds to a loss obtained after processing by the target business model, and wherein training the target business model based on the second mode of differential privacy using the plurality of training samples comprises:
determining an original gradient of a loss corresponding to the sample i;
adding noise to the original gradient by utilizing the second mode of differential privacy to obtain a gradient containing the noise;
and adjusting the model parameters of the target business model by using the gradient containing the noise, with minimization of the loss corresponding to sample i as the objective.
9. The method of claim 8, wherein the second mode of differential privacy adds Gaussian noise, and wherein adding noise to the original gradient by the second mode of differential privacy to obtain a gradient containing noise comprises:
clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient;
determining Gaussian noise for realizing differential privacy by utilizing a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold;
and superimposing the Gaussian noise on the clipped gradient to obtain the gradient containing the noise.
10. The method of claim 1, wherein the business data comprises at least one of pictures, audio, and text.
11. A device for determining a target business model based on privacy protection, wherein the target business model is used for processing given business data to obtain a corresponding business prediction result; the device comprises:
an initial training unit configured to train the initialized selected business model using a plurality of training samples until the model parameters converge, to obtain an initial business model;
a pruning unit configured to determine a plurality of sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model respectively corresponds to model parameters and a model index determined through retraining by the initial training unit, and the model index is used for describing the model performance of the corresponding sub-model;
and a determining unit configured to determine each sampling probability corresponding to each sub-model based on the model index corresponding to each sub-model, by using an exponential mechanism of differential privacy;
and to sample from the sub-models according to the sampling probabilities so as to select a target business model.
12. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-10.
13. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-10.
CN202010626329.4A 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection Pending CN113515770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010626329.4A CN113515770A (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010276685.8A CN111177792B (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection
CN202010626329.4A CN113515770A (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202010276685.8A Division CN111177792B (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Publications (1)

Publication Number Publication Date
CN113515770A (en) 2021-10-19

Family

ID=70655223

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010626329.4A Pending CN113515770A (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection
CN202010276685.8A Active CN111177792B (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202010276685.8A Active CN111177792B (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection

Country Status (3)

Country Link
CN (2) CN113515770A (en)
TW (1) TWI769754B (en)
WO (1) WO2021204272A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081024A (en) * 2022-08-16 2022-09-20 杭州金智塔科技有限公司 Decentralized business model training method and device based on privacy protection
CN117056979A (en) * 2023-10-11 2023-11-14 杭州金智塔科技有限公司 Service processing model updating method and device based on user privacy data

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515770A (en) * 2020-04-10 2021-10-19 支付宝(杭州)信息技术有限公司 Method and device for determining target business model based on privacy protection
CN111368337B (en) * 2020-05-27 2020-09-08 支付宝(杭州)信息技术有限公司 Sample generation model construction and simulation sample generation method and device for protecting privacy
CN111475852B (en) * 2020-06-19 2020-09-15 支付宝(杭州)信息技术有限公司 Method and device for preprocessing data aiming at business model based on privacy protection
CN112214791B (en) * 2020-09-24 2023-04-18 广州大学 Privacy policy optimization method and system based on reinforcement learning and readable storage medium
CN114936650A (en) * 2020-12-06 2022-08-23 支付宝(杭州)信息技术有限公司 Method and device for jointly training business model based on privacy protection
CN112561076B (en) * 2020-12-10 2022-09-20 支付宝(杭州)信息技术有限公司 Model processing method and device
CN112632607B (en) * 2020-12-22 2024-04-26 中国建设银行股份有限公司 Data processing method, device and equipment
CN112926090B (en) * 2021-03-25 2023-10-27 支付宝(杭州)信息技术有限公司 Business analysis method and device based on differential privacy
US20220318412A1 (en) * 2021-04-06 2022-10-06 Qualcomm Incorporated Privacy-aware pruning in machine learning
CN113221717B (en) * 2021-05-06 2023-07-18 支付宝(杭州)信息技术有限公司 Model construction method, device and equipment based on privacy protection
CN113420322B (en) * 2021-05-24 2023-09-01 阿里巴巴新加坡控股有限公司 Model training and desensitizing method and device, electronic equipment and storage medium
CN113268772B (en) * 2021-06-08 2022-12-20 北京邮电大学 Joint learning security aggregation method and device based on differential privacy
CN113486402A (en) * 2021-07-27 2021-10-08 平安国际智慧城市科技股份有限公司 Numerical data query method, device, equipment and storage medium
CN113923476B (en) * 2021-09-30 2024-03-26 支付宝(杭州)信息技术有限公司 Video compression method and device based on privacy protection
CN114185619B (en) * 2021-12-14 2024-04-05 平安付科技服务有限公司 Breakpoint compensation method, device, equipment and medium based on distributed operation
CN114338552B (en) * 2021-12-31 2023-07-07 河南信大网御科技有限公司 System for determining delay mimicry
CN114780999B (en) * 2022-06-21 2022-09-27 广州中平智能科技有限公司 Deep learning data privacy protection method, system, equipment and medium
CN116432039B (en) * 2023-06-13 2023-09-05 支付宝(杭州)信息技术有限公司 Collaborative training method and device, business prediction method and device
CN116805082B (en) * 2023-08-23 2023-11-03 南京大学 Splitting learning method for protecting private data of client

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657498A (en) * 2018-12-28 2019-04-19 广西师范大学 The difference method for secret protection that top-k Symbiotic Model excavates in a plurality of stream
US20190138743A1 (en) * 2015-11-02 2019-05-09 LeapYear Technologies, Inc. Differentially Private Processing and Database Storage
CN110874488A (en) * 2019-11-15 2020-03-10 哈尔滨工业大学(深圳) Stream data frequency counting method, device and system based on mixed differential privacy and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368752B (en) * 2017-07-25 2019-06-28 北京工商大学 A kind of depth difference method for secret protection based on production confrontation network
US11068605B2 (en) * 2018-06-11 2021-07-20 Grey Market Labs, PBC Systems and methods for controlling data exposure using artificial-intelligence-based periodic modeling
US11341281B2 (en) * 2018-09-14 2022-05-24 International Business Machines Corporation Providing differential privacy in an untrusted environment
US11556846B2 (en) * 2018-10-03 2023-01-17 Cerebri AI Inc. Collaborative multi-parties/multi-sources machine learning for affinity assessment, performance scoring, and recommendation making
CN110084365B (en) * 2019-03-13 2023-08-11 西安电子科技大学 Service providing system and method based on deep learning
CN110719158B (en) * 2019-09-11 2021-11-23 南京航空航天大学 Edge calculation privacy protection system and method based on joint learning
CN113515770A (en) * 2020-04-10 2021-10-19 支付宝(杭州)信息技术有限公司 Method and device for determining target business model based on privacy protection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190138743A1 (en) * 2015-11-02 2019-05-09 LeapYear Technologies, Inc. Differentially Private Processing and Database Storage
CN109657498A (en) * 2018-12-28 2019-04-19 广西师范大学 The difference method for secret protection that top-k Symbiotic Model excavates in a plurality of stream
CN110874488A (en) * 2019-11-15 2020-03-10 哈尔滨工业大学(深圳) Stream data frequency counting method, device and system based on mixed differential privacy and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Feng Dengguo; Zhang Min; Ye Yutong: "Research on location trajectory publication techniques based on the differential privacy model", Journal of Electronics & Information Technology, no. 01, 15 January 2020 (2020-01-15) *
Zhang Wei; Cang Jiyun; Wang Xuran; Chen Yunfang: "Differentially private data publication for social networks based on hierarchical random graphs", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), no. 03, 29 June 2016 (2016-06-29) *
Xu Jiahui: "Research on neural network compression techniques based on model pruning", Information & Communications, no. 12, 15 December 2019 (2019-12-15) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081024A (en) * 2022-08-16 2022-09-20 杭州金智塔科技有限公司 Decentralized business model training method and device based on privacy protection
CN117056979A (en) * 2023-10-11 2023-11-14 杭州金智塔科技有限公司 Service processing model updating method and device based on user privacy data
CN117056979B (en) * 2023-10-11 2024-03-29 杭州金智塔科技有限公司 Service processing model updating method and device based on user privacy data

Also Published As

Publication number Publication date
CN111177792A (en) 2020-05-19
WO2021204272A1 (en) 2021-10-14
CN111177792B (en) 2020-06-30
TWI769754B (en) 2022-07-01
TW202139045A (en) 2021-10-16

Similar Documents

Publication Publication Date Title
CN111177792B (en) Method and device for determining target business model based on privacy protection
CN111124840B (en) Method and device for predicting alarm in business operation and maintenance and electronic equipment
EP3340129B1 (en) Artificial neural network class-based pruning
US6397200B1 (en) Data reduction system for improving classifier performance
CN108154237B (en) Data processing system and method
CN113536383B (en) Method and device for training graph neural network based on privacy protection
CN113220886A (en) Text classification method, text classification model training method and related equipment
CN114417427A (en) Deep learning-oriented data sensitivity attribute desensitization system and method
CN113822315A (en) Attribute graph processing method and device, electronic equipment and readable storage medium
CN115062779A (en) Event prediction method and device based on dynamic knowledge graph
CN113591924A (en) Phishing number detection method, system, storage medium and terminal equipment
CN112380919A (en) Vehicle category statistical method
CN116010832A (en) Federal clustering method, federal clustering device, central server, federal clustering system and electronic equipment
CN115907775A (en) Personal credit assessment rating method based on deep learning and application thereof
CN115564155A (en) Distributed wind turbine generator power prediction method and related equipment
CN115131646A (en) Deep network model compression method based on discrete coefficient
Yoo et al. Unpriortized autoencoder for image generation
CN115461740A (en) Behavior control method and device and storage medium
CN117058493B (en) Image recognition security defense method and device and computer equipment
CN117236900B (en) Individual tax data processing method and system based on flow automation
CN117807237B (en) Paper classification method, device, equipment and medium based on multivariate data fusion
Kutschenreiter-Praszkiewicz Decision Rule Induction Based on the Graph Theory
Häggström Latent Data-Structures for Complex State Representation: A Steppingstone to Generating Synthetic 5G RAN data using Deep Learning
CN117077813A (en) Training method and training system for machine learning model
CN117115608A (en) Decision method, device, equipment and medium based on knowledge embedding reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40062502)