WO2023216902A1 - Method, apparatus, device and medium for model performance evaluation - Google Patents

Method, apparatus, device and medium for model performance evaluation

Info

Publication number
WO2023216902A1
Authority
WO
WIPO (PCT)
Prior art keywords
metric
value
prediction
values
perturbation
Application number
PCT/CN2023/091189
Other languages
English (en)
French (fr)
Inventor
孙建凯
杨鑫
王崇
解浚源
吴迪
Original Assignee
北京字节跳动网络技术有限公司
脸萌有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 北京字节跳动网络技术有限公司 and 脸萌有限公司
Publication of WO2023216902A1 publication Critical patent/WO2023216902A1/zh


Classifications

    • G06F9/5072 Grid computing
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F18/24 Classification techniques
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06N20/00 Machine learning
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods

Definitions

  • Example embodiments of the present disclosure relate generally to the field of computers, and in particular to methods, apparatus, devices and computer-readable storage media for model performance evaluation.
  • Federated learning refers to joint modeling using the data of each node, on the basis of ensuring data privacy and security, to improve the effect of machine learning models. Federated learning allows the data of each node to remain on the local device, thereby achieving the purpose of data protection.
  • According to embodiments of the present disclosure, a scheme for model performance evaluation is provided.
  • In a first aspect of the present disclosure, a method for model performance evaluation is provided. The method includes: at a client node, applying a plurality of data samples to a prediction model respectively to obtain a plurality of prediction scores output by the prediction model, the plurality of prediction scores respectively indicating predicted probabilities that the plurality of data samples belong to a first category or a second category; determining, based on a plurality of ground-truth labels and the plurality of prediction scores of the plurality of data samples, values of a plurality of metric parameters related to a predetermined performance indicator of the prediction model; applying perturbations to the values of the plurality of metric parameters to obtain perturbation values of the plurality of metric parameters; and sending the perturbation values of the plurality of metric parameters to a service node.
  • In a second aspect of the present disclosure, a method for model performance evaluation is provided. The method includes: at the service node, receiving, from a plurality of client nodes respectively, perturbation values of a plurality of metric parameters related to a predetermined performance indicator of the prediction model; aggregating, per metric parameter, the perturbation values of the plurality of metric parameters from the plurality of client nodes to obtain aggregate values of the plurality of metric parameters; and determining a value of the predetermined performance indicator based on the aggregate values of the plurality of metric parameters.
  • In a third aspect of the present disclosure, an apparatus for model performance evaluation is provided. The apparatus includes: a prediction module configured to apply a plurality of data samples to a prediction model respectively to obtain a plurality of prediction scores output by the prediction model, the plurality of prediction scores respectively indicating predicted probabilities that the plurality of data samples belong to a first category or a second category; a metric determination module configured to determine values of a plurality of metric parameters related to a predetermined performance indicator of the prediction model based on the plurality of ground-truth labels and the plurality of prediction scores of the plurality of data samples; a perturbation module configured to apply perturbations to the values of the plurality of metric parameters to obtain perturbation values of the plurality of metric parameters; and a sending module configured to send the perturbation values of the plurality of metric parameters to the service node.
  • In a fourth aspect of the present disclosure, an apparatus for model performance evaluation is provided. The apparatus includes: a receiving module configured to receive, from a plurality of client nodes respectively, perturbation values of a plurality of metric parameters related to a predetermined performance indicator of the prediction model; an aggregation module configured to aggregate, per metric parameter, the perturbation values of the plurality of metric parameters from the plurality of client nodes to obtain aggregate values of the plurality of metric parameters; and a performance determination module configured to determine a value of the predetermined performance indicator based on the aggregate values of the plurality of metric parameters.
  • In a fifth aspect of the present disclosure, an electronic device includes at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit.
  • the instructions when executed by at least one processing unit, cause the device to perform the method of the first aspect.
  • In a sixth aspect of the present disclosure, an electronic device includes at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit.
  • the instructions when executed by at least one processing unit, cause the device to perform the method of the second aspect.
  • a computer-readable storage medium is provided.
  • A computer program is stored on the medium and, when executed by a processor, implements the method of the first aspect.
  • a computer-readable storage medium is provided.
  • A computer program is stored on the medium and, when executed by a processor, implements the method of the second aspect.
  • Figure 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be applied
  • Figure 2 illustrates a flow diagram of signaling flow for model performance evaluation according to some embodiments of the present disclosure
  • FIG. 3A illustrates a flowchart of a process of determining values of metric parameters in accordance with some embodiments of the present disclosure
  • FIG. 3B illustrates a flowchart of a signaling flow for determining values of metric parameters according to some embodiments of the present disclosure
  • FIG. 4 illustrates a flowchart of a process for model performance evaluation at a client node in accordance with some embodiments of the present disclosure
  • Figure 5 illustrates a flowchart of a process for model performance evaluation at a service node, in accordance with some embodiments of the present disclosure
  • FIG. 6 illustrates a block diagram of an apparatus for model performance evaluation at a client node in accordance with some embodiments of the present disclosure
  • FIG. 7 illustrates a block diagram of an apparatus for model performance evaluation at a service node in accordance with some embodiments of the present disclosure.
  • FIG. 8 illustrates a block diagram of a computing device/system capable of implementing one or more embodiments of the present disclosure.
  • a prompt message is sent to the user to clearly remind the user that the operation requested will require the acquisition and use of the user's personal information. Therefore, users can autonomously choose whether to provide personal information to software or hardware such as electronic devices, applications, servers or storage media that perform the operations of the technical solution of the present disclosure based on the prompt information.
  • the method of sending prompt information to the user can be, for example, a pop-up window, and the prompt information can be presented in the form of text in the pop-up window.
  • The pop-up window can also host a selection control for the user to choose "agree" or "disagree" to provide personal information to the electronic device.
  • A model can learn the association between corresponding inputs and outputs from training data, so that after training is completed a corresponding output can be generated for a given input. Model generation can be based on machine learning techniques. Deep learning is a machine learning algorithm that uses multiple layers of processing units to process inputs and provide corresponding outputs. Neural network models are an example of deep learning-based models. Herein, a "model" may also be called a "machine learning model," "learning model," "machine learning network," or "learning network," and these terms are used interchangeably herein.
  • a "neural network” is a machine learning network based on deep learning. Neural networks are capable of processing inputs and providing corresponding outputs, and typically include an input layer and an output layer and one or more hidden layers between the input layer and the output layer. Neural networks used in deep learning applications often include many hidden layers, thereby increasing the depth of the network.
  • the layers of a neural network are connected in sequence such that the output of the previous layer is provided as the input of the subsequent layer, where the input layer receives the input of the neural network and the output of the output layer serves as the final output of the neural network.
  • Each layer of a neural network consists of one or more nodes (also called processing nodes or neurons), each processing input from the previous layer.
  • machine learning can roughly include three stages, namely the training stage, the testing stage and the application stage (also called the inference stage).
  • In the training phase, a given model can be trained using a large amount of training data, iteratively updating parameter values until the model can obtain, from the training data, consistent inferences that meet the expected goal.
  • the model can be thought of as being able to learn the association between inputs and outputs (also known as input-to-output mapping) from the training data.
  • the parameter values of the trained model are determined.
  • In the testing phase, test inputs are applied to the trained model to test whether the model can provide correct outputs, thereby determining the performance of the model.
  • In the application phase, the model can be used to process actual inputs and determine corresponding outputs based on the parameter values obtained through training.
  • FIG. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented.
  • Client nodes 110-1...110-k,...110-N can maintain respective local data sets 112-1...112-k,...112-N respectively.
  • client nodes 110-1...110-k,...110-N may be collectively or individually referred to as client nodes 110
  • local data sets 112-1...112-k,...112-N may be referred to collectively or individually as local data sets 112 .
  • the client node 110 and/or the service node 120 may be implemented at a terminal device or a server.
  • The terminal device can be any type of mobile terminal, fixed terminal, or portable terminal, including mobile phones, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, media computers, multimedia tablets, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio/video players, digital cameras/camcorders, positioning devices, television receivers, radio receivers, e-book devices, gaming devices, or any combination of the foregoing, including accessories and peripherals for these devices or any combination thereof.
  • the terminal device is also able to support any type of interface to the user (such as "wearable" circuitry, etc.).
  • Servers are various types of computing systems/servers capable of providing computing capabilities, including but not limited to mainframes, edge computing nodes, computing devices in cloud environments, and so on.
  • a client node refers to a node that provides some of the data used to train, validate, or evaluate a predictive model.
  • the client node may also be called a client, a terminal node, a terminal device, a user device, etc.
  • a service node refers to a node that aggregates results at client nodes.
  • N client nodes 110 jointly participate in training the prediction model 125 and aggregate the intermediate results in the training to the service node 120 so that the service node 120 updates the parameter set of the prediction model 125 .
  • the complete set of local data for these client nodes 110 constitutes the complete training data set for the predictive model 125 . Therefore, according to the mechanism of federated learning, the service node 120 can determine the global prediction model 125.
  • the local data set 112 at the client node 110 may include data samples and ground truth labels.
  • Figure 1 specifically illustrates a local data set 112-k at a client node 110-k, which includes a set of data samples and a set of ground truth labels.
  • The data sample set includes multiple (M) data samples 102-1, 102-i, ... 102-M (collectively or individually referred to as data samples 102), and the ground-truth label set includes corresponding multiple (M) ground-truth labels 105-1, 105-i, ... 105-M (collectively or individually referred to as ground-truth labels 105).
  • Each data sample 102 may be annotated with a corresponding ground truth label 105 .
  • the data sample 102 may correspond to the input of the predictive model 125 and the ground truth label 105 indicates the true output of the data sample 102 .
  • Ground truth labels are an important part of supervised machine learning.
  • the prediction model 125 may be built based on various machine learning or deep learning model architectures, and may be configured to implement various prediction tasks, such as various classification tasks, recommendation tasks, and so on.
  • the prediction model 125 may also be called a recommendation model, a classification model, etc.
  • Data samples 102 may include input information related to a specific task of the predictive model 125, and ground truth labels 105 related to the desired output of the task.
  • The prediction model 125 may be configured to predict whether an input data sample belongs to the first category or the second category, and the ground-truth label is used to indicate whether the data sample actually belongs to the first category or the second category.
  • Many practical applications can be formulated as such binary classification tasks, for example, predicting in a recommendation task whether a recommended item is converted (for example, clicked, purchased, registered, or subject to another desired behavior).
  • Figure 1 only shows an example federated learning environment. Depending on the federated learning algorithm and actual application needs, the environment can also be different.
  • the service node 120 may serve as a client node in addition to serving as a central node to provide partial data for model training, model performance evaluation, etc. Embodiments of the present disclosure are not limited in this respect.
  • the client node 110 does not need to disclose local data samples or label data, but sends gradient data calculated based on local training data to the service node 120 for the service node 120 to update the parameter set of the prediction model 125 .
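The gradient-aggregation step described above can be illustrated with a minimal sketch. The disclosure does not specify a concrete update rule, so plain gradient averaging is assumed here, which is one common federated-learning scheme; `federated_gradient_step` is an illustrative name:

```python
def federated_gradient_step(local_gradients, weights, lr=0.1):
    """Illustrative federated update: the service node averages gradients
    computed on each client's local data and updates the shared parameters,
    so raw data samples and labels never leave the client nodes.

    local_gradients: one gradient vector per client node.
    weights: current parameter vector of the shared prediction model.
    """
    num_clients = len(local_gradients)
    dim = len(weights)
    # Service node aggregates: element-wise average of the client gradients.
    avg_grad = [
        sum(g[j] for g in local_gradients) / num_clients for j in range(dim)
    ]
    # Gradient-descent update of the global model parameters.
    return [w - lr * g for w, g in zip(weights, avg_grad)]

# Two clients report gradients; the server updates the shared weights.
new_w = federated_gradient_step([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])
expected = [0.3, 0.2]
assert all(abs(a - b) < 1e-9 for a, b in zip(new_w, expected))
```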
  • Model performance evaluation also requires data, including the data samples required as model input and the label data corresponding to those data samples.
  • the performance of a predictive model can be measured by one or more performance metrics. Different performance indicators can measure the difference between the predicted output given by the prediction model for the data sample set and the real output indicated by the ground-truth label set from different perspectives. Generally, if the difference between the predicted output given by the prediction model and the real output is smaller, it means that the performance of the prediction model is better. It can be seen that it is usually necessary to determine the performance indicators of the prediction model based on the set of ground-truth labels of the data samples.
  • According to embodiments of the present disclosure, a model performance evaluation solution is provided that can protect the label data local to a client node.
  • perturbations are applied to the determined values of the metric parameters to obtain perturbation values of the multiple metric parameters.
  • The client node sends the perturbation values of the metric parameters to the service node. Since the true values of the metric parameters need not be sent directly, it is difficult for an observer to deduce the ground-truth labels of the data samples from the perturbation values. In this way, data leakage can be effectively avoided.
  • the service node receives perturbation values of a plurality of metric parameters determined by them respectively from a plurality of client nodes.
  • The service node aggregates, per metric parameter, the perturbation values of the multiple metric parameters from the multiple client nodes. When the perturbation values from multiple different sources are aggregated, the perturbations cancel out. Therefore, based on the aggregate values of the multiple metric parameters, the service node is able to accurately determine the value of the model's performance indicator.
  • Each client node thus does not need to expose its local set of ground-truth labels, or the parameter values determined based on those labels, while the service node is still able to calculate the performance indicator value of the model. In this way, model performance evaluation is achieved while the local label data of the client nodes remains private.
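The cancellation property described above can be illustrated with a minimal sketch. The disclosure does not fix a concrete perturbation mechanism, so this example uses hypothetical zero-sum random masks; `make_zero_sum_masks` and `perturb` are illustrative names, not terms from the disclosure:

```python
import random

def make_zero_sum_masks(num_clients, seed=0):
    """Generate one random mask per client such that all masks sum to zero.

    Illustrative construction only: in practice the clients would agree on
    masks without a single party seeing them all.
    """
    rng = random.Random(seed)
    masks = [rng.uniform(-100.0, 100.0) for _ in range(num_clients - 1)]
    masks.append(-sum(masks))  # last mask cancels the others exactly
    return masks

def perturb(value, mask):
    """A client perturbs its true metric value before sending it."""
    return value + mask

# Three clients hold true metric values that must stay private.
true_values = [4.0, 7.0, 9.0]
masks = make_zero_sum_masks(len(true_values))
perturbed = [perturb(v, m) for v, m in zip(true_values, masks)]

# The service node only ever sees perturbed values, yet their sum is exact:
# the perturbations cancel out in the aggregate.
aggregate = sum(perturbed)
assert abs(aggregate - sum(true_values)) < 1e-9
```

Each individual perturbed value reveals essentially nothing about the client's true value, while the aggregate the service node needs is preserved exactly.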
  • FIG. 2 illustrates a schematic block diagram of signaling flow 200 for model performance evaluation in accordance with some embodiments of the present disclosure. For ease of discussion, reference is made to environment 100 of FIG. 1 .
  • Signaling flow 200 involves client node 110 and service node 120.
  • the prediction model 125 to be evaluated may be a global prediction model determined based on a training process of federated learning, for example, the client node 110 and the service node 120 participate in the training process of the prediction model 125 .
  • the prediction model 125 may also be a model obtained in any other manner, and the client node 110 and the service node 120 may not participate in the training process of the prediction model 125.
  • the scope of the present disclosure is not limited in this regard.
  • service node 120 sends 205 predictive model 125 to N client nodes 110.
  • each client node 110 may perform a subsequent evaluation process based on the prediction model 125 .
  • the predictive model 125 to be evaluated may also be provided to the client node 110 in any other suitable manner.
  • For ease of discussion, the client-side operations are described below from the perspective of a single client node 110 .
  • Multiple client nodes 110 may operate similarly.
  • The client node 110 applies 215 multiple data samples to the prediction model 125 respectively to obtain multiple prediction scores output by the prediction model 125.
  • Each prediction score may indicate a predicted probability that the corresponding data sample 102 belongs to the first category or the second category. These two categories can be configured according to actual task needs.
  • the value range of the prediction score output by the prediction model 125 can be set arbitrarily.
  • The prediction score can be a value in a continuous interval (for example, a value between 0 and 1), or one of multiple discrete values (for example, one of the discrete values 0, 1, 2, 3, 4, 5).
  • a higher prediction score may indicate that the data sample 102 has a greater predicted probability of belonging to the first category and a smaller predicted probability of belonging to the second category.
  • The opposite setting is also possible; for example, a higher prediction score may indicate a greater predicted probability that the data sample 102 belongs to the second category and a smaller predicted probability that it belongs to the first category.
  • The client node 110 determines 220 the values of a plurality of metric parameters related to a predetermined performance indicator of the prediction model 125, based on a plurality of ground-truth labels 105 of the plurality of data samples 102 and the plurality of prediction scores output by the model.
  • the truth label 105 is used to label whether the corresponding data sample 102 belongs to the first category or the second category.
  • data samples belonging to the first category are sometimes called positive samples, positive examples, or positive class samples
  • data samples belonging to the second category are sometimes called negative samples, negative examples, or negative class samples.
  • each truth label 105 may have one of two values, indicating the first category or the second category respectively.
  • the value of the true value label 105 corresponding to the first category may be set to “1”, which indicates that the data sample belongs to the first category and is a positive sample.
  • the value of the ground truth label 105 corresponding to the second category can be set to “0”, which indicates that the data sample belongs to the second category and is a negative sample.
  • individual client nodes 110 determine metric information related to performance indicators of the model based on local data sets (data samples and ground truth labels). By aggregating the metric information of multiple client nodes 110 to the service node 120, the performance of the prediction model 125 can be evaluated based on the complete data set of multiple client nodes.
  • Metric information refers to the information that needs to be concerned when calculating the performance indicators of the model, and can usually be indicated by multiple metric parameters.
  • the values of these measurement parameters need to be calculated based on the output results of the data samples after passing through the model (i.e., prediction scores), and the corresponding true value labels of the data samples.
  • the type of metric information provided by the client node may depend on the specific performance metrics being calculated.
  • the prediction score output by the prediction model 125 for a certain data sample is usually compared with a certain score threshold, and based on the comparison result, it is determined whether the data sample is predicted to belong to the first category or the second category.
  • the prediction of the prediction model 125 used to implement the binary classification task may have four possible outcomes.
  • If the ground-truth label 105 indicates that the data sample belongs to the first category (positive sample) and the prediction model 125 also predicts it to be a positive sample, the data sample is considered a True Positive (TP). If the ground-truth label 105 indicates that it belongs to the first category (positive sample) but the prediction model 125 predicts it to be a negative sample, the data sample is considered a False Negative (FN). If the ground-truth label 105 indicates that it belongs to the second category (negative sample) but the prediction model 125 predicts it to be a positive sample, the data sample is considered a False Positive (FP). If the ground-truth label 105 indicates that it belongs to the second category (negative sample) and the prediction model 125 also predicts it to be a negative sample, the data sample is considered a True Negative (TN).
  • the performance index can be calculated based on the prediction results of the complete set of data samples of multiple client nodes 110 and the complete set of ground truth labels.
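Counting the four outcomes can be sketched as follows, assuming labels encoded as 1 (first category, positive sample) and 0 (second category, negative sample), and a hypothetical score threshold of 0.5 (the disclosure leaves the threshold configurable):

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count TP / FN / FP / TN for a binary classifier.

    labels: 1 for the first category (positive), 0 for the second (negative).
    scores: predicted probability of the positive class per sample.
    """
    tp = fn = fp = tn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1          # true positive
        elif y == 1:
            fn += 1          # false negative
        elif predicted_positive:
            fp += 1          # false positive
        else:
            tn += 1          # true negative
    return tp, fn, fp, tn

# One sample of each kind.
assert confusion_counts([1, 1, 0, 0], [0.9, 0.2, 0.8, 0.1]) == (1, 1, 1, 1)
```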
  • Performance indicators of the prediction model 125 may also include the true positive rate (TPR) and/or the false positive rate (FPR).
  • The performance metric of the prediction model 125 may include the area under the receiver operating characteristic (ROC) curve, known as the AUC.
  • The ROC curve is a curve drawn on coordinate axes, with the false positive rate (FPR) as the X-axis and the true positive rate (TPR) as the Y-axis, for different classification settings (different score thresholds). Each possible score threshold yields an (FPR, TPR) coordinate point, and connecting these points into a line gives the ROC curve of a specific model.
  • AUC refers to the area under the ROC curve.
  • AUC can be calculated by calculating the area under the ROC curve with an approximate algorithm.
  • the AUC may also be determined from a probabilistic perspective.
  • The AUC can be interpreted as the probability that, for a randomly selected positive sample and a randomly selected negative sample, the prediction model gives the positive sample a higher prediction score than the negative sample. That is, over all pairs formed by one positive sample and one negative sample in the data sample set, it is the fraction of pairs in which the prediction score of the positive sample is greater than that of the negative sample. If the model gives the positive sample a higher prediction score in more of these pairs, the AUC is higher and the model performs better.
  • The AUC typically ranges between 0.5 (the level of random guessing) and 1. The closer the AUC is to 1, the better the performance of the model.
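The probabilistic interpretation above can be written directly as a pairwise count. This brute-force sketch uses illustrative names and counts ties as half a win, a common convention that the text above does not specify:

```python
def pairwise_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 = positive sample, 0 = negative sample.
    Ties are counted as 0.5, following the usual Mann-Whitney convention.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive/negative pairs where the positive sample scores higher.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2]
# Of the 6 positive/negative pairs, the positive scores higher in 5.
assert abs(pairwise_auc(labels, scores) - 5 / 6) < 1e-9
```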
  • the performance indicators of the prediction model 125 may also include a P-R curve with recall on the horizontal axis and precision on the vertical axis. The closer the P-R curve is to the upper right corner, the better the performance of the model. The area under the curve is called the AP score (Average Precision Score).
  • AUC can have different calculation methods, and the measurement parameters required under different calculation methods are also different.
  • Figure 3A illustrates a flow diagram of a process 300 for determining a value of a metric parameter in accordance with some embodiments of the present disclosure.
  • Process 300 is used to determine the values of metric parameters required in a calculation of AUC.
  • Process 300 may be implemented at client node 110.
  • The client node 110 determines the number of first-type labels (referred to as the "first number") among the plurality of ground-truth labels 105, where a first-type label indicates that the corresponding data sample 102 belongs to the first category, e.g., indicates that the data sample 102 is a positive sample.
  • The client node 110 may also determine the number of second-type labels (referred to as the "second number") among the plurality of ground-truth labels 105, where a second-type label indicates that the corresponding data sample 102 belongs to the second category, e.g., indicates that the data sample 102 is a negative sample.
  • The client node 110 may also determine, based on the ranking results of its multiple prediction scores among the prediction scores of all client nodes (i.e., the prediction score set), the number of prediction scores in the prediction score set that the prediction scores of the data samples 102 corresponding to first-type labels (i.e., positive samples) exceed (referred to as the "third number").
  • The third number can be used as the value of another metric parameter of the performance indicator. This number indicates the number of sample pairs in the total set of data samples 102 in which a positive sample is ranked higher than another sample (when sorted in ascending order).
  • FIG. 3B illustrates a flow diagram of a signaling flow 350 for determining a value of a metric parameter in accordance with some embodiments of the present disclosure.
  • the client node 110 sends 352 the prediction score output by the prediction model 125 to the service node 120 for sorting.
  • the client node 110 may randomly adjust the order of the multiple predicted scores and send the multiple predicted scores to the service node in the adjusted order.
  • The output prediction scores may otherwise have a certain order, such as from largest to smallest or from smallest to largest, which could result in certain information leakage. Randomly adjusting the order can further enhance data privacy protection.
  • After receiving 354 the prediction scores, the service node 120 sorts 356 the prediction score set from the plurality of client nodes 110, thereby obtaining the ranking result of the prediction scores from each client node 110 within the prediction score set.
  • In some embodiments, the service node 120 may sort the set of prediction scores in ascending order and assign each prediction score s_i^k (the prediction score of the i-th data sample 102 of client node 110-k) a ranking value r_i^k.
  • The ranking value r_i^k indicates the number of other prediction scores that the prediction score s_i^k exceeds in the prediction score set. For example, in ascending order, the lowest prediction score is assigned a ranking value of 0, indicating that it does not exceed (is not larger than) any other prediction score; the next prediction score is assigned a ranking value of 1, indicating that it is greater than 1 prediction score in the set; and so on. Such an assignment of ranking values facilitates subsequent calculations.
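The ranking-value assignment at the service node can be sketched as follows, assuming distinct prediction scores for simplicity (the text above does not specify tie handling); `assign_rank_values` is an illustrative name:

```python
def assign_rank_values(all_scores):
    """Assign each score the number of other scores it exceeds.

    Sorting in ascending order, the lowest score gets ranking value 0,
    the next gets 1, and so on. Assumes distinct scores for simplicity.
    """
    # Indices of scores in ascending order of score value.
    order = sorted(range(len(all_scores)), key=lambda i: all_scores[i])
    ranks = [0] * len(all_scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank  # position in ascending order = scores exceeded
    return ranks

# Scores pooled from all clients, received in arbitrary (shuffled) order.
assert assign_rank_values([0.7, 0.1, 0.4]) == [2, 0, 1]
```

The service node would then return to each client node the ranking values of that client's own prediction scores.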
  • the service node 120 sends 358 the sorting results of its multiple predicted scores in the overall predicted score set to the corresponding client node 110 .
• The client node 110 may then determine 362 a third number: the number of prediction scores in the prediction score set that are exceeded by the prediction scores of the data samples 102 corresponding to first-type labels (i.e., positive samples). In some embodiments, at client node 110-k, the third number may be determined as localSum_k = Σ_i y_i^(k) · r_i^(k),
• where localSum_k represents the third number, y_i^(k) represents the value of the ground-truth label 105 corresponding to the i-th data sample 102, and r_i^(k) represents the ranking value of the prediction score corresponding to the i-th data sample 102.
• As noted above, the ranking value r_i^(k) can be set to indicate the number of other prediction scores that the prediction score s_i^(k) exceeds within the prediction score set.
• For positive samples, the value of y_i^(k) is 1; for negative samples, it is 0. In this way, the sum Σ_i y_i^(k) · r_i^(k) gives the number of prediction scores that are exceeded by the prediction scores of the positive samples (i.e., the number of such prediction-score pairs).
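• Under these conventions, computing localSum_k reduces to one line (a sketch with hypothetical names; `labels` are the 0/1 ground-truth values, `ranks` the server-provided ranking values):

```python
def local_sum(labels, ranks):
    """localSum_k = sum_i y_i * r_i: the total number of prediction scores
    exceeded by this client's positive samples' prediction scores."""
    return sum(y * r for y, r in zip(labels, ranks))
```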
• localSum_k may be determined as the value of another metric parameter in the metric information at client node 110-k.
• localP_k, localN_k and localSum_k are all metric parameters that need to be determined in this example method of calculating the AUC.
• Assume that the total number of first-type labels at the N client nodes 110 is globalP = Σ_k localP_k, and the total number of second-type labels is globalN = Σ_k localN_k.
• Assume further that the total number of prediction scores exceeded, within the prediction score set, by the prediction scores of the data samples 102 corresponding to first-type labels (that is, the positive samples) is globalSum = Σ_k localSum_k. The value of the AUC of the model can then be calculated as: AUC = (globalSum − globalP·(globalP − 1)/2) / (globalP·globalN)    (4)
• The AUC can also be calculated in other ways, in which case other metric parameters may be required for its calculation.
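• Assuming the rank-based formulation above (0-indexed ascending ranks), the AUC computed from the aggregated metric parameters can be sketched as:

```python
def auc_from_aggregates(global_sum, global_p, global_n):
    """Compute AUC from globalSum (total rank mass of the positives),
    globalP positives and globalN negatives. The subtracted term removes
    the positive-positive pairs counted inside globalSum; the denominator
    is the number of (positive, negative) pairs."""
    return (global_sum - global_p * (global_p - 1) / 2) / (global_p * global_n)
```

• For example, with scores [0.1, 0.4, 0.9] and labels [0, 1, 1], the ranks are [0, 1, 2], globalSum = 3, globalP = 2, globalN = 1, giving an AUC of 1.0.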
• The client node 110 may determine, among the plurality of ground-truth labels 105 held locally, the number of positive samples and the number of negative samples indicated by those labels.
• The client node 110 may also determine, based on the prediction score set, the number of cases among all data samples 102 in which a positive sample's prediction score is greater than a negative sample's prediction score. These three numbers can serve as the values of the metric parameters required to calculate the value of the AUC.
• Each client node 110 may determine the values of these metric parameters calculated on its respective data set.
  • the prediction score corresponding to each data sample 102 is s i , i ⁇ [1,L].
  • AUC can also be determined from a probabilistic and statistical perspective based on other methods.
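• From that probabilistic perspective, the AUC is the probability that a randomly chosen positive sample receives a higher prediction score than a randomly chosen negative sample, which can be checked directly against the rank-based computation (an illustrative sketch; ties are ignored for simplicity):

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the larger prediction score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1 for p in pos for n in neg if p > n)
    return wins / (len(pos) * len(neg))
```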
• Depending on the type of performance indicator and the way it is calculated, the client node 110 can determine the values of the metric parameters related to the performance indicator from the local prediction scores and ground-truth labels. Embodiments of the present disclosure are not limited in this respect.
  • the values of the multiple metric parameters determined by the client node 110 are not directly sent to the service node 120.
  • the client node 110 applies 225 perturbations to the values of the multiple metric parameters to obtain the perturbation values of the multiple metric parameters.
  • the client node 110 sends 230 the perturbation values of the plurality of metric parameters to the service node 120 .
• By applying perturbations, exposing the true values of the metric parameters calculated from the ground-truth labels can be avoided. How the client nodes apply perturbations is discussed in detail below. Herein, a "perturbation" is sometimes also referred to as noise, interference, and so on.
  • the applied perturbation can satisfy the requirements of differential privacy of the data.
• To better understand the embodiments of the present disclosure, differential privacy and the randomized response mechanism are briefly introduced below.
• Assume ε and δ are real numbers greater than or equal to 0 (i.e., ε ≥ 0 and δ ≥ 0), and M is a random mechanism (random algorithm).
• A so-called random mechanism means that, for a specific input, the output of the mechanism is not a fixed value but instead follows a certain distribution.
• The random mechanism M can be considered to have (ε, δ)-differential privacy if the following condition is satisfied: for any two adjacent training data sets D, D′, and for an arbitrary subset S of possible outputs, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.
• If δ = 0, the random mechanism M can also be considered to have ε-differential privacy (ε-DP).
• A differential privacy mechanism, i.e., a random mechanism M with (ε, δ)-differential privacy or ε-differential privacy, is expected to yield nearly indistinguishable output distributions when applied to two adjacent data sets. In this case, an observer can hardly detect small changes in the algorithm's input data set by observing the output results, thereby achieving the purpose of protecting privacy. In other words, if the random mechanism M produces any specific output S with almost the same probability when applied to any pair of adjacent data sets, the algorithm is considered to achieve the effect of differential privacy.
• In embodiments of the present disclosure, the focus is on differential privacy of the labels of data samples, where the labels indicate binary classification results. Therefore, following the setting of differential privacy, label differential privacy can be defined. Specifically, assume ε and δ are real numbers greater than or equal to 0 (i.e., ε ≥ 0 and δ ≥ 0), and M is a random mechanism (random algorithm). The random mechanism M can be considered to have (ε, δ)-label differential privacy if the following condition is satisfied: for any two adjacent training data sets D, D′ that differ only in the label of a single data sample, and for an arbitrary subset S of possible outputs, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.
• If δ = 0, the random mechanism M can also be considered to have ε-label differential privacy (ε-Label DP).
  • the random mechanism can obey a certain probability distribution.
  • the perturbation may be applied based on a Gaussian distribution or a Laplace distribution.
  • the client node 110 applies random perturbations to the values of respective metric parameters by determining sensitivity values for the metric parameter values to be perturbed and determining probability distributions based on the sensitivity values. Next, we will first introduce the sensitivity, and then introduce how to determine the probability distribution based on the sensitivity.
  • sensitivity refers to the maximum difference in the output of a function when at most one data element changes.
  • sensitivity values can be introduced to define specific probability distribution methods.
• For example, for the Gaussian mechanism, the standard deviation of the Gaussian distribution can be defined as σ = Δ·sqrt(2·ln(1.25/δ))/ε, where Δ represents the sensitivity value. Such a Gaussian distribution mechanism has (ε, δ)-differential privacy (i.e., (ε, δ)-DP).
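• A sketch of this Gaussian mechanism, using the classical calibration σ = Δ·sqrt(2·ln(1.25/δ))/ε (the helper names are ours, not part of the disclosure):

```python
import math
import random

def gaussian_sigma(sensitivity, epsilon, delta):
    """Standard deviation of the classical Gaussian mechanism."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def perturb(value, sensitivity, epsilon, delta, rng=None):
    """Add zero-mean Gaussian noise calibrated to the sensitivity."""
    rng = rng or random.Random()
    return value + rng.gauss(0.0, gaussian_sigma(sensitivity, epsilon, delta))
```

• A larger sensitivity or a smaller privacy budget (ε, δ) yields a larger σ, i.e., more noise.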
• When applying perturbations, the client node 110 can determine the sensitivity values respectively associated with the perturbations of the different metric parameters, and determine the corresponding probability distributions based on the sensitivity values and the differential privacy mechanism.
  • the client node 110 may apply a perturbation value to the corresponding metric parameter according to a probability distribution.
• The client node 110-k can apply a perturbation to either one of the two values, because the other value can then be determined by subtracting the perturbed value from the total number of ground-truth labels 105 at that node.
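• For the label counts, this means only one noise draw is needed per client. A sketch (hypothetical names; sensitivity 1 is assumed because changing one label changes localP_k by at most 1):

```python
import math
import random

def perturb_label_counts(local_p, total, epsilon, delta, rng=None):
    """Perturb the positive-label count; the (perturbed) negative count
    then follows from the total number of labels held at this client."""
    rng = rng or random.Random()
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon  # sensitivity = 1
    noisy_p = local_p + rng.gauss(0.0, sigma)
    noisy_n = total - noisy_p  # consistent with the total by construction
    return noisy_p, noisy_n
```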
  • a Gaussian distribution can be determined and a perturbation (also called noise or Gaussian noise) applied based on the Gaussian distribution.
• Such a Gaussian distribution mechanism can satisfy (ε, δ)-differential privacy (i.e., (ε, δ)-DP).
• For localSum_k, the client node 110-k can likewise determine the sensitivity value for the perturbation of this metric parameter.
  • the sensitivity value associated with localSum k is determined by the service node 120 from a global perspective.
• The sensitivity value here indicates that if one data sample in the complete set of data samples is changed, localSum_k will change by at most Q−1.
• In this case, the client node 110-k may receive, from the service node 120, information related to the sensitivity value. This information can be the total number Q of data samples across the multiple client nodes, the sensitivity value Q−1 itself, or other information from which the sensitivity value can be determined.
  • the sensitivity value associated with localSum k is determined locally by each client node 110-k.
  • Such perturbation methods can achieve local privacy protection.
• Specifically, the client node 110-k determines the highest ranking result among the respective ranking results of its multiple local prediction scores, and determines the sensitivity value based on that highest ranking result, which can be expressed as Δ = max_i r_i^(k). That is to say, for the local data set at each client node, if one of the data samples is changed, the maximum change in localSum_k is bounded by the highest ranking that the node's prediction scores attain in the overall prediction score set.
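• Under this local view the sensitivity is simply the largest rank any of the client's scores attains (a sketch with hypothetical names; `ranks` are the server-provided 0-indexed ranking values):

```python
def local_sum_sensitivity(ranks):
    """Local sensitivity of localSum_k: flipping the label of one local
    sample changes the sum by at most that sample's rank, hence by at
    most the highest rank held locally."""
    return max(ranks)
```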
• After the client node 110-k determines the sensitivity value related to localSum_k in any of these ways, it can determine, based on the sensitivity value and according to the differential privacy mechanism, the probability distribution to be followed when perturbing localSum_k.
  • a Gaussian distribution can be determined and a perturbation (also called noise or Gaussian noise) applied based on the Gaussian distribution.
• For example, the standard deviation of the Gaussian distribution can be determined as σ = Δ·sqrt(2·ln(1.25/δ))/ε. Such a Gaussian distribution mechanism can satisfy (ε, δ)-differential privacy (i.e., (ε, δ)-DP).
  • the client node 110 may send the perturbation value of the metric parameter to the service node 120 .
• The service node 120 receives 235 the perturbation values of the multiple metric parameters provided by each of the plurality of client nodes 110.
  • the service node 120 aggregates 240 the perturbation values of multiple metric parameters from multiple client nodes 110 according to metric parameters to obtain an aggregate value of multiple metric parameters.
  • the service node 120 determines 245 the value of the performance indicator of the prediction model 125 based on an aggregated value of a plurality of metric parameters.
• The service node 120 aggregates the values of these metric parameters across the respective client nodes 110 (for example, by summing them), obtaining respectively: globalP = Σ_k localP_k, globalN = Σ_k localN_k, and globalSum = Σ_k localSum_k, where the summed values are the perturbation values provided by the client nodes.
  • these aggregated values may not be exactly equivalent to values calculated from ground truth labels and predicted scores at multiple client nodes 110 .
• Since the mean value of the random perturbation (i.e., of the probability distribution from which it is drawn) is 0, the random perturbations of the individual client nodes 110 can largely cancel each other out, so that the aggregated values approximate the true values of these metric parameters.
  • the service node 120 may calculate the AUC value of the prediction model 125 according to the above equation (4). In some embodiments, the service node 120 may also similarly aggregate the perturbation values of other metric parameters obtained from the client node 110 for calculating performance indicators. For example, for the AUC calculation method given by the above equation (5), the service node 120 can receive the perturbation value of the corresponding metric parameter from the client node 110, and aggregate it for use in calculating the AUC.
• Although the aggregation operation offsets perturbations drawn from a zero-mean probability distribution, the determined value may still exhibit a certain variance relative to the true value of the performance indicator.
• The inventors found that such variance is small and within an allowable range. In particular, as the number of participating client nodes increases, the variance becomes smaller.
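• The cancellation argument can be checked numerically (an illustrative simulation with assumed values, not part of the disclosure):

```python
import random

def simulate_aggregate(true_values, sigma, seed=0):
    """Each client adds zero-mean Gaussian noise to its local value; the
    service node sums the noisy values. Returns (true_sum, noisy_sum)."""
    rng = random.Random(seed)
    noisy = [v + rng.gauss(0.0, sigma) for v in true_values]
    return sum(true_values), sum(noisy)
```

• With c clients the noise in the sum has standard deviation σ·sqrt(c), which is small relative to sums that grow linearly in c.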
• Assuming the service node uses values based on the true label statistics, such as the true number of positive samples and the true number of negative samples, the standard deviation of the calculated AUC is sqrt(c)·σ/(P·N), where P is the number of positive samples, N is the number of negative samples, c is the number of client nodes, and σ is the standard deviation of the added perturbation (noise).
  • the service node 120 can additionally or alternatively calculate values of other performance indicators in a similar perturbation and interaction manner.
  • FIG. 4 illustrates a flow diagram of a process 400 for model performance evaluation at a client node, in accordance with some embodiments of the present disclosure.
  • Process 400 may be implemented at client node 110.
  • the client node 110 applies the plurality of data samples to the prediction model respectively at the client node to obtain a plurality of prediction scores output by the prediction model.
  • the plurality of prediction scores respectively indicate the prediction probabilities that the plurality of data samples belong to the first category or the second category.
  • the client node 110 determines values of a plurality of metric parameters related to predetermined performance indicators of the prediction model based on a plurality of ground truth labels and a plurality of prediction scores for the plurality of data samples.
  • the client node 110 applies perturbations to the values of the plurality of metric parameters to obtain perturbed values of the plurality of metric parameters.
  • the client node 110 sends the perturbation values of the plurality of metric parameters to the service node.
• In some embodiments, determining the values of the plurality of metric parameters includes: determining a first number of first-type labels among the plurality of ground-truth labels as the value of a first metric parameter, the first-type label indicating that the corresponding data sample belongs to the first category; and determining a second number of second-type labels among the plurality of ground-truth labels as the value of a second metric parameter, the second-type label indicating that the corresponding data sample belongs to the second category.
• In some embodiments, applying perturbations to the values of the plurality of metric parameters includes: determining a first sensitivity value associated with a perturbation of one of the first metric parameter and the second metric parameter; determining a first probability distribution based on the first sensitivity value and a differential privacy mechanism; applying, based on the first probability distribution, a perturbation to the value of the one metric parameter to obtain a perturbation value of the one metric parameter; and determining a perturbation value of the other of the first metric parameter and the second metric parameter based on the total number of the plurality of ground-truth labels and the perturbation value of the one metric parameter.
• In some embodiments, determining the values of the plurality of metric parameters includes: sending the plurality of prediction scores to the service node; receiving, from the service node, a ranking result of each of the plurality of prediction scores within a prediction score set, the prediction score set including prediction scores sent by a plurality of client nodes, the plurality of client nodes including the client node; and determining, based on the respective ranking results of the plurality of prediction scores, a third number of prediction scores in the prediction score set that are exceeded by the prediction scores of the data samples corresponding to the first-type label, as the value of a third metric parameter.
  • applying a perturbation to the values of the plurality of metric parameters includes: determining a second sensitivity value related to the perturbation of the third metric parameter; determining a second probability distribution based on the second sensitivity value and the differential privacy mechanism; and Based on the second probability distribution, a perturbation is applied to the value of the third metric parameter.
  • determining the second sensitivity value includes: receiving information related to the second sensitivity value from the service node; and determining the second sensitivity value based on the received information.
  • the information related to the second sensitivity value includes a total number of data samples for the plurality of client nodes.
  • determining the second sensitivity value includes: determining a highest ranking result from respective ranking results of the plurality of prediction scores; and determining the second sensitivity value based on the highest ranking result.
• In some embodiments, the predetermined performance indicator includes at least the area under the receiver operating characteristic (ROC) curve (AUC).
  • FIG. 5 illustrates a flow diagram of a process 500 for model performance evaluation at a service node, in accordance with some embodiments of the present disclosure.
  • Process 500 may be implemented at service node 120.
  • the service node 120 receives perturbation values of a plurality of metric parameters related to predetermined performance indicators of the prediction model from a plurality of client nodes, respectively.
  • the service node 120 aggregates perturbation values of multiple metric parameters from multiple client nodes by metric parameters to obtain an aggregate value of multiple metric parameters.
  • the service node 120 determines a value for a predetermined performance indicator based on an aggregated value of a plurality of metric parameters.
• In some embodiments, the perturbation values of the plurality of metric parameters indicate at least one of the following: a first number of first-type labels among a plurality of ground-truth labels at a given client node, the first-type label indicating that the corresponding data sample belongs to a first category; a second number of second-type labels among the plurality of ground-truth labels, as the value of a second metric parameter, the second-type label indicating that the corresponding data sample belongs to a second category; and a third number of prediction scores in a prediction score set that are exceeded by the prediction scores of the data samples corresponding to the first-type label at the given client node, the prediction scores being determined by the prediction model based on the data samples, and the prediction score set including prediction scores sent by the plurality of client nodes.
  • the process 500 further includes: sending information related to the second sensitivity value to multiple client nodes respectively.
  • the information related to the second sensitivity value includes a total number of data samples for the plurality of client nodes.
• In some embodiments, the predetermined performance indicator includes at least the area under the receiver operating characteristic (ROC) curve (AUC).
  • Figure 6 shows a block diagram of an apparatus 600 for model performance evaluation at a client node, in accordance with some embodiments of the present disclosure.
  • Apparatus 600 may be implemented as or included in client node 110 .
  • Each module/component in the device 600 may be implemented by hardware, software, firmware, or any combination thereof.
  • the apparatus 600 includes a prediction module 610 configured to respectively apply a plurality of data samples to a prediction model to obtain a plurality of prediction scores output by the prediction model, and the plurality of prediction scores respectively indicate that the plurality of data samples belong to the first The predicted probability of a class or second class.
  • the apparatus 600 further includes a metric determination module 620 configured to determine values of a plurality of metric parameters related to predetermined performance indicators of the prediction model based on a plurality of ground truth labels and a plurality of prediction scores for the plurality of data samples.
• The apparatus 600 also includes a perturbation module 630 configured to apply perturbations to the values of the multiple metric parameters to obtain perturbation values of the multiple metric parameters, and a sending module 640 configured to send the perturbation values of the multiple metric parameters to the service node.
• In some embodiments, the metric determination module 620 includes: a first determination module configured to determine a first number of first-type labels among the plurality of ground-truth labels as the value of the first metric parameter, the first-type label indicating that the corresponding data sample belongs to the first category; and a second determination module configured to determine a second number of second-type labels among the plurality of ground-truth labels as the value of the second metric parameter, the second-type label indicating that the corresponding data sample belongs to the second category.
• In some embodiments, the perturbation module includes: a first sensitivity determination module configured to determine a first sensitivity value associated with a perturbation of one of the first metric parameter and the second metric parameter; a first distribution determination module configured to determine a first probability distribution based on the first sensitivity value and a differential privacy mechanism; a first perturbation application module configured to apply, based on the first probability distribution, a perturbation to the value of the one metric parameter to obtain a perturbation value of the one metric parameter; and a perturbation value determination module configured to determine a perturbation value of the other of the first metric parameter and the second metric parameter based on the total number of the plurality of ground-truth labels and the perturbation value of the one metric parameter.
• In some embodiments, the metric determination module includes: a score sending module configured to send the plurality of prediction scores to the service node; a result receiving module configured to receive, from the service node, a ranking result of each of the plurality of prediction scores within a prediction score set, the prediction score set including prediction scores sent by a plurality of client nodes, the plurality of client nodes including the client node; and a third determination module configured to determine, based on the respective ranking results of the plurality of prediction scores, a third number of prediction scores in the prediction score set that are exceeded by the prediction scores of the data samples corresponding to the first-type label, as the value of the third metric parameter.
• In some embodiments, the perturbation module includes: a second sensitivity determination module configured to determine a second sensitivity value related to the perturbation of the third metric parameter; a second distribution determination module configured to determine a second probability distribution based on the second sensitivity value and a differential privacy mechanism; and a second perturbation application module configured to apply, based on the second probability distribution, a perturbation to the value of the third metric parameter.
• In some embodiments, the second sensitivity determination module includes: a sensitivity receiving module configured to receive information related to the second sensitivity value from the service node; and an information-based determination module configured to determine the second sensitivity value based on the received information.
  • the information related to the second sensitivity value includes a total number of data samples for the plurality of client nodes.
• In some embodiments, the second sensitivity determination module includes: a ranking determination module configured to determine a highest ranking result from the respective ranking results of the plurality of prediction scores; and a ranking-based determination module configured to determine the second sensitivity value based on the highest ranking result.
• In some embodiments, the predetermined performance indicator includes at least the area under the receiver operating characteristic (ROC) curve (AUC).
• FIG. 7 illustrates a block diagram of an apparatus 700 for model performance evaluation at a service node, in accordance with some embodiments of the present disclosure.
  • Apparatus 700 may be implemented as or included in service node 120 .
• Each module/component in the apparatus 700 may be implemented by hardware, software, firmware, or any combination thereof.
  • the apparatus 700 includes a receiving module 710 configured to receive perturbation values of a plurality of metric parameters related to predetermined performance indicators of the prediction model from a plurality of client nodes respectively.
  • the apparatus 700 further includes an aggregation module 720 configured to aggregate perturbation values of multiple metric parameters from multiple client nodes according to metric parameters to obtain aggregate values of the multiple metric parameters.
  • the apparatus 700 also includes a performance determination module 730 configured to determine a value of a predetermined performance indicator based on an aggregate value of a plurality of metric parameters.
• In some embodiments, the perturbation values of the plurality of metric parameters indicate at least one of the following: a first number of first-type labels among a plurality of ground-truth labels at a given client node, the first-type label indicating that the corresponding data sample belongs to a first category; a second number of second-type labels among the plurality of ground-truth labels, as the value of a second metric parameter, the second-type label indicating that the corresponding data sample belongs to a second category; and a third number of prediction scores in a prediction score set that are exceeded by the prediction scores of the data samples corresponding to the first-type label at the given client node, the prediction scores being determined by the prediction model based on the data samples, and the prediction score set including prediction scores sent by the plurality of client nodes.
  • the apparatus 700 further includes: a sensitivity sending module configured to send information related to the second sensitivity value to multiple client nodes respectively.
  • the information related to the second sensitivity value includes a total number of data samples for the plurality of client nodes.
• In some embodiments, the predetermined performance indicator includes at least the area under the receiver operating characteristic (ROC) curve (AUC).
  • FIG. 8 illustrates a block diagram of a computing device/system 800 capable of implementing one or more embodiments of the present disclosure. It should be understood that the computing device/system 800 shown in Figure 8 is exemplary only and should not constitute any limitation on the functionality and scope of the embodiments described herein. The computing device/system 800 shown in FIG. 8 may be used to implement the client node 110 or the service node 120 of FIG. 1 .
  • computing device/system 800 is in the form of a general purpose computing device.
  • Components of computing device/system 800 may include, but are not limited to, one or more processors or processing units 810, memory 820, storage device 830, one or more communication units 840, one or more input devices 850, and one or more output devices 860.
  • the processing unit 810 may be a real or virtual processor and can perform various processes according to a program stored in the memory 820 . In a multi-processor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capabilities of the computing device/system 800.
  • Computing device/system 800 typically includes a plurality of computer storage media. Such media may be any available media that is accessible to computing device/system 800, including, but not limited to, volatile and nonvolatile media, removable and non-removable media.
• Memory 820 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof.
• Storage device 830 may be a removable or non-removable medium, and may include machine-readable media such as a flash drive, a magnetic disk, or any other medium that is capable of storing information and/or data (e.g., training data) and that can be accessed within computing device/system 800.
  • Computing device/system 800 may further include additional removable/non-removable, volatile/non-volatile storage media.
• A magnetic disk drive may be provided for reading from or writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disc drive may be provided for reading from or writing to a removable, non-volatile optical disc.
  • each drive may be connected to the bus (not shown) by one or more data media interfaces.
  • Memory 820 may include a computer program product 825 having one or more program modules configured to perform various methods or actions of various embodiments of the present disclosure.
  • the communication unit 840 implements communication with other computing devices through communication media. Additionally, the functionality of the components of computing device/system 800 may be implemented as a single computing cluster or as multiple computing machines capable of communicating over a communications connection. Accordingly, computing device/system 800 may operate in a networked environment using logical connections to one or more other servers, networked personal computers (PCs), or another network node.
  • Input device 850 may be one or more input devices, such as a mouse, keyboard, trackball, etc.
• Output device 860 may be one or more output devices, such as a display, speakers, a printer, etc.
• As needed, the computing device/system 800 may also communicate, via the communication unit 840, with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable users to interact with the computing device/system 800, or with any device (e.g., a network card, a modem, etc.) that enables the computing device/system 800 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
• A computer-readable storage medium is provided, on which computer-executable instructions or a computer program are stored, wherein the computer-executable instructions or computer program are executed by a processor to implement the methods described above.
  • a computer program product is also provided, the computer program product is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions, and the computer-executable instructions are executed by a processor to implement the method described above.
  • These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, thereby producing a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, the instructions produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium. These instructions cause a computer, a programmable data processing apparatus, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, so that the instructions executing on the computer, other programmable data processing apparatus, or other equipment implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of instructions, which contains one or more executable instructions for implementing the specified logical functions.
  • In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or actions, or by a combination of special-purpose hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

根据本公开的实施例,提供了用于模型性能评估的方法、装置、设备和介质。该方法包括:在客户端节点处,将多个数据样本分别应用到预测模型,以得到预测模型输出的多个预测得分,多个预测得分分别指示多个数据样本属于第一类别或第二类别的预测概率;基于多个数据样本的多个真值标签和多个预测得分,确定与预测模型的预定性能指标相关的多个度量参数的值;对多个度量参数的值施加扰动,得到多个度量参数的扰动值;以及将多个度量参数的扰动值发送给服务节点。由此,在实现模型性能评估的同时,达到了对客户端节点本地标签数据的隐私保护目的。

Description

用于模型性能评估的方法、装置、设备和介质
本申请要求于2022年5月13日递交的,标题为“用于模型性能评估的方法、装置、设备和介质”、申请号为202210524865.2的中国发明专利申请的优先权。
技术领域
本公开的示例实施例总体涉及计算机领域,特别地涉及用于模型性能评估的方法、装置、设备和计算机可读存储介质。
背景技术
当前机器学习已经得到了广泛的应用，其性能通常随着数据量的增加而提高。在一些方案中，需要集中收集充足的数据样本和标签数据用于机器学习模型的训练。然而，在很多现实场景中，存在着所谓的数据孤岛问题，即数据通常是分散隔离的，存储在不同的实体（例如，企业、用户端）上。随着数据隐私保护问题越来越受到重视，这样的集中式机器学习难以达到数据保护的目的。
当前，提出了联邦学习方案。联邦学习，指的是在保证数据隐私安全的基础上，利用各个节点的数据实现共同建模，提升机器学习模型的效果。联邦学习可以允许各个节点的数据不离开本端，以达到数据保护目的。
发明内容
根据本公开的示例实施例,提供了一种用于模型性能评估的方案。
在本公开的第一方面,提供了一种用于模型性能评估的方法。该方法包括:在客户端节点处,将多个数据样本分别应用到预测模型,以得到预测模型输出的多个预测得分,多个预测得分分别指示多个数 据样本属于第一类别或第二类别的预测概率;基于多个数据样本的多个真值标签和多个预测得分,确定与预测模型的预定性能指标相关的多个度量参数的值;对多个度量参数的值施加扰动,得到多个度量参数的扰动值;以及将多个度量参数的扰动值发送给服务节点。
在本公开的第二方面,提供了一种用于模型性能评估的方法。该方法包括:在服务节点处,从多个客户端节点分别接收与预测模型的预定性能指标相关的多个度量参数的扰动值;按度量参数聚合来自多个客户端节点的多个度量参数的扰动值,得到多个度量参数的聚合值;以及基于多个度量参数的聚合值来确定预定性能指标的值。
在本公开的第三方面,提供了一种用于模型性能评估的装置。该装置包括:预测模块,被配置为将多个数据样本分别应用到预测模型,以得到预测模型输出的多个预测得分,多个预测得分分别指示多个数据样本属于第一类别或第二类别的预测概率;度量确定模块,被配置为基于多个数据样本的多个真值标签和多个预测得分,确定与预测模型的预定性能指标相关的多个度量参数的值;扰动模块,被配置为对多个度量参数的值施加扰动,得到多个度量参数的扰动值;以及发送模块,被配置为将多个度量参数的扰动值发送给服务节点。
在本公开的第四方面,提供了一种用于模型性能评估的装置。该装置包括:接收模块,被配置为从多个客户端节点分别接收与预测模型的预定性能指标相关的多个度量参数的扰动值;聚合模块,被配置为按度量参数聚合来自多个客户端节点的多个度量参数的扰动值,得到多个度量参数的聚合值;以及性能确定模块,被配置为基于多个度量参数的聚合值来确定预定性能指标的值。
在本公开的第五方面,提供了一种电子设备。该设备包括至少一个处理单元;以及至少一个存储器,至少一个存储器被耦合到至少一个处理单元并且存储用于由至少一个处理单元执行的指令。指令在由至少一个处理单元执行时使设备执行第一方面的方法。
在本公开的第六方面,提供了一种电子设备。该设备包括至少一个处理单元;以及至少一个存储器,至少一个存储器被耦合到至少一 个处理单元并且存储用于由至少一个处理单元执行的指令。指令在由至少一个处理单元执行时使设备执行第二方面的方法。
在本公开的第七方面,提供了一种计算机可读存储介质。介质上存储有计算机程序,计算机程序被处理器执行以实现第一方面的方法。
在本公开的第八方面,提供了一种计算机可读存储介质。介质上存储有计算机程序,计算机程序被处理器执行以实现第二方面的方法。
应当理解,本发明内容部分中所描述的内容并非旨在限定本公开的实施例的关键特征或重要特征,也不用于限制本公开的范围。本公开的其它特征将通过以下的描述而变得容易理解。
附图说明
结合附图并参考以下详细说明,本公开各实施例的上述和其他特征、优点及方面将变得更加明显。在附图中,相同或相似的附图标记表示相同或相似的元素,其中:
图1示出了本公开的实施例能够在其中应用的示例环境的示意图;
图2示出了根据本公开的一些实施例的用于模型性能评估的信令流的流程图;
图3A示出了根据本公开的一些实施例的确定度量参数的值的过程的流程图;
图3B示出根据本公开的一些实施例的确定度量参数的值的信令流的流程图；
图4示出根据本公开的一些实施例的在客户端节点处用于模型性能评估的过程的流程图；
图5示出根据本公开的一些实施例的在服务节点处用于模型性能评估的过程的流程图;
图6示出了根据本公开的一些实施例的在客户端节点处用于模型性能评估的装置的框图;
图7示出了根据本公开的一些实施例的在服务节点处用于模型性能评估的装置的框图;以及
图8示出了能够实施本公开的一个或多个实施例的计算设备/系统的框图。
具体实施方式
下面将参照附图更详细地描述本公开的实施例。虽然附图中示出了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反,提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用,并非用于限制本公开的保护范围。
在本公开的实施例的描述中,术语“包括”及其类似用语应当理解为开放性包含,即“包括但不限于”。术语“基于”应当理解为“至少部分地基于”。术语“一个实施例”或“该实施例”应当理解为“至少一个实施例”。术语“一些实施例”应当理解为“至少一些实施例”。下文还可能包括其他明确的和隐含的定义。
可以理解的是,本技术方案所涉及的数据(包括但不限于数据本身、数据的获取或使用)应当遵循相应法律法规及相关规定的要求。
可以理解的是,在使用本公开各实施例公开的技术方案之前,均应当根据相关法律法规通过适当的方式对本公开所涉及个人信息的类型、使用范围、使用场景等告知用户并获得用户的授权。
例如,在响应于接收到用户的主动请求时,向用户发送提示信息,以明确地提示用户,其请求执行的操作将需要获取和使用到用户的个人信息。从而,使得用户可以根据提示信息来自主地选择是否向执行本公开技术方案的操作的电子设备、应用程序、服务器或存储介质等软件或硬件提供个人信息。
作为一种可选的但非限制性的实现方式,响应于接收到用户的主动请求,向用户发送提示信息的方式,例如可以是弹窗的方式,弹窗中可以以文字的方式呈现提示信息。此外,弹窗中还可以承载供用户选择“同意”或“不同意”向电子设备提供个人信息的选择控件。
可以理解的是,上述通知和获取用户授权过程仅是示意性的,不对本公开的实现方式构成限定,其他满足相关法律法规的方式也可应用于本公开的实现方式中。
如本文中所使用的,术语“模型”可以从训练数据中学习到相应的输入与输出之间的关联,从而在训练完成后可以针对给定的输入,生成对应的输出。模型的生成可以基于机器学习技术。深度学习是一种机器学习算法,通过使用多层处理单元来处理输入和提供相应输出。神经网络模型是基于深度学习的模型的一个示例。在本文中,“模型”也可以被称为“机器学习模型”、“学习模型”、“机器学习网络”或“学习网络”,这些术语在本文中可互换地使用。
“神经网络”是一种基于深度学习的机器学习网络。神经网络能够处理输入并且提供相应输出,其通常包括输入层和输出层以及在输入层与输出层之间的一个或多个隐藏层。在深度学习应用中使用的神经网络通常包括许多隐藏层,从而增加网络的深度。神经网络的各个层按顺序相连,从而前一层的输出被提供作为后一层的输入,其中输入层接收神经网络的输入,而输出层的输出作为神经网络的最终输出。神经网络的每个层包括一个或多个节点(也称为处理节点或神经元),每个节点处理来自上一层的输入。
通常,机器学习大致可以包括三个阶段,即训练阶段、测试阶段和应用阶段(也称为推理阶段)。在训练阶段,给定的模型可以使用大量的训练数据进行训练,不断迭代更新参数值,直到模型能够从训练数据中获取一致的满足预期目标的推理。通过训练,模型可以被认为能够从训练数据中学习从输入到输出之间的关联(也称为输入到输出的映射)。训练后的模型的参数值被确定。在测试阶段,将测试输入应用到训练后的模型,测试模型是否能够提供正确的输出,从而确定模型的性能。在应用阶段,模型可以被用于基于训练得到的参数值,对实际的输入进行处理,确定对应的输出。
图1示出了本公开的实施例能够在其中实现的示例环境100的示意图。环境100涉及联邦学习环境,其中包括N个客户端节点110- 1……110-k、……110-N(其中N为大于1的整数,k=1、2、……N)以及服务节点120。客户端节点110-1……110-k、……110-N可以分别维护各自的本地数据集112-1……112-k、……112-N。为便于讨论,客户端节点110-1……110-k、……110-N可以被统称为或单独称为客户端节点110,本地数据集112-1……112-k、……112-N可以被统称为或单独称为本地数据集112。
在一些实施例中,客户端节点110和/或服务节点120可以被实现在终端设备或服务器处。终端设备可以是任意类型的移动终端、固定终端或便携式终端,包括移动手机、台式计算机、膝上型计算机、笔记本计算机、上网本计算机、平板计算机、媒体计算机、多媒体平板、个人通信系统(PCS)设备、个人导航设备、个人数字助理(PDA)、音频/视频播放器、数码相机/摄像机、定位设备、电视接收器、无线电广播接收器、电子书设备、游戏设备或者前述各项的任意组合,包括这些设备的配件和外设或者其任意组合。在一些实施例中,终端设备也能够支持任意类型的针对用户的接口(诸如“可佩戴”电路等)。服务器是能够提供计算能力的各种类型的计算系统/服务器,包括但不限于大型机、边缘计算节点、云环境中的计算设备,等等。
在联邦学习中,客户端节点指的是提供应用训练、验证或评估预测模型的部分数据的节点。客户端节点也可称为客户端、终端节点、终端设备、用户设备等。在联邦学习中,服务节点指的是聚合客户端节点处的结果的节点。
在图1的示例中，假设N个客户端节点110共同参与对预测模型125的训练，并将训练中的中间结果汇集到服务节点120，以由服务节点120更新预测模型125的参数集。这些客户端节点110的本地数据的全集构成预测模型125的完整训练数据集。因此，根据联邦学习的机制，服务节点120可以确定全局的预测模型125。
针对预测模型125,客户端节点110处的本地数据集112可以包括数据样本和真值标签。图1具体示出了客户端节点110-k处的本地数据集112-k,其包括数据样本集和真值标签集。数据样本集包括多 个(M个)数据样本102-1、102-i、……102-M(统称为或单独称为数据样本102),并且真值标签集包括对应的多个(M个)真值标签(ground-truth label)105-1、105-i、……105-M(统称为或单独称为真值标签105)。其中M为大于1的整数,i=1、2、……M。每个数据样本102可以被标注有对应的真值标签105。数据样本102可以对应于预测模型125的输入,真值标签105指示数据样本102的真实输出。真值标签是有监督机器学习中的重要部分。
在本公开的实施例中,预测模型125可以基于各种机器学习或深度学习的模型架构来构建,并且可以被配置为实现各种预测任务,诸如各种分类任务、推荐任务等等。相应地,预测模型125也可以被称为推荐模型、分类模型,等等。
数据样本102可以包括与预测模型125的具体任务相关的输入信息,真值标签105与任务的期望输出有关。作为一个示例,在二分类任务中,预测模型125可以被配置为预测输入的数据样本属于第一类别或是第二类别,真值标签用于标注该数据样本实际属于第一类别或是第二类别。很多实际应用均可以被归类为这样的二分类任务,例如在推荐任务中对推荐项目的转化(例如,点击、购买、注册或其他需求行为)与否,等等。
应当理解,图1仅示出了示例的联邦学习环境。根据联邦学习算法和实际应用需要,环境还可以不同。例如,虽然被示出为单独的节点,在某些应用中,服务节点120除了作为中央节点外,还可以作为客户端节点,以提供部分数据用于模型训练、模型性能评估等。本公开的实施例在此方面不受限制。
在预测模型125的训练阶段,已有一些机制保护各个客户端节点110的本地数据不泄露。例如,在模型训练过程中,客户端节点110不必透漏本地的数据样本或标签数据,而是向服务节点120发送根据本地训练数据计算的梯度数据,以供服务节点120更新预测模型125的参数集。
在一些情况下，还希望评估训练出的预测模型的性能。模型性能的评估也需要数据，包括模型输入需要的数据样本以及数据样本对应的标签数据。预测模型的性能可以通过一个或多个性能指标来衡量。不同性能指标可以从不同角度，衡量预测模型针对数据样本集给出的预测输出与真值标签集所指示的真实输出之间的差异。通常，如果预测模型给出的预测输出与真实输出之间的差异较小，那意味着预测模型的性能较好。可以看出，通常需要基于数据样本的真值标签集来确定预测模型的性能指标。
随着数据监管体系不断加强,对数据隐私保护的要求也越来越高。对数据样本的真值标签也需要保护,避免被泄露。因此,如何既能够确定预测模型的性能指标,又保护客户端节点本地的标签数据不被泄露,是一项具有挑战性的任务。当前还没有非常有效的方案能够解决该问题。
根据本公开的实施例,提供了一种模型性能评估方案,其能够保护客户端节点本地的标签数据。具体地,在客户端节点处,在计算出与预测模型的性能指标相关的多个度量参数的值后,对所确定的度量参数的值施加扰动,得到多个度量参数的扰动值。客户端节点将度量参数的扰动值发送给服务节点。由于不需要直接发送度量参数的真实值,通过扰动值,观察者难以从扰动值推导出数据样本的真值标签。这样,可以有效避免数据泄露。
在服务节点处,服务节点从多个客户端节点接收到它们各自确定出的多个度量参数的扰动值。服务节点按度量参数聚合来自多个客户端节点的多个度量参数的扰动值。在对多个不同来源的扰动值的聚合进行聚合后,扰动被抵消。因此,基于多个度量参数的聚合值,服务节点能够准确确定模型的性能指标的值。
根据本公开的实施例,各个客户端节点无需暴露本地的真值标签集或基于真值标签确定的参数值,并且还能允许服务节点计算出模型的性能指标值。以此方式,在实现模型性能评估的同时,达到了对客户端节点本地标签数据的隐私保护目的。
以下将继续参考附图描述本公开的一些示例实施例。
图2示出了根据本公开的一些实施例的用于模型性能评估的信令流200的示意框图。为便于讨论,参考图1的环境100进行讨论。信令流200涉及客户端节点110和服务节点120。
在本公开的实施例中，假设要评估预测模型125的性能。在一些实施例中，待评估的预测模型125可以是基于联邦学习的训练过程确定的全局预测模型，例如客户端节点110和服务节点120参与了预测模型125的训练过程。在一些实施例中，预测模型125也可以是以任何其他方式获得的模型，并且客户端节点110和服务节点120可以未参与预测模型125的训练过程。本公开的范围在此方面不受限制。
在一些实施例中,如信令流200所示,服务节点120将预测模型125发送205给N个客户端节点110。在接收210到预测模型125后,各个客户端节点110可以基于预测模型125来执行后续评估过程。在一些实施例中,也可以以任何其他适当的方式将要评估的预测模型125提供给客户端节点110。
在本公开的实施例中，将从单个客户端节点110的角度来描述客户端侧的操作。多个客户端节点110可以类似地操作。
在信令流200中，客户端节点110将多个数据样本分别应用215到预测模型125，以得到预测模型125输出的多个预测得分。假设客户端节点110-k的数据样本集是Xk，预测模型125被表示为f()，那么针对数据样本集的预测得分集合可以被表示为sk=f(Xk)。
在本公开的实施例中，特别关注于实现二分类任务的预测模型的性能指标。每个预测得分可以指示对应的数据样本102属于第一类别或第二类别的预测概率。这两个类别可以根据实际任务需要配置。
预测模型125输出的预测得分的取值范围可以任意设置。例如,预测得分可以是在某个连续取值区间中的取值(例如,0到1之间的取值),或者可以是多个离散取值中的一个取值(例如,可以是0、1、2、3、4、5等离散取值之一)。在一些示例中,越高的预测得分可以指示数据样本102属于第一类别的预测概率越大,属于第二类别的预测概率越小。当然,相反设置也是可以的,例如越高的预测得分 可以指示数据样本102属于第二类别的预测概率越大,属于第一类别的预测概率越小。
客户端节点110基于所述多个数据样本102的多个真值标签(也可称为真实值标签)和模型输出的多个预测得分,确定220与预测模型125的预定性能指标相关的多个度量参数的值。
真值标签105用于标注对应的数据样本102属于第一类别或是第二类别。在下文中,为了方便讨论,将由属于第一类别的数据样本有时称为正样本、正例或正类样本,将属于第二类别的数据样本有时称为负样本、负例或负类样本。在一些实施例中,每个真值标签105可以具有两个取值之一,分别用于指示第一类别或第二类别。在下文的一些实施例中,为了方便讨论,可以将第一类别对应的真值标签105的取值设置为“1”,其指示数据样本属于第一类别,是正样本。此外,可以将第二类别对应的真值标签105的取值设置为“0”,其指示数据样本属于第二类别,是负样本。
在本公开的实施例中,个体客户端节点110根据本地数据集(数据样本和真值标签)确定与模型的性能指标相关的度量信息。通过将多个客户端节点110的度量信息汇总到服务节点120,可以相当于在多个客户端节点的完整数据集基础上评估出预测模型125的性能。
度量信息指的是在计算模型的性能指标时需要关心的信息,通常可以由多个度量参数指示。这些度量参数的值是需要在数据样本经过模型后输出的结果(即预测得分),以及数据样本对应的真值标签的基础上进行统计来得出。由客户端节点提供的度量信息的类型可以取决于要计算的具体性能指标。
在下文中,为便于理解,首先介绍用于实现二分类任务的预测模型125的一些示例性能指标。
预测模型125针对某个数据样本输出的预测得分,通常会用于与某个得分阈值相比较,并根据比较结果确定该数据样本被预测为属于第一类别或是第二类别。用于实现二分类任务的预测模型125的预测可能会出现四种结果。
具体地，对于某个数据样本102，假设真值标签105指示其属于第一类别（正样本），预测模型125也预测出其为正样本，那么认为该数据样本是真正样本（True Positive，TP）。如果真值标签105指示其属于第一类别（正样本），但预测模型125预测出其为负样本，那么认为该数据样本是假负样本（False Negative，FN）。如果真值标签105指示其属于第二类别（负样本），并且预测模型125也预测出其为负样本，那么认为该数据样本是真负样本（True Negative，TN）。如果真值标签105指示其属于第二类别（负样本），但预测模型125预测出其为正样本，那么认为该数据样本是假正样本（False Positive，FP）。这四种结果可以由以下表1的混淆矩阵指示。
表1
　　　　　　　　预测为正样本　　　　预测为负样本
实际为正样本　　真正样本（TP）　　　假负样本（FN）
实际为负样本　　假正样本（FP）　　　真负样本（TN）
在衡量预测模型125的性能时,期望能够在多个客户端节点110的数据样本全集的预测结果以及真值标签全集基础上计算性能指标。
在一些实施例中，预测模型125的性能指标还可以包括假正样本比率（FPR）和/或真正样本比率（TPR）。FPR可以被定义为：在实际为负样本的数据样本中，被模型错误地判断为正样本的比率，表示为FPR=FP/(FP+TN)，其中FP、TN表示在数据样本全集中统计出的FP、TN的数目。TPR可以被定义为：在实际为正样本的数据样本中，被正确地判断为正样本的比率，表示为TPR=TP/(TP+FN)。
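作为一个示意性说明（并非本申请实现的一部分，其中的函数名与示例数值均为示例假设），下面的Python草图演示了如何在给定得分阈值下统计混淆矩阵并计算FPR与TPR：

```python
def fpr_tpr(labels, scores, threshold):
    """根据给定的得分阈值统计TP/FP/TN/FN，并返回(FPR, TPR)。"""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0  # 得分不低于阈值则预测为正样本
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)
```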
在一些实施例中,预测模型125的性能指标可以包括受试者工作特征曲线(ROC)的曲线下面积(AUC)。
ROC曲线是根据不同的分类方式(设置不同得分阈值),以假正样本比率(FPR)为X轴,真正样本比率(TPR)为Y轴,在坐标轴上绘制出的曲线。根据每个可能的得分阈值,可以计算出多个(FPR, TPR)对的坐标点,将这些点连成线,就成为特定模型的ROC曲线。
从定义上理解，AUC指的是ROC曲线下方面积。在计算AUC时，一种可能的方式是根据AUC的定义，通过近似算法计算ROC曲线下的面积来得到AUC。
在一些实施例中，还可以从概率视角来确定AUC。AUC可以被认为是：随机选择一个正样本和一个负样本，预测模型给正样本的预测得分高于负样本的预测得分的概率。也就是说，在数据样本集中，将正、负样本两两组合形成正负样本对，其中正样本的预测得分大于负样本的预测得分的样本对所占的比例即为AUC。如果模型能够给更多正样本输出高于负样本的预测得分，可以认为AUC更高，模型的性能更好。AUC的取值通常在0.5和1之间。AUC越接近1，说明模型的性能越好。
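上述概率视角可以用如下Python草图来示意（仅为理解辅助，非本申请的实现；函数名为示例假设，得分打平的样本对按0.5计）：

```python
def pairwise_auc(labels, scores):
    """概率视角的AUC：正负样本对中正样本得分更高的比例（得分打平按0.5计）。"""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```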
在上述AUC计算中,均需要基于数据样本的标签数据和预测结果来确定一些度量参数的值。
除AUC之外，预测模型125的性能指标还可以包括精确率（Precision），其被表示为Precision=TP/(TP+FP)。精确率表示：在被预测为正样本的数据样本102子集中，由标签标注为正样本的概率。预测模型125的性能指标还可以包括召回率（Recall），其被表示为Recall=TP/(TP+FN)，即正样本被预测出的概率。预测模型125的性能指标还可以包括P-R曲线，其以召回率为横轴，精确率为纵轴。P-R曲线越靠近右上角，说明模型的性能越好。曲线下面积称作AP分数（Average Precision Score，平均精确率分数）。
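作为理解辅助，下面的Python草图（非本申请实现，函数名为示例假设）按上述定义计算精确率与召回率：

```python
def precision_recall(labels, preds):
    """按Precision=TP/(TP+FP)与Recall=TP/(TP+FN)计算精确率和召回率。"""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)
```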
在下文中,将主要以AUC的确定作为示例来进行讨论。AUC可以具有不同计算方式,在不同计算方式下需要的度量参数也不同。
图3A示出了根据本公开的一些实施例的确定度量参数的值的过程300的流程图。过程300用于确定在AUC的一种计算方式中需要的度量参数的值。过程300可以在客户端节点110处实现。
在框310，客户端节点110确定多个真值标签105中第一类标签的数目（称为“第一数目”），这里的第一类标签指示对应的数据样本102属于第一类别，例如指示数据样本102是正样本。在框320，客户端节点110还可以确定多个真值标签105中的第二类标签的数目（称为“第二数目”），这里的第二类标签指示对应的数据样本102属于第二类别，例如指示数据样本102是负样本。
在客户端节点110-k处，对第一数目和第二数目的确定可以被表示为如下：
localPk=∑i yik     (1)
localNk=|Xk|-localPk     (2)
其中|Xk|表示客户端节点110-k的数据样本102的总数目；yik表示第i个数据样本102对应的真值标签105的值；localPk表示在客户端节点110-k处第一类标签（指示正样本的标签）的数目，localNk表示在客户端节点110-k处真值标签105中第二类标签（指示负样本的标签）的数目。
在上式(1)和(2)中，假设对于正样本，yik的取值为1，对于负样本，yik的取值为0，这样通过对yik的加和，可以统计出由真值标签105指示的正样本的数目。除正样本之外的其他样本是由真值标签105指示的负样本。在其他示例中，如果真值标签105用其他取值来指示正样本和负样本，还可以通过其他方式来统计localPk和localNk，本文对此不做限制。localPk和localNk可以被确定为客户端节点110-k处的度量信息中的两个度量参数的值。
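上式(1)和(2)的统计过程可以用如下Python草图示意（仅为说明假设，非本申请实现；假设正样本标签取值为1、负样本为0）：

```python
def local_label_counts(labels):
    """按式(1)和(2)统计本地的正样本标签数localP与负样本标签数localN。"""
    local_p = sum(labels)            # 式(1)：对标签取值加和得到正样本数目
    local_n = len(labels) - local_p  # 式(2)：总数减去正样本数目
    return local_p, local_n
```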
在一些实施例中，在框330，客户端节点110还可以基于多个预测得分在所有客户端节点的预测得分中的排序结果，确定第一类标签对应的数据样本102（即，正样本）的预测得分在预测得分集合中超过的预测得分的数目（称为第三数目）。第三数目可以作为性能指标的另一个度量参数的值。这个数目可以指示在数据样本102的总集合中，正样本的排序高于其余样本的排序的样本对个数（在升序排序的情况下）。
对于个体客户端节点110,为了获得它的预测得分在所有客户端节点的预测得分中的排序结果,客户端节点110要与服务节点120进行信令交互。图3B示出根据本公开的一些实施例的确定度量参数的值的信令流350的流程图。
在信令流350中,客户端节点110将预测模型125输出的预测得分发送352给服务节点120进行排序。
在一些实施例中,在向服务节点120发送预测得分之前,客户端节点110可以随机调整多个预测得分的顺序,并按调整后的顺序将多个预测得分发送给服务节点。通过随机调整顺序,可以避免在一些特殊情况下,在客户端节点处将多个数据样本102顺序输入预测模型125后,所输出的预测得分具有一定的顺序,例如从大到小或从小到大的顺序,这可能会导致一定的信息泄露。随机顺序调整可以进一步加强数据隐私保护。
在接收354预测得分后,服务节点120对来自多个客户端节点110的预测得分集合进行排序356,从而得到来自每个客户端节点110的预测得分在预测得分集合中的排序结果。
在一些实施例中，服务节点120可以对预测得分集合升序排序，并对每个预测得分sik（客户端节点110-k的第i个数据样本102的预测得分）分配排序值rik。在一些实施例中，排序值可以指示预测得分在预测得分集合中超过的其他预测得分的数目。例如，按升序排序，最低的预测得分被分配排序值0，指示其未超过（大于）任何其他预测得分；下一个预测得分被分配排序值1，指示其大于集合中的1个预测得分，以此类推。这样的排序值的分配有利于后续的计算。
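服务节点侧的这种排序值分配可以用如下Python草图示意（仅为说明假设，非本申请实现，并假设预测得分无并列）：

```python
def assign_rank_values(all_scores):
    """对预测得分集合升序排序，并为每个得分分配排序值，
    排序值等于该得分在集合中超过的其他得分的数目。"""
    order = sorted(range(len(all_scores)), key=lambda i: all_scores[i])
    ranks = [0] * len(all_scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank  # 最低得分的排序值为0
    return ranks
```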
对于接收到预测得分的客户端节点110,服务节点120将它的多个预测得分在整体的预测得分集合中的排序结果发送358给对应的客户端节点110。在单个客户端节点110处,基于从服务节点120接收360的排序结果,客户端节点110可以确定362第一类标签对应的数据样本102(即,正样本)的预测得分在预测得分集合中超过的预测 得分的第三数目。在一些实施例中,在客户端节点110-k处,第三数目可以通过以下来确定:
localSumk=∑i yik·rik     (3)
其中localSumk表示第三数目，yik表示第i个数据样本102对应的真值标签105的值，rik表示第i个数据样本102对应的预测得分的排序值。如前所述，排序值可以被设置为指示预测得分在预测得分集合中超过的其他预测得分的数目。在上式(3)中，也假设对于正样本，yik的取值为1，对于负样本，yik的取值为0。这样，通过对yik·rik的加和，可以确定出正样本的预测得分排序超过其余样本的预测得分排序的样本数目（也是这样的预测得分的数目）。localSumk可以被确定为客户端节点110-k处的度量信息中的另一个度量参数的值。
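上式(3)的统计可以用如下Python草图示意（仅为说明假设，非本申请实现）：

```python
def local_sum(labels, rank_values):
    """按式(3)计算localSum：正样本（标签取值为1）的排序值之和。"""
    return sum(y * r for y, r in zip(labels, rank_values))
```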
localPk、localNk和localSumk均是在AUC的示例计算方式中需要确定的度量参数。假设N个客户端节点110处第一类标签的总数目是globalP，第二类标签的总数目是globalN，并且第一类标签对应的数据样本102（即，正样本）的预测得分在预测得分集合中超过的预测得分的总数目是globalSum，那么可以通过以下方式计算模型的AUC的值：
AUC=(globalSum-globalP·(globalP-1)/2)/(globalP·globalN)     (4)
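上式(4)的计算可以用如下Python草图示意（仅为说明假设，非本申请实现；globalP、globalN为示例命名，分别表示正、负样本标签的总数；由于globalSum中还包含正样本彼此构成的样本对，需先扣除）：

```python
def auc_from_aggregates(global_sum, global_p, global_n):
    """按式(4)由聚合后的度量参数计算AUC（假设无并列得分）。
    globalSum中包含globalP*(globalP-1)/2个正样本彼此构成的样本对，需先扣除。"""
    pos_over_neg = global_sum - global_p * (global_p - 1) / 2
    return pos_over_neg / (global_p * global_n)
```

例如，对于2正2负、得分无并列的4个样本，若globalSum=4，则AUC=(4-1)/4=0.75，与按正负样本对逐一比较的结果一致。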
在一些实施例中，还可以通过其他方式来计算AUC，并且AUC的计算需要其他的度量参数。在一些实施例中，客户端节点110可以确定在本地的多个真值标签105中，由真值标签105指示的正样本的数目和由真值标签105指示的负样本的数目。此外，客户端节点110可以基于预测得分集合，确定在全部数据样本102中，正样本的预测得分大于负样本的预测得分的数目。这三个数目可以是计算AUC的值所需要的度量参数的值。各个客户端节点110可以确定在各自的数据集合上所统计出的这些度量参数的值。
假设N个客户端节点110处的数据样本102的总数是L，并且其中由真值标签105指示的正样本的数目为m个，负样本为n个。此外，每个数据样本102对应的预测得分为si，i∈[1,L]。通过遍历正样本和负样本的两两组合，可以形成m*n个样本对Pi，i∈[1,m*n]，那么AUC可以被确定如下：
AUC=∑i I(Pi)/(m*n)     (5)
其中，对于样本对Pi，如果其中正样本的预测得分大于负样本的预测得分，则I(Pi)=1，否则I(Pi)=0。
以上讨论了对于AUC的一些示例计算方式。如果适用的话，还可以基于其他方式从概率统计的角度来确定AUC。
在一些实施例中,除AUC之外,还可以评估预测模型125的其他性能指标,只要这样的性能指标是能够从多个预测得分和多个真值标签105中确定出。相应的,客户端节点110可以根据性能指标的类型和计算方式,在本地的预测得分和真值标签中确定性能指标相关的度量参数的值。本公开的实施例在此方面不受限制。
在本公开的实施例中,为了在确定预测模型125的性能指标的同时实现对真值标签的隐私保护,不是直接将客户端节点110确定的多个度量参数的值发送给服务节点120。相反,在信令流200中,在客户端节点110处确定出多个度量参数的值后,客户端节点110对所述多个度量参数的值施加225扰动,得到多个度量参数的扰动值。客户端节点110将多个度量参数的扰动值发送230给服务节点120。
在本公开的实施例中,通过施加扰动,可以避免暴露基于真值标签统计出的度量参数的真实值。下文将详细讨论客户端节点如何施加扰动。在本文中,“扰动”有时也称为噪音、干扰等。
在一些实施例中，期望所施加的扰动能够满足数据的差分隐私保护。为更好地理解本公开的实施例，下文将首先简单介绍差分隐私和随机响应机制。
假设∈、δ是大于等于0的实数，并且M是一个随机机制（随机算法）。所谓随机机制，指的是对于特定输入，该机制的输出不是固定值，而是服从某一分布。对于随机机制M，如果满足以下情况，则可以认为随机机制M具有(∈,δ)-差分隐私：对于任意两个相邻训练数据集D、D′，并且对于M的可能的输出的任意子集S，存在：
Pr[M(D)∈S]≤e^∈·Pr[M(D′)∈S]+δ
此外，如果δ=0，还可以认为随机机制具有∈-差分隐私（∈-DP）。在差分隐私机制中，对于具有(∈,δ)-差分隐私或∈-差分隐私的随机机制M，期望其分别作用于两个相邻数据集后得到的两个输出的分布难以区分。这样的话，观察者通过观察输出结果，很难察觉到算法的输入数据集中的微小变化，从而达到保护隐私的目的。换言之，如果随机机制作用于任何相邻数据集都以差不多的概率得到特定输出S，那么将会认为该算法达到了差分隐私的效果。
在本公开的实施例中，关注于对数据样本的标签的差分隐私，且标签指示二分类结果。因此，遵循差分隐私的设置，可以定义标签差分隐私。具体地，假设∈、δ是大于等于0的实数，并且M是一个随机机制（随机算法）。如果满足以下情况，则可以认为随机机制M具有(∈,δ)-标签差分隐私（label differential privacy）：对于任意两个相邻训练数据集D、D′，它们的差异仅在于单个数据样本的标签不同，并且对于M的可能的输出的任意子集S，存在：
Pr[M(D)∈S]≤e^∈·Pr[M(D′)∈S]+δ
此外，如果δ=0，还可以认为随机机制具有∈-标签差分隐私（∈-标签DP）。也就是说，期望在改变单个数据样本的标签后，随机机制的输出结果的分布变化仍较小，使得观察者难以察觉到标签的改变。
随机机制可以服从一定的概率分布。在一些实施例中，可以基于高斯（Gaussian）分布或拉普拉斯（Laplace）分布来施加扰动。在一些实施例中，客户端节点110确定要扰动的度量参数值的灵敏度值，并基于灵敏度值确定概率分布，用于向各个度量参数的值施加随机扰动。下面首先介绍灵敏度，再介绍如何在灵敏度基础上确定概率分布。
假设d是正整数,D是数据集的合集,并且f:D→Rd是函数,用于从D变化到Rd。函数的灵敏度可以表示为Δf,其可以被定义为Δf=max||f(D1)-f(D2)||1,其表示在D中的数据集D1和D2的所有配对中的最大值,D1和D2相差最多一个数据元素,并且||·||1表示l1范数。根据上述定义可知,灵敏度指的是在变化最多一个数据元素的情况下,函数输出的最大差异是多少。
在不同类型的概率分布中，均可以引入灵敏度值来定义具体的概率分布方式。例如，对于高斯分布机制，对于任意的∈,δ∈(0,1)，高斯分布具有的标准差可以被定义为σ=Δ·√(2ln(1.25/δ))/∈，其中Δ表示灵敏度值。这样的高斯分布具有(∈,δ)的差分隐私（即，(∈,δ)-DP）。
又例如，对于以0为中心并且宽度（scale）为b的拉普拉斯分布机制，其概率密度函数被表示为p(x|b)=exp(-|x|/b)/(2b)。如果随机噪声（随机扰动）是由Lap(Δ/∈)的拉普拉斯分布确定的（即b=Δ/∈），那么可以认为这样的概率分布能够提供(∈,0)的差分隐私。
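拉普拉斯噪声的采样可以用如下Python草图示意（采用逆变换采样；仅为说明假设，非本申请实现）：

```python
import math
import random

def laplace_noise(scale):
    """从以0为中心、宽度b=scale的拉普拉斯分布Lap(b)中采样，
    可用作随机扰动；若scale=Δ/∈，则对应(∈,0)-差分隐私的噪声宽度。"""
    u = random.random() - 0.5  # (-0.5, 0.5)上的均匀随机数
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
```

该分布的均值为0、方差为2·b²，这一点与后文聚合时扰动相互抵消的讨论一致。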
基于上述讨论,客户端节点110在施加扰动时,可以确定与不同度量参数的扰动分别相关的灵敏度值,并且基于灵敏度值和差分隐私机制,确定对应的概率分布。客户端节点110可以按照概率分布来向对应的度量参数施加扰动值。
在一些实施例中，对于在客户端节点110-k处确定的多个真值标签105中第一类标签的数目localPk和第二类标签的数目localNk，客户端节点110-k可以仅向其中任意一个数值施加扰动，因为另一个值可以通过将该节点处的真值标签105的总数减去前一数目的扰动值来确定。
具体地,客户端节点110-k确定与localPk和localNk的扰动相关的灵敏度值。对于标签的数目的值,可以看出,如果随机改变一个真值标签的值,localPk或localNk最多会改变1。因此,此处的灵敏度值可以被确定为Δ=1。基于该灵敏度值,客户端节点110-k可以根据差分隐私机制,确定扰动所要遵循的概率分布。
在一些示例中，可以确定高斯分布，并基于高斯分布来施加扰动（也称为噪音或高斯噪音）。根据以上讨论的高斯分布机制，对于任意的∈,δ∈(0,1)，高斯分布的标准差可以被确定为σ=Δ·√(2ln(1.25/δ))/∈，其中灵敏度值Δ=1。这样的高斯分布机制可以满足(∈,δ)的差分隐私（即(∈,δ)-DP）。
在一些示例中，可以确定拉普拉斯分布，并基于拉普拉斯分布来施加扰动（也称为噪音或拉普拉斯噪音）。如果要满足(∈,0)的差分隐私，拉普拉斯分布的宽度可以被确定为b=Δ/∈，即，从Lap(Δ/∈)的分布中施加随机噪音。该分布的标准差为√2·Δ/∈。
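以灵敏度Δ=1向localP施加拉普拉斯噪声、并由标签总数推出localN的扰动值的过程，可以用如下Python草图示意（仅为说明假设，非本申请实现；函数名与数值均为示例）：

```python
import math
import random

def perturb_label_counts(local_p, total, epsilon):
    """以灵敏度Δ=1向localP施加Lap(Δ/∈)拉普拉斯噪声，
    localN的扰动值由标签总数减去localP的扰动值得到。"""
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    noisy_p = local_p + noise
    return noisy_p, total - noisy_p  # 两个扰动值之和仍等于标签总数
```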
在一些实施例中,对于在客户端节点110-k处确定的第一类标签对应的数据样本102(即,正样本)的预测得分在预测得分集合中超过的预测得分的第三数目localSumk,客户端节点110-k可以确定该度量参数的扰动的灵敏度值。
在一些实施例中，与localSumk有关的灵敏度值由服务节点120从全局的角度确定。这样的扰动方式可以获得全局隐私保护。从全局角度看，对于第一类标签对应的正样本的预测得分在预测得分集合中超过的预测得分的数目，其灵敏度可以被确定为Δ=Q-1，其中Q表示N个客户端节点110处的数据样本的总数目。这里的灵敏度值表示，如果改变数据样本的全集中的一个数据样本，那么localSumk最大会改变Q-1。在一些实施例中，对于某个客户端节点110-k，其可以从服务节点120接收与灵敏度值相关的信息。该信息可以是多个客户端节点的数据样本的总数目Q，也可以直接是灵敏度值Q-1，或者是其他能够确定灵敏度值的信息。
在一些实施例中，与localSumk有关的灵敏度值由各个客户端节点110-k在本地确定。这样的扰动方式可以获得局部隐私保护。具体地，客户端节点110-k从本地的多个预测得分各自的排序结果中确定最高排序结果，并基于最高排序结果来确定灵敏度值，其可以被表示为Δ=maxi rik。也就是说，对于各个客户端节点处的本地数据集，如果改变其中一个数据样本，那么localSumk最大会改变的数值与其对应的预测得分在整体预测得分集中的最高排序有关。
在客户端节点110-k以任意方式确定出与localSumk有关的灵敏度值后,基于该灵敏度值,客户端节点110-k可以根据差分隐私机制,确定对localSumk施加扰动所要遵循的概率分布。
在一些示例中，可以确定高斯分布，并基于高斯分布来施加扰动（也称为噪音或高斯噪音）。根据以上讨论的高斯分布机制，对于任意的∈,δ∈(0,1)，高斯分布的标准差可以被确定为σ=Δ·√(2ln(1.25/δ))/∈。这样的高斯分布机制可以满足(∈,δ)的差分隐私（即(∈,δ)-DP）。
在一些示例中，可以确定拉普拉斯分布，并基于拉普拉斯分布来施加扰动（也称为噪音或拉普拉斯噪音）。如果要满足(∈,0)的差分隐私，拉普拉斯分布的宽度可以被确定为b=Δ/∈，即，从Lap(Δ/∈)的分布中施加随机噪音。该分布的标准差为√2·Δ/∈。
以上讨论了对于一些示例度量参数施加扰动的方式。对于其他不同的度量参数,也可以以类似的方式,确定灵敏度值和对应的概率分布,用于相应地施加扰动。这里不再赘述。
通过施加随机扰动,从本地的真值标签确定出的度量参数的值不需要被暴露。客户端节点110可以向服务节点120发送度量参数的扰动值。
继续参考图2，服务节点120从多个客户端节点110接收235到它们各自提供的多个度量参数的扰动值。服务节点120按度量参数聚合240来自多个客户端节点110的多个度量参数的扰动值，得到多个度量参数的聚合值。服务节点120基于多个度量参数的聚合值来确定245预测模型125的性能指标的值。
在一些实施例中，假设从客户端节点110接收到的度量参数的扰动值为localSum′k、localP′k、localN′k，服务节点120将各个客户端节点110的这些度量参数的值聚合（例如，加和到一起），可以分别得到：
globalSum=∑k localSum′k
globalP=∑k localP′k
globalN=∑k localN′k
由于是基于扰动值确定的,这些聚合值可能不完全等同于从多个客户端节点110处的真值标签和预测得分统计出的值。然而,由于多个客户端节点110均各自应用了随机扰动,在一些实施例中,客户端节点110应用的随机扰动(例如,概率分布)的均值为0。这样,通过服务节点120处的聚合操作,各个客户端节点110的随机扰动可以被互相抵消,使得这些聚合值近似于这些度量参数的真实值。
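扰动在聚合中相互抵消的效果可以用如下Python模拟草图示意（仅为说明假设，非本申请实现；数值与随机种子均为示例）：

```python
import math
import random

def simulate_aggregation(true_values, scale, trials=2000):
    """模拟：多个客户端各自加入均值为0的拉普拉斯噪声后，
    服务节点加和聚合；聚合值围绕真实总和波动，其期望等于真实总和。"""
    random.seed(42)  # 固定随机种子以便复现
    true_total = sum(true_values)
    totals = []
    for _ in range(trials):
        noisy_total = 0.0
        for v in true_values:
            u = random.random() - 0.5
            noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
            noisy_total += v + noise
        totals.append(noisy_total)
    return true_total, sum(totals) / trials
```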
在一些实施例中,服务节点120可以根据上式(4)来计算预测模型125的AUC的值。在一些实施例中,对于从客户端节点110获得的其他度量参数的扰动值,服务节点120也可以类似聚合以用于计算性能指标。例如,对于上式(5)给出的AUC的计算方式,服务节点120可以从客户端节点110接收对应的度量参数的扰动值,并聚合后用于计算AUC。
虽然在均值为0的概率分布下，聚合操作可以在期望意义上抵消扰动，但与性能指标的真实值相比，所确定的值还会存在一定方差。但发明人通过反复试验和验证，发现这样的方差较小，在允许范围内。特别是随着参与的客户端节点的数目增加，方差会更小。
实际上，严格来说，即使拥有真值标签，在计算AUC的很多算法中，均是通过近似的方式去逼近AUC的真实值，即ROC曲线下方面积。因此，在需要对标签数据进行隐私保护的场景中，根据本公开的各个实施例，能够在获得数据的差分隐私保护的同时，允许服务节点确定出较准确的性能指标。
下文将讨论根据本公开的一些实施例计算的性能指标(以AUC为例)的值与真实值之间的误差。为了方便,暂时假设在计算AUC时,服务节点使用的是基于真实标签统计的值,例如真实的正样本数目和负样本数目。
在基于全局隐私保护来施加扰动时，所计算的AUC的标准差约为√c·σ/(P·N)，其中P是正样本数目，N是负样本数目，c是客户端节点数目，σ是所添加的扰动（噪声）的标准差。从上式可以看出，随着客户端节点数目的减少或所添加的噪声的减少，所计算的AUC的标准差也会减少。以在全局隐私保护中应用拉普拉斯机制为例，设M为数据样本的总数目，则灵敏度为M-1，噪声的标准差为σ=√2·(M-1)/∈。因此，AUC的方差可以被计算为2c·(M-1)²/(∈²·P²·N²)，标准差为√(2c)·(M-1)/(∈·P·N)。假设每个客户端节点上只有一个数据样本，即c=M，那么标准差为√(2M)·(M-1)/(∈·P·N)。该值在一般应用中是较小值。
在基于局部隐私保护来施加扰动时，为方便计算，假设在一种极端情况下，每个客户端节点上只有一个数据样本，即c=M，那么这些客户端节点所使用的灵敏度值分别为[0,1,2,…,M-1]。仍以拉普拉斯分布为例，所计算的AUC的方差为∑i 2i²/(∈²·P²·N²)=M·(M-1)·(2M-1)/(3·∈²·P²·N²)（其中i从0取到M-1），其在一般应用中也是较小值。
应当理解,虽然以AUC为例进行说明,在一些实施例中,还可以以类似的扰动和交互方式,使服务节点120能够附加地或备选地计算其他性能指标的值。
图4示出根据本公开的一些实施例的在客户端节点处用于模型性能评估的过程400的流程图。过程400可以被实现在客户端节点110处。
在框410，客户端节点110将多个数据样本分别应用到预测模型，以得到预测模型输出的多个预测得分。多个预测得分分别指示多个数据样本属于第一类别或第二类别的预测概率。在框420，客户端节点110基于多个数据样本的多个真值标签和多个预测得分，确定与预测模型的预定性能指标相关的多个度量参数的值。在框430，客户端节点110对多个度量参数的值施加扰动，得到多个度量参数的扰动值。在框440，客户端节点110将多个度量参数的扰动值发送给服务节点。
在一些实施例中,确定多个度量参数的值包括:确定多个真值标签中的第一类标签的第一数目,以作为第一度量参数的值,第一类标签指示对应的数据样本属于第一类别;以及确定多个真值标签中的第二类标签的第二数目,以作为第二度量参数的值,第二类标签指示对应的数据样本属于第二类别。
在一些实施例中,对多个度量参数的值施加扰动包括:确定与第一度量参数和第二度量参数中的一个度量参数的扰动相关的第一灵敏度值;基于第一灵敏度值和差分隐私机制,来确定第一概率分布;基于第一概率分布,对第一度量参数和第二度量参数中的一个度量参数的值施加扰动,得到一个度量参数的扰动值;以及基于多个真值标签的总数目和第一度量参数和第二度量参数中的度量参数的扰动值,确定第一度量参数和第二度量参数中的另一度量参数的扰动值。
在一些实施例中,确定多个度量参数的值包括:将多个预测得分发送给服务节点;从服务节点接收多个预测得分各自在预测得分集合中的排序结果,预测得分集合包括由多个客户端节点发送的预测得分,多个客户端节点包括客户端节点;以及基于多个预测得分各自的排序结果,确定第一类标签对应的数据样本的预测得分在预测得分集合中超过的预测得分的第三数目,以作为第三度量参数的值。
在一些实施例中,对多个度量参数的值施加扰动包括:确定与第三度量参数的扰动相关的第二灵敏度值;基于第二灵敏度值和差分隐私机制,来确定第二概率分布;以及基于第二概率分布,对第三度量参数的值施加扰动。
在一些实施例中,确定第二灵敏度值包括:从服务节点接收与第二灵敏度值相关的信息;以及基于所接收的信息来确定第二灵敏度值。
在一些实施例中,与第二灵敏度值相关的信息包括多个客户端节点的数据样本的总数目。
在一些实施例中,确定第二灵敏度值包括:从多个预测得分各自的排序结果中确定最高排序结果;以及基于最高排序结果来确定第二灵敏度值。
在一些实施例中,预定性能度量指标至少包括受试者工作特征曲线(ROC)的曲线下面积(AUC)。
图5示出根据本公开的一些实施例的在服务节点处用于模型性能评估的过程500的流程图。过程500可以被实现在服务节点120处。
在框510,服务节点120从多个客户端节点分别接收与预测模型的预定性能指标相关的多个度量参数的扰动值。在框520,服务节点120按度量参数聚合来自多个客户端节点的多个度量参数的扰动值,得到多个度量参数的聚合值。在框530,服务节点120基于多个度量参数的聚合值来确定预定性能指标的值。
在一些实施例中,对于多个客户端节点中的给定客户端节点,多个度量参数的扰动值指示以下至少一项:给定客户端节点处的多个真值标签中的第一类标签的第一数目,第一类标签指示对应的数据样本属于第一类别;多个真值标签中的第二类标签的第二数目,以作为第二度量参数的值,第二类标签指示对应的数据样本属于第二类别;以及给定客户端节点处的第一类标签对应的数据样本的预测得分在预测得分集合中超过的预测得分的第三数目,预测得分由预测模型基于数据样本确定,并且预测得分集合包括由多个客户端节点发送的预测得分。
在一些实施例中,过程500还包括:向多个客户端节点分别发送与第二灵敏度值相关的信息。
在一些实施例中,与第二灵敏度值相关的信息包括多个客户端节点的数据样本的总数目。
在一些实施例中,预定性能度量指标至少包括受试者工作特征曲线(ROC)的曲线下面积(AUC)。
图6示出了根据本公开的一些实施例的在客户端节点处用于模型性能评估的装置600的框图。装置600可以被实现为或者被包括在客户端节点110中。装置600中的各个模块/组件可以由硬件、软件、固件或者它们的任意组合来实现。
如图所示,装置600包括预测模块610,被配置为将多个数据样本分别应用到预测模型,以得到预测模型输出的多个预测得分,多个预测得分分别指示多个数据样本属于第一类别或第二类别的预测概率。装置600还包括度量确定模块620,被配置为基于多个数据样本的多个真值标签和多个预测得分,确定与预测模型的预定性能指标相关的多个度量参数的值。装置600还包括扰动模块630,被配置为对多个度量参数的值施加扰动,得到多个度量参数的扰动值;以及发送模块640,被配置为将多个度量参数的扰动值发送给服务节点。
在一些实施例中,度量确定模块620包括:第一确定模块,被配置为确定多个真值标签中的第一类标签的第一数目,以作为第一度量参数的值,第一类标签指示对应的数据样本属于第一类别;以及第二确定模块,被配置为确定多个真值标签中的第二类标签的第二数目,以作为第二度量参数的值,第二类标签指示对应的数据样本属于第二类别。
在一些实施例中,扰动模块包括:第一灵敏度确定模块,被配置为确定与第一度量参数和第二度量参数中的一个度量参数的扰动相关的第一灵敏度值;第一分布确定模块,被配置为基于第一灵敏度值和差分隐私机制,来确定第一概率分布;第一扰动施加模块,被配置为基于第一概率分布,对第一度量参数和第二度量参数中的一个度量 参数的值施加扰动,得到一个度量参数的扰动值;以及扰动值确定模块,被配置为基于多个真值标签的总数目和第一度量参数和第二度量参数中的度量参数的扰动值,确定第一度量参数和第二度量参数中的另一度量参数的扰动值。
在一些实施例中,度量确定模块包括:得分发送模块,被配置为将多个预测得分发送给服务节点;结果接收模块,被配置为从服务节点接收多个预测得分各自在预测得分集合中的排序结果,预测得分集合包括由多个客户端节点发送的预测得分,多个客户端节点包括客户端节点;以及第三确定模块,被配置为基于多个预测得分各自的排序结果,确定第一类标签对应的数据样本的预测得分在预测得分集合中超过的预测得分的第三数目,以作为第三度量参数的值。
在一些实施例中,扰动模块包括:第二灵敏度确定模块,被配置为确定与第三度量参数的扰动相关的第二灵敏度值;第二分布确定模块,被配置为基于第二灵敏度值和差分隐私机制,来确定第二概率分布;以及第二扰动施加模块,被配置为基于第二概率分布,对第三度量参数的值施加扰动。
在一些实施例中,第二灵敏度确定模块包括:灵敏度接收模块,被配置为从服务节点接收与第二灵敏度值相关的信息;以及基于信息的确定模块,被配置为基于所接收的信息来确定第二灵敏度值。
在一些实施例中,与第二灵敏度值相关的信息包括多个客户端节点的数据样本的总数目。
在一些实施例中,第二灵敏度确定模块包括:排序确定模块,被配置为从多个预测得分各自的排序结果中确定最高排序结果;以及基于排序的确定模块,被配置为基于最高排序结果来确定第二灵敏度值。
在一些实施例中,预定性能度量指标至少包括受试者工作特征曲线(ROC)的曲线下面积(AUC)。
图7示出了根据本公开的一些实施例的在服务节点处用于模型性能评估的装置700的框图。装置700可以被实现为或者被包括在服务节点120中。装置700中的各个模块/组件可以由硬件、软件、固件或者它们的任意组合来实现。
如图所示,装置700包括接收模块710,被配置为从多个客户端节点分别接收与预测模型的预定性能指标相关的多个度量参数的扰动值。装置700还包括聚合模块720,被配置为按度量参数聚合来自多个客户端节点的多个度量参数的扰动值,得到多个度量参数的聚合值。装置700还包括性能确定模块730,被配置为基于多个度量参数的聚合值来确定预定性能指标的值。
在一些实施例中,对于多个客户端节点中的给定客户端节点,多个度量参数的扰动值指示以下至少一项:给定客户端节点处的多个真值标签中的第一类标签的第一数目,第一类标签指示对应的数据样本属于第一类别;多个真值标签中的第二类标签的第二数目,以作为第二度量参数的值,第二类标签指示对应的数据样本属于第二类别;以及给定客户端节点处的第一类标签对应的数据样本的预测得分在预测得分集合中超过的预测得分的第三数目,预测得分由预测模型基于数据样本确定,并且预测得分集合包括由多个客户端节点发送的预测得分。
在一些实施例中,装置700还包括:灵敏度发送模块,被配置为向多个客户端节点分别发送与第二灵敏度值相关的信息。
在一些实施例中,与第二灵敏度值相关的信息包括多个客户端节点的数据样本的总数目。
在一些实施例中,预定性能度量指标至少包括受试者工作特征曲线(ROC)的曲线下面积(AUC)。
图8示出了能够实施本公开的一个或多个实施例的计算设备/系统800的框图。应当理解,图8所示出的计算设备/系统800仅仅是示例性的,而不应当构成对本文所描述的实施例的功能和范围的任何限制。图8所示出的计算设备/系统800可以用于实现图1的客户端节点110或服务节点120。
如图8所示,计算设备/系统800是通用计算设备的形式。计算设备/系统800的组件可以包括但不限于一个或多个处理器或处理单元 810、存储器820、存储设备830、一个或多个通信单元840、一个或多个输入设备850以及一个或多个输出设备860。处理单元810可以是实际或虚拟处理器并且能够根据存储器820中存储的程序来执行各种处理。在多处理器系统中,多个处理单元并行执行计算机可执行指令,以提高计算设备/系统800的并行处理能力。
计算设备/系统800通常包括多个计算机存储介质。这样的介质可以是计算设备/系统800可访问的任何可以获得的介质,包括但不限于易失性和非易失性介质、可拆卸和不可拆卸介质。存储器820可以是易失性存储器(例如寄存器、高速缓存、随机访问存储器(RAM))、非易失性存储器(例如,只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、闪存)或它们的某种组合。存储设备830可以是可拆卸或不可拆卸的介质,并且可以包括机器可读介质,诸如闪存驱动、磁盘或者任何其他介质,其可以能够用于存储信息和/或数据(例如用于训练的训练数据)并且可以在计算设备/系统800内被访问。
计算设备/系统800可以进一步包括另外的可拆卸/不可拆卸、易失性/非易失性存储介质。尽管未在图8中示出,可以提供用于从可拆卸、非易失性磁盘(例如“软盘”)进行读取或写入的磁盘驱动和用于从可拆卸、非易失性光盘进行读取或写入的光盘驱动。在这些情况中,每个驱动可以由一个或多个数据介质接口被连接至总线(未示出)。存储器820可以包括计算机程序产品825,其具有一个或多个程序模块,这些程序模块被配置为执行本公开的各种实施例的各种方法或动作。
通信单元840实现通过通信介质与其他计算设备进行通信。附加地,计算设备/系统800的组件的功能可以以单个计算集群或多个计算机器来实现,这些计算机器能够通过通信连接进行通信。因此,计算设备/系统800可以使用与一个或多个其他服务器、网络个人计算机(PC)或者另一个网络节点的逻辑连接来在联网环境中进行操作。
输入设备850可以是一个或多个输入设备,例如鼠标、键盘、追踪球等。输出设备860可以是一个或多个输出设备,例如显示器、扬 声器、打印机等。计算设备/系统800还可以根据需要通过通信单元840与一个或多个外部设备(未示出)进行通信,外部设备诸如存储设备、显示设备等,与一个或多个使得用户与计算设备/系统800交互的设备进行通信,或者与使得计算设备/系统800与一个或多个其他计算设备通信的任何设备(例如,网卡、调制解调器等)进行通信。这样的通信可以经由输入/输出(I/O)接口(未示出)来执行。
根据本公开的示例性实现方式,提供了一种计算机可读存储介质,其上存储有计算机可执行指令或计算机程序,其中计算机可执行指令或计算机程序被处理器执行以实现上文描述的方法。
根据本公开的示例性实现方式,还提供了一种计算机程序产品,计算机程序产品被有形地存储在非瞬态计算机可读介质上并且包括计算机可执行指令,而计算机可执行指令被处理器执行以实现上文描述的方法。
在本文中参照根据本公开实现的方法、装置、设备和计算机程序产品的流程图和/或框图描述了本公开的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其他可编程数据处理装置的处理单元,从而生产出一种机器,使得这些指令在通过计算机或其他可编程数据处理装置的处理单元执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。
可以把计算机可读程序指令加载到计算机、其他可编程数据处理装置、或其他设备上,使得在计算机、其他可编程数据处理装置或其他设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得 在计算机、其他可编程数据处理装置、或其他设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。
附图中的流程图和框图显示了根据本公开的多个实现的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
以上已经描述了本公开的各实现,上述说明是示例性的,并非穷尽性的,并且也不限于所公开的各实现。在不偏离所说明的各实现的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实现的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其他普通技术人员能理解本文公开的各个实现方式。

Claims (20)

  1. 一种模型性能评估的方法,包括:
    在客户端节点处,将多个数据样本分别应用到预测模型,以得到所述预测模型输出的多个预测得分,所述多个预测得分分别指示所述多个数据样本属于第一类别或第二类别的预测概率;
    基于所述多个数据样本的多个真值标签和所述多个预测得分,确定与所述预测模型的预定性能指标相关的多个度量参数的值;
    对所述多个度量参数的所述值施加扰动,得到所述多个度量参数的扰动值;以及
    将所述多个度量参数的扰动值发送给服务节点。
  2. 根据权利要求1所述的方法,其中确定所述多个度量参数的所述值包括:
    确定所述多个真值标签中的第一类标签的第一数目,以作为第一度量参数的值,所述第一类标签指示对应的数据样本属于所述第一类别;以及
    确定所述多个真值标签中的第二类标签的第二数目,以作为第二度量参数的值,所述第二类标签指示对应的数据样本属于所述第二类别。
  3. 根据权利要求2所述的方法,其中对所述多个度量参数的所述值施加扰动包括:
    确定与所述第一度量参数和所述第二度量参数中的一个度量参数的扰动相关的第一灵敏度值;
    基于所述第一灵敏度值和差分隐私机制,来确定第一概率分布;
    基于所述第一概率分布,对所述第一度量参数和所述第二度量参数中的所述一个度量参数的所述值施加扰动,得到所述一个度量参数的扰动值;以及
    基于所述多个真值标签的总数目和所述第一度量参数和所述第二度量参数中的所述度量参数的扰动值,确定所述第一度量参数和所 述第二度量参数中的另一度量参数的扰动值。
  4. 根据权利要求1至3中任一项所述的方法,其中确定所述多个度量参数的所述值包括:
    将所述多个预测得分发送给所述服务节点;
    从所述服务节点接收所述多个预测得分各自在预测得分集合中的排序结果,所述预测得分集合包括由多个客户端节点发送的预测得分,所述多个客户端节点包括所述客户端节点;以及
    基于所述多个预测得分各自的所述排序结果,确定所述第一类标签对应的数据样本的预测得分在所述预测得分集合中超过的预测得分的第三数目,以作为第三度量参数的值。
  5. 根据权利要求4所述的方法,其中对所述多个度量参数的所述值施加扰动包括:
    确定与所述第三度量参数的扰动相关的第二灵敏度值;
    基于所述第二灵敏度值和差分隐私机制,来确定第二概率分布;以及
    基于所述第二概率分布,对所述第三度量参数的所述值施加扰动。
  6. 根据权利要求5所述的方法,其中确定所述第二灵敏度值包括:
    从所述服务节点接收与所述第二灵敏度值相关的信息;以及
    基于所接收的信息来确定所述第二灵敏度值。
  7. 根据权利要求6所述的方法,其中与所述第二灵敏度值相关的信息包括所述多个客户端节点的数据样本的总数目。
  8. 根据权利要求5至7中任一项所述的方法,其中确定所述第二灵敏度值包括:
    从所述多个预测得分各自的所述排序结果中确定最高排序结果;以及
    基于所述最高排序结果来确定所述第二灵敏度值。
  9. 根据权利要求1至8中任一项所述的方法,其中所述预定性能度量指标至少包括受试者工作特征曲线(ROC)的曲线下面积 (AUC)。
  10. 一种模型性能评估的方法,包括:
    在服务节点处,从多个客户端节点分别接收与预测模型的预定性能指标相关的多个度量参数的扰动值;
    按度量参数聚合来自所述多个客户端节点的所述多个度量参数的所述扰动值,得到所述多个度量参数的聚合值;以及
    基于所述多个度量参数的所述聚合值来确定所述预定性能指标的值。
  11. 根据权利要求10所述的方法,其中对于所述多个客户端节点中的给定客户端节点,所述多个度量参数的扰动值指示以下至少一项:
    所述给定客户端节点处的多个真值标签中的第一类标签的第一数目,所述第一类标签指示对应的数据样本属于所述第一类别;
    所述多个真值标签中的第二类标签的第二数目,以作为第二度量参数的值,所述第二类标签指示对应的数据样本属于所述第二类别;以及
    所述给定客户端节点处的所述第一类标签对应的数据样本的预测得分在预测得分集合中超过的预测得分的第三数目,所述预测得分由所述预测模型基于数据样本确定,并且所述预测得分集合包括由所述多个客户端节点发送的预测得分。
  12. 根据权利要求11所述的方法,还包括:
    向所述多个客户端节点分别发送与所述第二灵敏度值相关的信息。
  13. 根据权利要求12所述的方法,其中与所述第二灵敏度值相关的信息包括所述多个客户端节点的数据样本的总数目。
  14. 根据权利要求10至13中任一项所述的方法,其中所述预定性能度量指标至少包括受试者工作特征曲线(ROC)的曲线下面积(AUC)。
  15. 一种用于模型性能评估的装置,包括:
    预测模块,被配置为将多个数据样本分别应用到预测模型,以得到所述预测模型输出的多个预测得分,所述多个预测得分分别指示所述多个数据样本属于第一类别或第二类别的预测概率;
    度量确定模块,被配置为基于所述多个数据样本的多个真值标签和所述多个预测得分,确定与所述预测模型的预定性能指标相关的多个度量参数的值;
    扰动模块,被配置为对所述多个度量参数的所述值施加扰动,得到所述多个度量参数的扰动值;以及
    发送模块,被配置为将所述多个度量参数的扰动值发送给服务节点。
  16. 一种用于模型性能评估的装置,包括:
    接收模块,被配置为从多个客户端节点分别接收与预测模型的预定性能指标相关的多个度量参数的扰动值;
    聚合模块,被配置为按度量参数聚合来自所述多个客户端节点的所述多个度量参数的所述扰动值,得到所述多个度量参数的聚合值;以及
    性能确定模块,被配置为基于所述多个度量参数的所述聚合值来确定所述预定性能指标的值。
  17. 一种电子设备,包括:
    至少一个处理单元;以及
    至少一个存储器,所述至少一个存储器被耦合到所述至少一个处理单元并且存储用于由所述至少一个处理单元执行的指令,所述指令在由所述至少一个处理单元执行时使所述设备执行根据权利要求1至9中任一项所述的方法。
  18. 一种电子设备,包括:
    至少一个处理单元;以及
    至少一个存储器,所述至少一个存储器被耦合到所述至少一个处理单元并且存储用于由所述至少一个处理单元执行的指令,所述指令在由所述至少一个处理单元执行时使所述设备执行根据权利要求10 至14中任一项所述的方法。
  19. 一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行以实现根据权利要求1至9中任一项所述的方法。
  20. 一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行以实现根据权利要求10至14中任一项所述的方法。
PCT/CN2023/091189 2022-05-13 2023-04-27 用于模型性能评估的方法、装置、设备和介质 WO2023216902A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210524865.2 2022-05-13
CN202210524865.2A CN117112186A (zh) 2022-05-13 2022-05-13 用于模型性能评估的方法、装置、设备和介质

Publications (1)

Publication Number Publication Date
WO2023216902A1 true WO2023216902A1 (zh) 2023-11-16

Family

ID=88729637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/091189 WO2023216902A1 (zh) 2022-05-13 2023-04-27 用于模型性能评估的方法、装置、设备和介质

Country Status (2)

Country Link
CN (1) CN117112186A (zh)
WO (1) WO2023216902A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111861099A (zh) * 2020-06-02 2020-10-30 光之树(北京)科技有限公司 联邦学习模型的模型评估方法及装置
CN113094758A (zh) * 2021-06-08 2021-07-09 华中科技大学 一种基于梯度扰动的联邦学习数据隐私保护方法及系统
CN113221183A (zh) * 2021-06-11 2021-08-06 支付宝(杭州)信息技术有限公司 实现隐私保护的多方协同更新模型的方法、装置及系统
CN114239860A (zh) * 2021-12-07 2022-03-25 支付宝(杭州)信息技术有限公司 基于隐私保护的模型训练方法及装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111861099A (zh) * 2020-06-02 2020-10-30 光之树(北京)科技有限公司 联邦学习模型的模型评估方法及装置
CN113094758A (zh) * 2021-06-08 2021-07-09 华中科技大学 一种基于梯度扰动的联邦学习数据隐私保护方法及系统
CN113221183A (zh) * 2021-06-11 2021-08-06 支付宝(杭州)信息技术有限公司 实现隐私保护的多方协同更新模型的方法、装置及系统
CN114239860A (zh) * 2021-12-07 2022-03-25 支付宝(杭州)信息技术有限公司 基于隐私保护的模型训练方法及装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANKAI SUN; XIN YANG; YUANSHUN YAO; JUNYUAN XIE; DI WU; CHONG WANG: "Differentially Private AUC Computation in Vertical Federated Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 24 May 2022 (2022-05-24), 201 Olin Library Cornell University Ithaca, NY 14853, XP091232108 *
JIANKAI SUN; XIN YANG; YUANSHUN YAO; JUNYUAN XIE; DI WU; CHONG WANG: "DPAUC: Differentially Private AUC Computation in Federated Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 December 2022 (2022-12-07), 201 Olin Library Cornell University Ithaca, NY 14853, XP091389252 *

Also Published As

Publication number Publication date
CN117112186A (zh) 2023-11-24

Similar Documents

Publication Publication Date Title
US10095774B1 (en) Cluster evaluation in unsupervised learning of continuous data
US11893493B2 (en) Clustering techniques for machine learning models
TWI658420B (zh) 融合時間因素之協同過濾方法、裝置、伺服器及電腦可讀存儲介質
US11556567B2 (en) Generating and visualizing bias scores representing bias in digital segments within segment-generation-user interfaces
US20230297847A1 (en) Machine-learning techniques for factor-level monotonic neural networks
US11687804B2 (en) Latent feature dimensionality bounds for robust machine learning on high dimensional datasets
WO2024051052A1 (zh) 组学数据的批次矫正方法、装置、存储介质及电子设备
US20210150335A1 (en) Predictive model performance evaluation
Wang et al. A regularized convex nonnegative matrix factorization model for signed network analysis
WO2024022082A1 (zh) 信息分类的方法、装置、设备和介质
WO2023216902A1 (zh) 用于模型性能评估的方法、装置、设备和介质
WO2023216899A1 (zh) 用于模型性能评估的方法、装置、设备和介质
WO2023216900A1 (zh) 用于模型性能评估的方法、装置、设备和存储介质
Jiang et al. Learning the truth from only one side of the story
Song et al. Collusion detection and ground truth inference in crowdsourcing for labeling tasks
CN115511104A (zh) 用于训练对比学习模型的方法、装置、设备和介质
Roh et al. A bi-level nonlinear eigenvector algorithm for wasserstein discriminant analysis
CN113159100B (zh) 电路故障诊断方法、装置、电子设备和存储介质
US20240078829A1 (en) Systems and methods for identifying specific document types from groups of documents using optical character recognition
WO2024136905A1 (en) Clustering techniques for machine learning models
US20240202816A1 (en) Systems and methods for dynamically generating pre-approval data
Bohlouli et al. Scalable multi-criteria decision-making: A mapreduce deployed big data approach for skill analytics
Zhao Integrating Machine Learning and Optimization for Problems in Contextual Decision-Making and Dynamic Learning
CN117171791A (zh) 一种图像分类模型的隐私泄露风险评估方法及系统
Chowdhury et al. FedSat: A Statistical Aggregation Approach for Class Imbalaced Clients in Federated Learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23802674

Country of ref document: EP

Kind code of ref document: A1