WO2020015480A1 - Method and apparatus for detecting the security of a data model - Google Patents

Method and apparatus for detecting the security of a data model

Info

Publication number
WO2020015480A1
WO2020015480A1 (PCT/CN2019/090963)
Authority
WO
WIPO (PCT)
Prior art keywords
data
model
parameter
difference
security
Prior art date
Application number
PCT/CN2019/090963
Other languages
English (en)
French (fr)
Inventor
王华忠
李漓春
殷山
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2020015480A1 publication Critical patent/WO2020015480A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Definitions

  • One or more embodiments of the present specification relate to the field of data security, and in particular, to a method and an apparatus for detecting the security of a data model.
  • A common scheme for data cooperation is the following: the data demander deploys its trained machine learning data model to the data provider.
  • During model prediction, the system obtains the data provider's raw data in real time, runs the model, and returns the model result to the data demander.
  • If a secure model is deployed, the data demander cannot infer all or part of the model's input from the model's output, and the data provider does not leak its original data.
  • However, if the data demander specially constructs the model, it may be possible to obtain part of the original data from the model results.
  • For the data provider, such a model is an insecure model.
  • Secure deployment of the model is therefore an important step in improving platform security and enhancing mutual trust between the two parties in data cooperation.
  • One or more embodiments of the present specification describe a method and an apparatus for detecting the security of a data model, based on statistical information about the differences among the model parameters, before the data model is deployed.
  • Further, the output results of the data model can be limited and adjusted to reduce the security risks of the data model.
  • According to a first aspect, a method for detecting the security of a data model is provided, where the data model is provided by a data demander for deployment to a data provider and is used to perform model operations on the data provider's source data. The method includes: obtaining a plurality of model parameters included in the data model; determining difference statistical information of the plurality of model parameters; and determining security evaluation information of the data model according to the difference statistical information.
  • The difference statistical information includes a first statistic related to differences in parameter value size, and/or a second statistic related to differences in the number of parameter digits.
  • the above method is performed by a data demander.
  • the data demander determines the difference statistical information as security evaluation information, and provides the security evaluation information to the data provider.
  • the data demander determines the security evaluation information according to the difference statistical information and a predetermined difference threshold; and provides the security evaluation information to the data provider.
  • the above method is performed by a data provider.
  • the data provider receives the plurality of model parameters from the data demander.
  • the data provider determines the security evaluation information according to the difference statistical information and a predetermined difference threshold. Further, whether to accept the deployment of the data model may also be determined according to the security evaluation information.
  • According to one possible implementation, the security evaluation information is determined in the following manner: according to multiple difference thresholds preset for a certain difference statistic, that statistic is divided into different ranges, and the different ranges corresponding to different security levels are used as the security evaluation information.
  • According to another possible implementation, the difference statistical information includes multiple statistics.
  • In this case, the security assessment information is determined in the following manner: for each statistic, a safety score related to that statistic is determined by comparing it with the corresponding difference threshold;
  • then, based on the safety scores and the weights preset for the statistics, a total safety score is determined as the security evaluation information.
  • In one embodiment, the first statistic includes at least one of the following: the ratio of the maximum parameter to the minimum parameter; the ratio of the difference between the maximum and minimum parameters to the maximum parameter; the ratio of the difference between the maximum and minimum parameters to the minimum parameter; and the ratio of the maximum parameter to the mean of the parameters.
  • In another embodiment, the first statistic includes at least one of the following: the variance of the parameters; among the pairwise combinations of the plurality of model parameters, the number of combinations whose parameter value ratio is higher than a preset ratio threshold; and the number of combinations whose parameter value difference is higher than a preset difference threshold.
  • In one embodiment, the second statistic includes at least one of the following: the difference between the maximum and minimum number of decimal places among the parameters; the number of consecutive significant zeros in the decimal part of each parameter; and the maximum number of consecutive significant zeros in the decimal parts of the parameters.
  • the data model includes a logistic regression model, a decision tree model, a gradient boosted decision tree GBDT model, and a score card model.
  • According to a second aspect, a method for reducing the security risk of a data model is provided, where the data model is provided by a data demander for deployment to a data provider and is used to perform model operations on the data provider's source data. The method includes: determining the result type of the output result of the data model, where the result type includes at least a continuous value and a discrete classification probability;
  • when the result type is a continuous value, using a predetermined number of bits to represent the continuous value;
  • when the result type is a discrete classification probability, converting the discrete classification probability into a classification decision result.
  • In one embodiment, the predetermined number of bits is set in advance based on an agreed range of the output result.
  • According to one embodiment, the continuous value is a decimal number, and using a predetermined number of bits to represent the continuous value includes retaining a predetermined number of decimal places for the continuous value, where that predetermined number of places is set in advance based on the digit settings of the model parameters of the data model.
  • In one embodiment, the discrete classification probability is converted into a classification decision result in the following manner: the classification boundary of the classification decision is obtained, and the discrete classification probability is converted into a classification decision result by comparing it with the classification boundary.
  • According to a third aspect, a device for detecting the security of a data model is provided, where the data model is provided by a data demander for deployment to a data provider and is used to perform model operations on the data provider's source data. The device includes:
  • an obtaining unit, configured to obtain a plurality of model parameters included in the data model;
  • a statistics determining unit, configured to determine difference statistical information of the plurality of model parameters, where the difference statistical information includes a first statistic related to differences in parameter value size and/or a second statistic related to differences in the number of parameter digits;
  • a security determining unit, configured to determine security evaluation information of the data model according to the difference statistical information.
  • a device for reducing a security risk of a data model which is provided by a data demander for deployment to the data provider for performing model operations on source data of the data provider; the device includes :
  • a type determining unit configured to determine a result type of an output result of the data model, where the result type includes at least a continuous value and a discrete classification probability;
  • a continuous value processing unit configured to use a predetermined number of bits to represent the continuous value when the result type is a continuous value
  • the discrete result processing unit is configured to convert the discrete classification probability into a classification decision result if the result type is a discrete classification probability.
  • a computer-readable storage medium having stored thereon a computer program, which when executed in a computer, causes the computer to execute the methods of the first aspect and the second aspect.
  • According to another aspect, a computing device is provided, including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the methods of the first aspect and the second aspect are implemented.
  • the security evaluation information of the data model is determined based on the statistical information of the differences in the model parameters in the data model, thereby detecting the security of the data model. Furthermore, in the model prediction stage, the output results of the data model can also be limited and adjusted to reduce the amount of information in the output results, thereby reducing the security risk of the data model.
  • FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification
  • FIG. 2 illustrates a method for detecting the security of a data model according to an embodiment
  • FIG. 3A illustrates an implementation manner of a detection method in an embodiment
  • FIG. 3B illustrates an implementation manner of a detection method in another embodiment
  • FIG. 3C illustrates an implementation manner of a detection method in still another embodiment
  • FIG. 4 shows a flowchart of a method for reducing a security risk of a data model according to an embodiment
  • FIG. 5 shows a schematic block diagram of a model security detection device according to an embodiment
  • FIG. 6 shows a schematic block diagram of a device for reducing security risks according to an embodiment.
  • FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification.
  • the data demander cooperates with the data provider to complete data processing and analysis.
  • the data provider has the source data to be analyzed, but there may not be applicable data analysis tools.
  • the data demander constructs and trains the data model according to the needs of data analysis, and then deploys the trained data model to the data provider. In this sense, the data demander can also be called the model provider.
  • After the model provider deploys the data model to the data provider, the data model can run on the data provider's platform, obtain the data provider's source data, analyze, process and compute on that data, and then return the operation result to the data demander, that is, the model provider.
  • the data provider is a bank or a financial institution, and they have a large amount of user information as the source data.
  • user information includes, for example, user private information such as user age, income, and address.
  • Banks or financial institutions hope to evaluate users' credit risk based on this user information, but to protect user privacy they cannot provide the data directly to other institutions; they may therefore choose to cooperate with a data demander.
  • the data demander (that is, the model provider) is, for example, an electronic financial platform, such as Alipay or Ant Wealth Platform. Out of business needs, these platforms hope to obtain users' credit risk data. Therefore, the electronic financial platform as a data demander can train some credit evaluation models, deploy them to banks or financial institutions, process and analyze user information, and obtain user credit risk evaluation results.
  • In one or more embodiments provided in this specification, besides conventional model training and deployment, the security of the data model itself is also detected and evaluated, and certain measures are taken to reduce the security risks of the data model.
  • the security of the data model is first checked before the data model is deployed.
  • the detection of safety can be performed based on the statistics of differences in model parameters in the data model.
  • the data provider can decide whether to accept the deployment of the data model or require the model provider to modify the model parameters based on the results of the security test.
  • the output results of the data model can also be adjusted to reduce the amount of information in the output results and further reduce the risk of stealing the source data of the data provider based on the output results.
  • FIG. 2 illustrates a method of detecting the security of a data model according to one embodiment.
  • the data model is provided by the data demander to be deployed to the data provider, and is used to perform model operations on the source data of the data provider.
  • The detection method includes: step 21, obtaining a plurality of model parameters included in the data model; step 23, determining difference statistical information of the plurality of model parameters, where the difference statistical information includes a first statistic related to differences in parameter value size and/or a second statistic related to differences in the number of parameter digits; and step 25, determining security evaluation information of the data model according to the difference statistical information.
  • the method shown in FIG. 2 can be executed by a data provider, or by a data demander, or by a data provider and a data demander.
  • FIG. 3A to FIG. 3C respectively show the execution manners of the above detection methods in different embodiments.
  • the detection of the security of the data model is mainly performed by the data provider.
  • the model provider / data demander sends the model parameters in the trained model to the data provider.
  • When the data provider performs step 21, it receives the model parameters provided by the model provider.
  • the difference statistical information of the model parameters is determined; in step 25, the difference statistical information is analyzed to obtain model safety evaluation information.
  • the data provider can decide whether to accept the deployment of the data model according to the security assessment information, and return a message to the model provider whether to accept the deployment of the data model.
  • the detection of the security of the data model is mainly performed by the data demander, that is, the model provider.
  • the model provider / data demander obtains model parameters of the trained data model in step 21.
  • the difference statistical information of the model parameters is determined; in step 25, the difference statistical information is analyzed to obtain model safety evaluation information.
  • the model provider can send model security assessment information to the data provider, so that the data provider decides whether to accept the deployment of the data model according to the security assessment information, and returns a message to the model provider whether to accept the deployment.
  • the detection of the security of the data model is performed cooperatively by the model provider and the data provider.
  • the model provider obtains model parameters of the trained data model in step 21.
  • the difference statistical information of the model parameters is determined.
  • the model provider sends the difference statistical information to the data provider, and the data provider further analyzes the difference statistical information to judge the security of the model.
  • the model provider determines the difference statistical information as preliminary security evaluation information in step 25, and then sends the preliminary security evaluation information to the data provider.
  • The data provider then further analyzes and processes the preliminary security assessment information to obtain complete security assessment information. The data provider can therefore decide, based on the complete security assessment information, whether to accept deployment of the data model and return a message to the model provider indicating whether the deployment is accepted.
  • model parameters included in the data model are obtained.
  • the data model here is a data model constructed and trained by the model provider, including a logistic regression model, a decision tree model, a score card model, a gradient boosted decision tree GBDT model, and so on.
  • Model parameters can be various parameters used in the calculation of the model, such as weight coefficients. For more complicated neural network models, the model parameters corresponding to the same hidden layer can be selected for analysis.
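To make step 21 concrete, the following is a minimal sketch (assuming scikit-learn, which the patent does not name) of collecting the weight coefficients of a trained logistic regression model as the parameters to be analysed; the training data here is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the data demander's training data.
X = np.random.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

trained_model = LogisticRegression().fit(X, y)

# Step 21: the model parameters (weight coefficients) to be analysed.
model_parameters = trained_model.coef_.ravel().tolist()
print(model_parameters)
```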
  • difference statistical information of the plurality of model parameters is determined.
  • The difference statistical information may include a first statistic related to differences in parameter value size, and/or a second statistic related to differences in the number of parameter digits.
  • The following describes the influence of the difference statistical information, i.e. the first statistic and the second statistic, on the security of the model.
  • As noted above, in a secure data model the model provider cannot infer all or part of the model's input from the model's output, so the source data is not leaked. However, by specially setting the model parameters, it may become possible to infer part of the input data from the output, as the following simple example shows.
  • In one example, the data model is a logistic regression model; more simply, the following linear regression function is used: Y = f(A, X) = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*x5 (Formula 1),
  • where a1 to a5 are the input data and x1 to x5 are the model parameters.
  • When processing source data with continuous variables, many logistic regression models first bin the variables and then perform one-hot encoding, in order to improve subsequent computation efficiency.
  • As a result, the variables that directly enter the model operation take the value 0 or 1. That is, a1 to a5 above are processed input data corresponding to the source data and take the value 0 or 1.
  • The processing of the source data is also performed by the data model, so the model provider knows the meaning of these input variables.
  • For example, the input variable a1 is the result of binning and encoding the continuous variable "user age" and indicates whether the age is greater than 30: a value of 0 means younger than 30 and a value of 1 means older than 30.
  • Similarly, the input variables a2 and a3 can be the result of binning and encoding the continuous variable "user income", where a2 indicates whether the income exceeds 10,000 yuan, a3 indicates whether it exceeds 30,000 yuan, and so on. Thus, when a2 and a3 are both 0, the user's income is below 10,000 yuan; when a2 is 1 and a3 is 0, the income is between 10,000 and 30,000 yuan; and when a2 and a3 are both 1, the income is above 30,000 yuan.
  • For Formula 1 above, by abnormally differentiating the model parameters x1 to x5, it may be possible to infer part of the input data from the output result. On one hand, the differentiated setting can take the form of differences in parameter value size.
  • In one example, if the value of one parameter is set far larger than the others, such a differentiated setting may provide clues for inferring the source data.
  • For example, in a specific example, the values of x1, x2, ..., x5 are 0.9, 0.12, 0.153, 0.03 and 0.09 respectively, where x1 is set far larger than the other parameters.
  • Then x1 is a very sensitive field.
  • Since the input data a1 to a5 in Formula 1 each take the value 0 or 1, at least the value of the input variable a1 corresponding to x1 can be judged from the size of the final result: if the result Y is greater than 0.9, then a1 is 1, otherwise a1 is 0 (because even if a2 to a5 all take the value 1, the sum of x2 to x5 is still far less than 0.9). Thus, from the output result, the value of the input variable a1 is deduced and the original user information is obtained, for example whether the user represented by a1 is over 30 years old.
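The following minimal sketch (illustrative, not from the patent) reproduces this attack: because x1 exceeds the sum of all other parameters, the data demander can recover a1 from the returned score alone.

```python
x = [0.9, 0.12, 0.153, 0.03, 0.09]          # model parameters from the example

def formula_1(a):
    """Y = a1*x1 + ... + a5*x5 with binary inputs a1..a5."""
    return sum(ai * xi for ai, xi in zip(a, x))

private_input = [1, 0, 1, 1, 0]              # held by the data provider
y = formula_1(private_input)                 # only this value is returned

# x2 + x3 + x4 + x5 = 0.393 < 0.9, so Y >= 0.9 exactly when a1 = 1.
inferred_a1 = 1 if y >= 0.9 else 0
print(y, inferred_a1 == private_input[0])    # True: a1 has been recovered
```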
  • For such cases, one or more of the following statistics can be obtained as the first statistic: among the multiple model parameters, the ratio of the largest parameter to the smallest parameter; the ratio of the difference between the largest and smallest parameters to the largest parameter; the ratio of the difference between the largest and smallest parameters to the smallest parameter; the ratio of the largest parameter to the mean of the parameters; and so on.
  • These statistics can reflect whether there are parameters with abnormal values, especially abnormal parameters with values much larger than other parameters, so as to provide a reference for the model's security evaluation.
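A minimal sketch of computing these value-size statistics (function and key names are illustrative):

```python
def value_size_statistics(params):
    mx, mn, mean = max(params), min(params), sum(params) / len(params)
    return {
        "max_over_min": mx / mn,
        "range_over_max": (mx - mn) / mx,
        "range_over_min": (mx - mn) / mn,
        "max_over_mean": mx / mean,
    }

# For the example parameters, max_over_min = 30.0 flags the dominant x1.
print(value_size_statistics([0.9, 0.12, 0.153, 0.03, 0.09]))
```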
  • Further, if the values of several parameters are set with excessively large gaps between them, such a differentiated setting can also be used to infer the source data.
  • For example, in a specific example, the values of x1, x2, ..., x5 are 0.9, 0.12, 0.303, 0.03 and 0.034 respectively. In this example, x1 is close to 1, x2 and x3 are of the same order but differ by a factor of 3, and x4 and x5 are an order of magnitude smaller than x1 to x3. With such widely differing parameter settings, it may be possible to infer the values of the input variables from the result; for example, the following conclusions can be derived: if 0.4 < Y < 0.9, then a1 = 0, a2 = 1, a3 = 1; if 0.9 < Y < 1.0, then a1 = 1, a2 = 0, a3 = 0; if 1.0 < Y < 1.3, then a1 = 1, a2 = 1, a3 = 0; and if Y > 1.3, then a1 = 1, a2 = 1, a3 = 1.
  • Thus, the values of the input variables a1, a2 and a3 can be deduced from the range of the output result Y, and original user information can be obtained, for example inferring from a2 and a3 the range of the user's income.
  • For such cases, one or more of the following statistics can be obtained as the first statistic: the variance of the parameters; among the pairwise combinations of the multiple model parameters, the number of combinations whose parameter value ratio is higher than a preset ratio threshold; the number of combinations whose parameter value difference is higher than a preset difference threshold; and so on.
  • For example, for x1 to x5 above, 10 pairwise combinations can be formed; if the preset ratio threshold is 10,
  • then the number of combinations whose parameter value ratio is higher than the preset ratio threshold (10) is 3, namely the combinations (x1, x4), (x1, x5) and (x3, x4). These statistics reflect whether the parameter values differ excessively from one another, providing a reference for the security evaluation of the model.
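A minimal sketch of these pairwise statistics; the difference threshold of 0.5 is an illustrative assumption, while the ratio threshold of 10 follows the example above.

```python
from itertools import combinations
from statistics import variance

def pairwise_statistics(params, ratio_threshold=10, diff_threshold=0.5):
    pairs = list(combinations(params, 2))
    return {
        "variance": variance(params),   # sample variance of the parameters
        "ratio_exceeds": sum(1 for a, b in pairs
                             if max(a, b) / min(a, b) > ratio_threshold),
        "diff_exceeds": sum(1 for a, b in pairs
                            if abs(a - b) > diff_threshold),
    }

stats = pairwise_statistics([0.9, 0.12, 0.303, 0.03, 0.034])
print(stats["ratio_exceeds"])   # 3: the pairs (x1, x4), (x1, x5), (x3, x4)
```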
  • On the other hand, the differentiated setting of the parameters can also take the form of differences in the number of parameter digits.
  • If the number of significant decimal places of some parameters is specially set, for example far exceeding that of other parameters, the setting of the decimal places can act as a special marker and may provide clues for inferring the source data.
  • For example, in a specific example, the values of x1, x2, ..., x5 are 0.310000, 0.101000, 0.800100, 0.300010 and 0.500001 respectively. The numbers of significant decimal places of these 5 parameters (that is, excluding trailing 0s) are 2, 3, 4, 5 and 6 respectively. In this way, the value of at least part of the input variables can be deduced from the number of significant decimal places of the result; for example, if the result Y has 5 significant decimal places, it can at least be inferred that x4 participated in the operation and a4 takes the value 1.
  • Furthermore, in this example each parameter is in fact specially marked by its interior zeros and trailing 1.
  • The first digit after the decimal point of each parameter is the value digit, while the digits from the second decimal place onward actually act as marker digits.
  • The second, third, fourth, fifth and sixth digits after the decimal point are respectively marked as 1,
  • and the other positions are padded with 0. In this way, the values of the input variables can be inferred from the part of the output Y starting at the second decimal place: whichever of those positions is 1, the corresponding input variable takes the value 1. For example, if the decimal part of the output Y is .801001, it can be inferred that x2 and x5 participated in the operation; accordingly, a2 and a5 take the value 1 and the other variables take the value 0.
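The following sketch (illustrative) shows how such marker digits leak the whole binary input: reading the output's decimal digits from the second place onward recovers a1 to a5.

```python
x = [0.310000, 0.101000, 0.800100, 0.300010, 0.500001]

def formula_1(a):
    return sum(ai * xi for ai, xi in zip(a, x))

private_input = [0, 1, 0, 0, 1]
y = formula_1(private_input)                  # only Y is returned, e.g. 0.601001

decimals = f"{y:.6f}".split(".")[1]           # "601001"
# Decimal places 2..6 are the marker positions for x1..x5 respectively.
inferred = [int(d) for d in decimals[1:6]]
print(inferred == private_input)              # True: the full input is recovered
```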
  • For such cases, one or more of the following statistics can be obtained as the second statistic: the difference between the maximum and minimum number of decimal places among the parameters; the number of consecutive significant zeros in the decimal part of each parameter; the maximum number of consecutive significant zeros in the decimal parts of the parameters; and so on.
  • These statistics can reflect whether there are parameters with an abnormal number of decimal places.
  • For example, the difference between the maximum and minimum number of decimal places can reveal an abnormal spread in decimal length,
  • and the number of consecutive significant zeros in the decimal part (that is, the number of consecutive interior 0s) can indicate whether a decimal may be used as a marker.
  • Therefore, the second statistic, which relates to differences in the number of parameter digits, can also serve as a basis for the security assessment of the model.
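A minimal sketch of these digit-count statistics; formatting each parameter to 6 decimal places before stripping trailing zeros is an assumption made for illustration.

```python
import re

def digit_statistics(params, width=6):
    def frac(p):
        # significant decimal digits, trailing zeros removed
        return f"{p:.{width}f}".split(".")[1].rstrip("0")
    places = [len(frac(p)) for p in params]
    zero_runs = [max((len(run) for run in re.findall("0+", frac(p))), default=0)
                 for p in params]
    return {
        "places_spread": max(places) - min(places),   # max minus min decimal places
        "max_zero_run": max(zero_runs),               # longest run of interior zeros
    }

print(digit_statistics([0.310000, 0.101000, 0.800100, 0.300010, 0.500001]))
# places_spread = 4 and max_zero_run = 4 both flag the marker-style parameters
```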
  • Although the examples above use input variables that, after binning and encoding, take the discrete values 0 or 1, the same idea also applies when the input variables are continuous.
  • Formula 1 above is still taken as an example.
  • Suppose the input variable a1 represents user income and is a continuous variable ranging from 0 to 100,000.
  • In general, the value of a1 is between 2,000 and 50,000. Suppose this is the variable the model provider cares about most; the corresponding model parameter x1 can then be set far larger than the others, for example x1 = 0.99 with x2 to x5 around 0.01.
  • The final result Y is then approximately equal to a1 and can at least reflect its approximate range. In this way, the value or range of part of the source data can still be learned through differences in the model parameter values. Therefore, in such cases, the above statistical information can likewise be used to measure the model security risk.
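A short illustrative sketch of the continuous-variable case: with x1 = 0.99 and the other weights near 0.01 (the small values here are chosen for illustration), the returned score essentially reveals the private income a1.

```python
x = [0.99, 0.010, 0.012, 0.009, 0.011]   # x1 dominates, per the description
a = [23500, 1, 0, 1, 1]                   # a1: private income between 2,000 and 50,000

y = sum(ai * xi for ai, xi in zip(a, x))  # the only value returned to the demander
print(y, round(y / 0.99))                 # 23265.03 -> about 23500: a1 is exposed
```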
  • the safety evaluation information of the data model is determined according to the difference statistical information.
  • the difference statistics can be directly used as simple security assessment information.
  • the difference statistical information includes a ratio of a maximum parameter to a minimum parameter in the first statistic, and the ratio can be used as security evaluation information. The larger the ratio, the lower the security, and the smaller the ratio, the higher the security.
  • In another embodiment, the security evaluation information is determined according to the difference statistical information and predetermined difference thresholds.
  • In one example, different difference thresholds can be set for different difference statistics; for example, a ratio threshold is set for statistics on the ratio of value sizes, and a difference threshold is set for statistics on digit-count differences.
  • For the same difference statistic, multiple difference thresholds can be set so as to divide the statistic into different ranges, and these different ranges correspond to different security levels.
  • For example, for the statistic S1, the ratio of the maximum parameter value to the minimum parameter value, a first threshold of 10 and a second threshold of 100 can be set: when S1 is lower than the first threshold 10, the security level is high security; when S1 is greater than the first threshold 10 and less than the second threshold 100, the security level is medium security; and when S1 is greater than the second threshold 100, the security level is low security.
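A minimal sketch of this threshold-to-level mapping for S1 (the max/min ratio), using the thresholds 10 and 100 from the example:

```python
def security_level(s1, first_threshold=10, second_threshold=100):
    if s1 < first_threshold:
        return "high security"
    if s1 < second_threshold:
        return "medium security"
    return "low security"

print(security_level(30.0))   # the earlier example gives S1 = 30 -> "medium security"
```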
  • When the difference statistical information includes multiple statistics, a weight can also be assigned to each statistic; when determining the security evaluation information, a safety score related to each statistic is first determined from the comparison of that statistic with its corresponding difference threshold, and the total safety score is then determined, based on the weights of the statistics, as the security evaluation information.
  • For example, in a specific example, the difference statistical information includes at least S1, S2 and S3, where the statistic S1 is the ratio of the maximum parameter value to the minimum parameter value, and the safety score Q1 related to S1 is calculated, for example, as follows: if the ratio is lower than the first threshold 10, the safety score is 10; if it is greater than the first threshold 10 and less than the second threshold 100, the safety score is 5; and if it is greater than the second threshold 100, the safety score is 1.
  • The statistic S2 is the number of combinations whose parameter value ratio is higher than the preset ratio threshold, and a corresponding safety score Q2 can be determined based on S2 (the specific rule can be set as needed and is not detailed here). Likewise, a safety score Q3 can be determined for a statistic S3, such as the difference between the maximum and minimum number of decimal places; assuming the weights assigned to the three statistics are 0.5, 0.3 and 0.2, the total safety score for the model parameters is Q = 0.5*Q1 + 0.3*Q2 + 0.2*Q3, which can be determined as the security evaluation information.
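A minimal sketch of the weighted total score. The scoring rule for Q1 and the weights 0.5 / 0.3 / 0.2 follow the example in the description; the rules used for Q2 and Q3 are illustrative assumptions.

```python
def score_s1(s1):                       # per the example: 10 / 5 / 1
    return 10 if s1 < 10 else (5 if s1 < 100 else 1)

def score_count(value, limit=2):        # illustrative rule for S2 and S3
    return 10 if value == 0 else (5 if value <= limit else 1)

def total_safety_score(s1, s2, s3, weights=(0.5, 0.3, 0.2)):
    q1, q2, q3 = score_s1(s1), score_count(s2), score_count(s3)
    w1, w2, w3 = weights
    return w1 * q1 + w2 * q2 + w3 * q3  # Q = 0.5*Q1 + 0.3*Q2 + 0.2*Q3

print(total_safety_score(s1=30.0, s2=3, s3=4))   # 0.5*5 + 0.3*1 + 0.2*1 = 3.0
```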
  • the safety evaluation information is determined based on the statistical information of the difference through various methods.
  • Such security assessment information can be used by the data provider to evaluate the security of the data model, and then decide whether to accept the deployment of the data model, or whether to require the model provider to modify the model. In this way, before the model is deployed, the security of the data model is evaluated by checking the security of the model to improve the security of model calculation in data cooperation.
  • FIG. 4 shows a flowchart of a method for reducing the security risk of a data model according to an embodiment, wherein the data model is provided by a data demander to be deployed to the data provider for performing model operations on the source data of the data provider.
  • The method includes: step 41, determining the result type of the output result of the data model, where the result type includes at least continuous values and discrete classification probabilities; step 43, when the result type is a continuous value, using a predetermined number of bits to represent the continuous value; and step 45, when the result type is a discrete classification probability, converting the discrete classification probability into a classification decision result.
  • the method of FIG. 4 may be performed by a data provider. That is, after the data provider accepts the deployment of the data model, in order to further reduce security risks, the data provider can add a computing component to execute the method of FIG. 4. Through this method, the output result of the data model is intercepted, the output result is limited and adjusted, and then the output result after the limitation and adjustment is returned to the model provider.
  • the method of FIG. 4 may be performed by a model provider. That is, the model provider can respond to the requirements of the data provider, in order to further reduce security risks, add a computing component on the basis of the original data model to perform the method of Figure 4.
  • the computing component can be attached to the original data model as part of the optimized data model and deployed to the data provider along with the original data model. With this method, the model provider only obtains the restricted and adjusted output results, thereby reducing the security risk of the data provider.
  • First, in step 41, the result type of the output result of the data model is determined. Generally, for most data models, the result types can include continuous numerical results and discrete results.
  • A continuous numerical result is, for example, a score of a user's credit value computed from user behavior data with a logistic regression model or a score card model.
  • For example, the output result Y in Formula 1 may be a continuous numerical result.
  • Discrete results include, for example, classification decision results.
  • For example, for an input picture, a decision tree model is used to classify it as a class-one picture, i.e. a picture that contains the target object, or a class-two picture, i.e. a picture that does not contain the target object.
  • Discrete results can also include discrete classification probabilities, such as the probability of classifying a picture as a class-one picture and the probability of classifying it as a class-two picture.
  • The different result types are handled differently, as described below.
  • In one embodiment, in step 43, when the result type is a continuous value, the continuous value is represented with a predetermined number of bits. The purpose is to use as few bits as possible to represent the value of the output result, thereby preventing source data information from being stolen through extra markers carried in redundant bits.
  • In one example, the predetermined number of bits can be set in advance based on an agreed range of the output result.
  • For example, the model provider can agree with the data provider that the output of the model is a score between 0 and 100.
  • Then 7 bits can be used to represent the output result, since 7 bits are sufficient to represent values up to 127 and thus cover this range.
  • If instead the conventional 64-bit floating-point representation were used, redundant bits would remain that could be exploited to carry special markers, creating a security risk.
  • In one example, the output result is a decimal.
  • In that case, using a predetermined number of bits to represent the output result includes retaining only a predetermined number of decimal places.
  • That predetermined number of places can be set in advance based on the digit settings of the model parameters. For example, in the earlier example where x1, x2, ..., x5 take the values 0.310000, 0.101000, 0.800100, 0.300010 and 0.500001, the digits from the second decimal place onward actually act as marker digits.
  • The output result can then be set to retain only 2 decimal places, preserving the true value digit while removing the marking effect on the source data.
  • In one embodiment, after the decimals are truncated, the result is converted to an integer so that the output result is still represented with a predetermined number of bits.
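A minimal sketch of limiting a continuous output, assuming the 2-decimal-place setting discussed above: the score is rounded and returned as a small integer so that no redundant bits remain to carry markers.

```python
def limit_continuous_output(y, decimal_places=2):
    # Keep only the agreed number of decimal places, then integerize.
    return round(y * 10 ** decimal_places)

print(limit_continuous_output(0.601001))   # 60: the marker digits are discarded
```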
  • On the other hand, in step 45, when the result type is a discrete classification probability, the discrete classification probability is converted into a classification decision result.
  • To this end, in one embodiment, the classification boundary of the classification decision is obtained.
  • The classification boundary can be set in advance by the model or specified at this step. By comparing the classification probability with the classification boundary, the discrete classification probability can be converted into a classification decision result.
  • For example, in one case the discrete classification probability is 65% for a class-one picture and 35% for a class-two picture, and the classification boundary is 50%; the discrete classification probability can then be directly converted into the classification decision result: a class-one picture.
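A minimal sketch of step 45, converting a classification probability into a bare decision with a 50% boundary as in the example (label names are illustrative):

```python
def to_decision(prob_class_one, boundary=0.5,
                labels=("class-one picture", "class-two picture")):
    return labels[0] if prob_class_one >= boundary else labels[1]

print(to_decision(0.65))   # "class-one picture"; the 65%/35% split is not revealed
```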
  • FIG. 5 shows a schematic block diagram of a model security detection device according to one embodiment. The device is used to detect the security of a data model that is provided by a data demander for deployment to a data provider and is used to perform model operations on the data provider's source data.
  • As shown in FIG. 5, the detection device 500 includes: an obtaining unit 51, configured to obtain a plurality of model parameters included in the data model; a statistics determining unit 53, configured to determine difference statistical information of the plurality of model parameters, where the difference statistical information includes a first statistic related to differences in parameter value size and/or a second statistic related to differences in the number of parameter digits; and a security determining unit 55, configured to determine security evaluation information of the data model according to the difference statistical information.
  • the device 500 is provided on the data demand side.
  • the security determination unit 55 may be configured to determine the difference statistical information as the security evaluation information.
  • the apparatus 500 may further include a providing unit (not shown) configured to provide the security evaluation information to the data provider.
  • the security determining unit 55 may be further configured to determine security evaluation information according to the difference statistical information and a predetermined difference threshold.
  • the providing unit is configured to provide such security evaluation information to the data provider.
  • the apparatus 500 is provided on the data provider side.
  • the obtaining unit 51 is configured to receive the plurality of model parameters from a data demander.
  • the security determination unit 55 is configured to determine security evaluation information according to the difference statistical information and a predetermined difference threshold.
  • the apparatus 500 further includes a deployment determination unit (not shown) configured to determine whether to accept deployment of the data model according to the security evaluation information.
  • According to one implementation, the security determination unit 55 may be configured to divide a certain difference statistic into different ranges according to a plurality of difference thresholds preset for that statistic,
  • with the different ranges corresponding to different security levels as the security evaluation information.
  • According to one implementation, the difference statistical information includes a plurality of statistics,
  • and the security determining unit 55 may be configured to: for each of the plurality of statistics, determine a safety score related to that statistic by comparing it with the corresponding difference threshold; and, based on the safety scores related to the statistics and the weights preset for each statistic, determine a total safety score as the security evaluation information.
  • In one embodiment, the first statistic includes at least one of the following: the ratio of the maximum parameter to the minimum parameter; the ratio of the difference between the maximum and minimum parameters to the maximum parameter; the ratio of the difference between the maximum and minimum parameters to the minimum parameter; and the ratio of the maximum parameter to the mean of the parameters.
  • In another embodiment, the first statistic includes at least one of the following: the variance of the parameters; among the pairwise combinations of the plurality of model parameters, the number of combinations whose parameter value ratio is higher than a preset ratio threshold; and the number of combinations whose parameter value difference is higher than a preset difference threshold.
  • In one embodiment, the second statistic includes at least one of the following: the difference between the maximum and minimum number of decimal places among the parameters; the number of consecutive significant zeros in the decimal part of each parameter; and the maximum number of consecutive significant zeros in the decimal parts of the parameters.
  • the data model includes a logistic regression model, a decision tree model, a gradient boosted decision tree GBDT model, and a score card model.
  • FIG. 6 illustrates a device for reducing security risks according to one embodiment, used to reduce the security risk of a data model that is provided by a data demander for deployment to a data provider and used to perform model operations on the data provider's source data. As shown in FIG. 6,
  • the apparatus 600 for reducing security risks includes: a type determining unit 61, configured to determine the result type of the output result of the data model, where the result type includes at least continuous values and discrete classification probabilities;
  • a continuous value processing unit 63, configured to use a predetermined number of bits to represent the continuous value when the result type is a continuous value;
  • and a discrete result processing unit 65, configured to convert the discrete classification probability into a classification decision result when the result type is a discrete classification probability.
  • the predetermined number of bits is set in advance based on a range of a predetermined output result.
  • According to one embodiment, when the output continuous value is a decimal, the continuous value processing unit 63 is configured to retain a predetermined number of decimal places for the continuous value, the predetermined number of places being set in advance based on the digit settings of the model parameters of the data model.
  • the discrete result processing unit 65 is configured to obtain a classification boundary of a classification decision, and convert the discrete classification probability into a classification decision result by comparing the discrete classification probability with the classification boundary.
  • the security evaluation information is determined based on the difference statistical information.
  • Such security assessment information can be used by the data provider to evaluate the security of the data model, and then decide whether to accept the deployment of the data model, or whether to require the model provider to modify the model.
  • the security of the data model is evaluated by checking the security of the model to improve the security of model calculation in data cooperation.
  • Further, when the model runs predictions, by limiting and adjusting the output results, the amount of information in the output results returned to the model provider is minimized, the difficulty of back-deriving the source data is increased, and the security risk of the data model is reduced.
  • a computer-readable storage medium having stored thereon a computer program, and when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2 and FIG. 4.
  • According to an embodiment of a further aspect, a computing device is also provided, which includes a memory and a processor.
  • The memory stores executable code,
  • and when the processor executes the executable code, the methods described in conjunction with FIG. 2 and FIG. 4 are implemented.
  • the functions described in the present invention may be implemented by hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

Embodiments of this specification provide a method and an apparatus for detecting the security of a data model and for reducing its security risk, where the data model is provided by a data demander for deployment to a data provider and is used to perform model operations on the data provider's source data. In the above, a plurality of model parameters included in the data model are first obtained, and then difference statistical information of the plurality of model parameters is determined, including a statistic related to differences in parameter value size and/or a statistic related to differences in the number of parameter digits. Next, security evaluation information of the data model is determined according to the difference statistical information. Further, the output results of the data model can also be limited to reduce the amount of information they carry, further lowering the security risk.

Description

Method and Apparatus for Detecting the Security of a Data Model
Technical Field
One or more embodiments of this specification relate to the field of data security, and in particular to a method and an apparatus for detecting the security of a data model.
Background
In the era of big data, a great many data silos exist. Each natural person's data is scattered across different enterprises, and because of competition and user privacy protection concerns, enterprises do not fully trust one another. An important principle of data cooperation between enterprises is that the original data does not leave its own boundary; the computation is moved to the data side instead. Secure multi-party computation platforms are developed and designed precisely to solve the problem of data privacy protection in data cooperation between different enterprises.
A common scheme for data cooperation is as follows: during data cooperation, the data demander deploys its trained machine learning data model to the data provider. During model prediction, the system obtains the data provider's original data in real time, performs the model computation to obtain the model result, and returns it to the data demander. If the deployed model is secure, the data demander cannot infer all or part of the model's input from the model's output, and the data provider does not leak its original data. However, if the data demander specially constructs the model, it may be able to obtain part of the original data from the model results; for the data provider, such a model is then an insecure model. Secure deployment of the model is an important step in improving platform security and strengthening the mutual trust between the two parties in data cooperation.
Therefore, a scheme is needed that can effectively detect the security of a data model and reduce the security risk as far as possible.
Summary
One or more embodiments of this specification describe a method and an apparatus that, before a data model is deployed, detect the security of the data model based on difference statistical information of the model parameters in the data model; further, the output results of the data model can be limited and adjusted, thereby reducing the security risk of the data model.
According to a first aspect, a method for detecting the security of a data model is provided, where the data model is provided by a data demander for deployment to a data provider and is used to perform model operations on the data provider's source data. The method includes:
obtaining a plurality of model parameters included in the data model;
determining difference statistical information of the plurality of model parameters, where the difference statistical information includes a first statistic related to differences in parameter value size and/or a second statistic related to differences in the number of parameter digits;
determining security evaluation information of the data model according to the difference statistical information.
In one implementation, the above method is performed by the data demander.
In that case, according to one embodiment, the data demander determines the difference statistical information as the security evaluation information and provides the security evaluation information to the data provider.
According to another embodiment, the data demander determines the security evaluation information according to the difference statistical information and a predetermined difference threshold, and provides the security evaluation information to the data provider.
In another implementation, the above method is performed by the data provider.
In that case, the data provider receives the plurality of model parameters from the data demander.
According to one embodiment, the data provider determines the security evaluation information according to the difference statistical information and a predetermined difference threshold. Further, whether to accept the deployment of the data model may also be determined according to the security evaluation information.
According to one possible implementation, the security evaluation information is determined as follows: according to multiple difference thresholds preset for a certain difference statistic, that difference statistic is divided into different ranges, and the different ranges corresponding to different security levels serve as the security evaluation information.
According to another possible implementation, the difference statistical information includes multiple statistics; in that case the security evaluation information is determined as follows:
for each of the multiple statistics, determining a safety score related to that statistic based on its comparison with the corresponding difference threshold;
based on the safety scores related to the statistics and the weights preset for the statistics, determining a total safety score as the security evaluation information.
In one embodiment, the first statistic includes at least one of the following: the ratio of the maximum parameter to the minimum parameter, the ratio of the difference between the maximum and minimum parameters to the maximum parameter, the ratio of the difference between the maximum and minimum parameters to the minimum parameter, and the ratio of the maximum parameter to the mean of the parameters.
In another embodiment, the first statistic includes at least one of the following: the variance of the parameters; among the pairwise combinations of the plurality of model parameters, the number of combinations whose parameter value ratio is higher than a preset ratio threshold, and the number of combinations whose parameter value difference is higher than a preset difference threshold.
In one embodiment, the second statistic includes at least one of the following: the difference between the maximum and minimum number of decimal places among the parameters, the number of consecutive significant zeros in the decimal part of each parameter, and the maximum number of consecutive significant zeros in the decimal parts of the parameters.
In one embodiment, the data model includes a logistic regression model, a decision tree model, a gradient boosted decision tree (GBDT) model, or a score card model.
According to a second aspect, a method for reducing the security risk of a data model is provided, where the data model is provided by a data demander for deployment to a data provider and is used to perform model operations on the data provider's source data. The method includes:
determining the result type of the output result of the data model, where the result type includes at least continuous values and discrete classification probabilities;
when the result type is a continuous value, using a predetermined number of bits to represent the continuous value;
when the result type is a discrete classification probability, converting the discrete classification probability into a classification decision result.
In one embodiment, the predetermined number of bits is set in advance based on an agreed range of the output result.
According to one embodiment, the continuous value is a decimal, and using a predetermined number of bits to represent the continuous value includes retaining a predetermined number of decimal places for the continuous value, where the predetermined number of places is set in advance based on the digit settings of the model parameters of the data model.
In one embodiment, the discrete classification probability is converted into a classification decision result as follows: obtaining the classification boundary of the classification decision, and converting the discrete classification probability into a classification decision result by comparing the discrete classification probability with the classification boundary.
According to a third aspect, an apparatus for detecting the security of a data model is provided, where the data model is provided by a data demander for deployment to a data provider and is used to perform model operations on the data provider's source data. The apparatus includes:
an obtaining unit, configured to obtain a plurality of model parameters included in the data model;
a statistics determining unit, configured to determine difference statistical information of the plurality of model parameters, where the difference statistical information includes a first statistic related to differences in parameter value size and/or a second statistic related to differences in the number of parameter digits;
a security determining unit, configured to determine security evaluation information of the data model according to the difference statistical information.
According to a fourth aspect, an apparatus for reducing the security risk of a data model is provided, where the data model is provided by a data demander for deployment to a data provider and is used to perform model operations on the data provider's source data. The apparatus includes:
a type determining unit, configured to determine the result type of the output result of the data model, where the result type includes at least continuous values and discrete classification probabilities;
a continuous value processing unit, configured to use a predetermined number of bits to represent the continuous value when the result type is a continuous value;
a discrete result processing unit, configured to convert the discrete classification probability into a classification decision result when the result type is a discrete classification probability.
According to a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the methods of the first aspect and the second aspect.
According to a sixth aspect, a computing device is provided, including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the methods of the first aspect and the second aspect are implemented.
Through the method and apparatus provided by the embodiments of this specification, before the data model is deployed, the security evaluation information of the data model is determined based on the difference statistical information of the model parameters in the data model, thereby detecting the security of the data model. Further, in the model prediction stage, the output results of the data model can also be limited and adjusted to reduce the amount of information they carry, thereby reducing the security risk of the data model.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification;
FIG. 2 shows a method for detecting the security of a data model according to an embodiment;
FIG. 3A shows how the detection method is performed in one embodiment;
FIG. 3B shows how the detection method is performed in another embodiment;
FIG. 3C shows how the detection method is performed in yet another embodiment;
FIG. 4 shows a flowchart of a method for reducing the security risk of a data model according to an embodiment;
FIG. 5 shows a schematic block diagram of a model security detection apparatus according to an embodiment;
FIG. 6 shows a schematic block diagram of an apparatus for reducing security risks according to an embodiment.
Detailed Description
The solutions provided in this specification are described below with reference to the drawings.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. In this scenario, a data demander cooperates with a data provider to complete data processing and analysis. Specifically, the data provider holds the source data to be analyzed but may lack suitable data analysis tools. The data demander constructs and trains a data model according to its data analysis needs and then deploys the trained data model to the data provider; in this sense, the data demander can also be called the model provider. After the model provider deploys the data model to the data provider, the data model can run on the data provider's platform, obtain the data provider's source data, analyze, process and compute on that source data, and return the operation result to the data demander, i.e. the model provider.
For example, in one case the data provider is a bank or financial institution holding a large amount of user information as source data, including private user information such as user age, income and address. The bank or financial institution wishes to evaluate users' credit risk based on this user information but, to protect user privacy, cannot provide the data directly to other institutions; it may therefore choose to cooperate with a data demander. The data demander (i.e. the model provider) is, for example, an electronic finance platform such as Alipay or the Ant Fortune platform. For business reasons, such platforms wish to obtain users' credit risk data. The electronic finance platform acting as the data demander can therefore train credit evaluation models and deploy them to the bank or financial institution to process and analyze the user information and obtain user credit risk evaluation results.
To further guarantee data security, in one or more embodiments provided in this specification, in addition to conventional model training and deployment, the security of the data model itself is detected and evaluated, and certain measures are taken to reduce the security risk of the data model.
In one embodiment, before the data model is deployed, the security of the data model is first checked. The security check can be performed based on difference statistics of the model parameters in the data model. The data provider can decide, based on the result of the security check, whether to accept deployment of the data model or to require the model provider to modify the model parameters. On the other hand, the output results of the data model can also be limited and adjusted to reduce the amount of information in the output results and further reduce the risk that the data provider's source data is stolen via the output results. Specific implementations of these ideas are described below.
FIG. 2 shows a method for detecting the security of a data model according to one embodiment. As described above, the data model is provided by the data demander for deployment to the data provider and is used to perform model operations on the data provider's source data. As shown in FIG. 2, the detection method includes: step 21, obtaining a plurality of model parameters included in the data model; step 23, determining difference statistical information of the plurality of model parameters, where the difference statistical information includes a first statistic related to differences in parameter value size and/or a second statistic related to differences in the number of parameter digits; and step 25, determining security evaluation information of the data model according to the difference statistical information. The method shown in FIG. 2 can be performed by the data provider, by the data demander, or by the data provider and the data demander cooperatively.
FIG. 3A to FIG. 3C respectively show how the above detection method is performed in different embodiments. In the embodiment illustrated in FIG. 3A, the detection of the security of the data model is mainly performed by the data provider. In this embodiment, the model provider / data demander sends the model parameters of the trained model to the data provider; in other words, when the data provider performs step 21, it receives the model parameters provided by the model provider. Then, in step 23, the difference statistical information of the model parameters is determined, and in step 25 the difference statistical information is analyzed to obtain the model security evaluation information. Further, the data provider can decide, according to the security evaluation information, whether to accept deployment of the data model and return a message to the model provider indicating whether the deployment is accepted.
In the embodiment illustrated in FIG. 3B, the detection of the security of the data model is mainly performed by the data demander, i.e. the model provider. In this embodiment, the model provider / data demander obtains the model parameters of the trained data model in step 21. Then, in step 23, the difference statistical information of the model parameters is determined, and in step 25 the difference statistical information is analyzed to obtain the model security evaluation information. Further, the model provider can send the model security evaluation information to the data provider, so that the data provider decides, according to this information, whether to accept deployment of the data model and returns a message to the model provider indicating whether the deployment is accepted.
In the embodiment illustrated in FIG. 3C, the detection of the security of the data model is performed cooperatively by the model provider and the data provider. In this embodiment, the model provider obtains the model parameters of the trained data model in step 21 and determines their difference statistical information in step 23. The model provider then sends the difference statistical information to the data provider, which further analyzes it to judge the security of the model. In this embodiment it can also be considered that, in step 25, the model provider determines the difference statistical information as preliminary security evaluation information and sends it to the data provider, which further analyzes and processes it to obtain complete security evaluation information. The data provider can then decide, based on the complete security evaluation information, whether to accept deployment of the data model and return a message to the model provider indicating whether the deployment is accepted.
The specific execution of each of the above steps is described below.
First, in step 21, the plurality of model parameters included in the data model are obtained. It can be understood that the data model here is a data model constructed and trained by the model provider, including a logistic regression model, a decision tree model, a score card model, a gradient boosted decision tree (GBDT) model, and so on. The model parameters can be the parameters used in the model computation, for example the weight coefficients. For more complex neural network models, the model parameters corresponding to the same hidden layer can be selected for analysis.
Next, in step 23, the difference statistical information of the plurality of model parameters is determined. The difference statistical information may include a first statistic related to differences in parameter value size and/or a second statistic related to differences in the number of parameter digits. The influence of the difference statistical information, i.e. the first statistic and the second statistic, on the security of the model is described below.
As mentioned above, in a secure data model the model provider cannot infer all or part of the model's input from the model's output and therefore does not leak the source data. However, by specially setting the model parameters, for example by abnormally differentiating the parameter values or the number of parameter digits, it may become possible to infer part of the input data from the output result. A simple example illustrates this below.
In one example, the data model is a logistic regression model; more simply, the following linear regression function is used:
Y = f(A, X) = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*x5  (Formula 1)
where a1 to a5 are the input data and x1 to x5 are the model parameters.
At present, when processing source data with continuous variables, many logistic regression models first bin the variables and then apply one-hot encoding in order to improve subsequent computation efficiency. As a result, the variables that directly enter the model operation all take the value 0 or 1; that is, a1 to a5 above are processed input data corresponding to the source data, taking the value 0 or 1. Moreover, the processing of the source data is also performed by the data model, so the model provider knows the meaning of these input variables. For example, in one case the input variable a1 is the result of binning and encoding the continuous variable "user age" and indicates whether the age is greater than 30: 0 means younger than 30 and 1 means older than 30. Similarly, the input variables a2 and a3 can be the result of binning and encoding the continuous variable "user income", where a2 indicates whether the income exceeds 10,000 yuan and a3 indicates whether it exceeds 30,000 yuan, and so on. Thus, when a2 and a3 are both 0, the user's income is below 10,000 yuan; when a2 is 1 and a3 is 0, the income is between 10,000 and 30,000 yuan; and when a2 and a3 are both 1, the income is above 30,000 yuan.
For Formula 1 above, by abnormally differentiating the settings of the model parameters x1 to x5, it may be possible to infer part of the input data from the output result.
On one hand, the differentiated setting can manifest as differences in parameter value size.
In one example, if the value of one parameter is set far larger than the other parameters, such a differentiated setting may provide clues for inferring the source data.
For example, in a specific example, the values of x1, x2, ..., x5 are 0.9, 0.12, 0.153, 0.03 and 0.09 respectively, where x1 is set far larger than the other parameters. Then x1 is a very sensitive field. Since the input data a1 to a5 in Formula 1 each take the value 0 or 1, at least the value of the input variable a1 corresponding to x1 can be judged from the size of the final result: if the result Y is greater than 0.9, a1 is 1, otherwise a1 is 0 (because even if a2 to a5 all take the value 1, the sum of x2 to x5 is still far less than 0.9). Thus, from the output result, the value of the input variable a1 is deduced and original user information is obtained, for example whether the user represented by a1 is over 30 years old.
For such cases, one or more of the following statistics can be obtained as the first statistic: among the multiple model parameters, the ratio of the largest parameter to the smallest parameter, the ratio of the difference between the largest and smallest parameters to the largest parameter, the ratio of the difference between the largest and smallest parameters to the smallest parameter, the ratio of the largest parameter to the mean of the parameters, and so on. These statistics all reflect whether there are parameters with abnormal values, in particular abnormal parameters far larger than the others, and thus provide a reference for the security evaluation of the model.
Further, in one example, if the values of multiple parameters are set with excessively large gaps between them, such a differentiated setting can also be used to infer the source data.
For example, in a specific example, the values of x1, x2, ..., x5 are 0.9, 0.12, 0.303, 0.03 and 0.034 respectively. In this example, among the five parameters, x1 is close to 1, x2 and x3 are of the same order but differ by a factor of 3, and x4 and x5 are an order of magnitude smaller than x1 to x3. With such widely differing parameter settings, it may be possible to infer the values of the input variables from the result. For example, the following conclusions can be derived:
If 0.4 < Y < 0.9, then a1 = 0, a2 = 1, a3 = 1;
If 0.9 < Y < 1.0, then a1 = 1, a2 = 0, a3 = 0;
If 1.0 < Y < 1.3, then a1 = 1, a2 = 1, a3 = 0;
If Y > 1.3, then a1 = 1, a2 = 1, a3 = 1.
Thus, from the range of the output result Y, the values of the input variables a1, a2 and a3 can be deduced and original user information obtained, for example inferring from a1 whether the user is over 30 years old and from a2 and a3 the range of the user's income.
For such cases, one or more of the following statistics can be obtained as the first statistic: the variance of the parameters; among the pairwise combinations of the multiple model parameters, the number of combinations whose parameter value ratio is higher than a preset ratio threshold, the number of combinations whose parameter value difference is higher than a preset difference threshold, and so on. For example, for x1 to x5 above, 10 pairwise combinations can be formed; if the preset ratio threshold is 10, the number of combinations whose parameter value ratio exceeds that threshold is 3, namely (x1, x4), (x1, x5) and (x3, x4). The number of combinations whose parameter value difference is too large, and similar statistics, can also be computed. These statistics are intended to reflect whether the parameter values differ excessively from one another and thus provide a reference for the security evaluation of the model.
On the other hand, the differentiated setting of the parameters can also manifest as differences in the number of parameter digits.
In one example, if the number of significant decimal places of some parameters is specially set, for example far exceeding that of other parameters or differing greatly among parameters, the setting of the decimal places can also act as a special marker and may provide clues for inferring the source data.
For example, in a specific example, the values of x1, x2, ..., x5 are 0.310000, 0.101000, 0.800100, 0.300010 and 0.500001 respectively. The numbers of significant decimal places of these five parameters (excluding trailing zeros) are 2, 3, 4, 5 and 6. In this way, the value of at least part of the input variables can be inferred from the number of significant decimal places of the result; for example, if the result Y has 5 significant decimal places, it can at least be inferred that x4 participated in the operation and a4 takes the value 1.
Furthermore, in the above example, each parameter is in fact specially marked by its interior zeros and trailing 1. The first digit after the decimal point of each parameter is the value digit, and the digits from the second decimal place onward actually act as marker digits: the second, third, fourth, fifth and sixth digits after the decimal point are respectively marked as 1, with the other positions padded with 0. In this way, the values of the input variables can be inferred from the part of the output Y starting at the second decimal place: whichever of these positions is 1, the corresponding input variable takes the value 1. For example, if the decimal part of the output Y is .801001, it can be inferred that x2 and x5 participated in the operation, and accordingly a2 and a5 take the value 1 while the other variables take the value 0.
For such cases, one or more of the following statistics can be obtained as the second statistic: the difference between the maximum and minimum number of decimal places among the parameters, the number of consecutive significant zeros in the decimal part of each parameter, the maximum number of consecutive significant zeros in the decimal parts of the parameters, and so on. These statistics all reflect whether there are parameters with an abnormal number of decimal places; for example, the difference between the maximum and minimum number of decimal places can reveal abnormal decimal lengths, and the number of consecutive significant zeros in the decimal part (i.e. the number of consecutive interior 0s) can reflect whether the decimal may be used as a marker. Therefore, second statistics related to differences in the number of parameter digits can also serve as a basis for the security evaluation of the model.
Although the above examples use input variables that, after binning and encoding, take the discrete values 0 or 1, the same idea also applies when the input variables are continuous.
For example, still taking Formula 1 as an example, suppose the input variable a1 represents user income, a continuous variable with a range from 0 to 100,000, and that in general a1 takes a value between 2,000 and 50,000. Suppose this is the variable the model provider cares about most; the corresponding model parameter x1 can then be set far larger than the other parameters, for example x1 = 0.99 with x2 to x5 all around 0.01. The final result Y is then approximately equal to a1 and can at least reflect its approximate range. In this way, the value or range of part of the source data can still be learned through differentiated settings of the model parameter values. Therefore, in such cases, the above difference statistical information can likewise be used to measure the model security risk.
In addition, although several specific statistics are listed above, a person skilled in the art, having read this specification, may extend them to more statistics (for example extending the variance to the root mean square, or extending the difference between the maximum and minimum number of decimal places to the ratio of that difference to the maximum, and so on); as long as these statistics relate to differences in the value sizes and/or digit counts of the model parameters, they can to some extent and from some angle reflect the model security risk.
On the basis of the difference statistical information of the model parameters obtained as above, next, in step 25, the security evaluation information of the data model is determined according to the difference statistical information.
In one embodiment, the difference statistical information can directly serve as simple security evaluation information. For example, in a specific example, the difference statistical information includes the ratio of the maximum parameter to the minimum parameter in the first statistic, and this ratio can be used as the security evaluation information: the larger the ratio, the lower the security; the smaller the ratio, the higher the security.
In another embodiment, the security evaluation information is determined according to the difference statistical information and predetermined difference thresholds.
In one example, different difference thresholds can be set for different difference statistics; for example, a ratio threshold is set for statistics on the ratio of value sizes, and a difference threshold is set for statistics on digit-count differences.
For the same difference statistic, multiple difference thresholds can be set so as to divide the statistic into different ranges corresponding to different security levels. For example, for the statistic S1, the ratio of the maximum parameter value to the minimum parameter value, a first threshold of 10 and a second threshold of 100 can be set: when S1 is below the first threshold 10, the security level is high security; when S1 is greater than the first threshold 10 and less than the second threshold 100, the security level is medium security; when S1 is greater than the second threshold 100, the security level is low security.
When the difference statistical information includes multiple statistics, a weight can also be assigned to each statistic. When determining the security evaluation information, a safety score related to each statistic can first be determined from the comparison of that statistic with its corresponding difference threshold, and then, based on the weights of the statistics, a total safety score is determined as the security evaluation information.
For example, in a specific example, the difference statistical information includes at least S1, S2 and S3, where the statistic S1 is the ratio of the maximum parameter value to the minimum parameter value and the safety score Q1 related to S1 is calculated, for example, as follows: if the ratio is below the first threshold 10, the safety score is 10; if it is greater than the first threshold 10 and less than the second threshold 100, the safety score is 5; if it is greater than the second threshold 100, the safety score is 1. The statistic S2 is the number of combinations whose parameter value ratio is higher than the preset ratio threshold, and a corresponding safety score Q2 can be determined based on S2 (the specific rule can be set as needed and is not detailed here). The statistic S3 is the difference between the maximum and minimum number of decimal places among the parameters, and a corresponding safety score Q3 can be determined based on S3. Assuming the weights assigned to these three statistics are 0.5, 0.3 and 0.2 respectively, the total safety score corresponding to the model parameters is Q = 0.5*Q1 + 0.3*Q2 + 0.2*Q3. Such a total safety score can be determined as the security evaluation information.
In this way, the security evaluation information is determined in various ways based on the difference statistical information. Such security evaluation information can be used by the data provider to evaluate the security of the data model and then decide whether to accept deployment of the data model or whether to require the model provider to modify the model. Thus, before the model is deployed, the security of the data model is evaluated through the security check of the model, improving the security of model computation in data cooperation.
In another aspect, a method for reducing the security risk of a data model is also provided. FIG. 4 shows a flowchart of a method for reducing the security risk of a data model according to an embodiment, where the data model is provided by the data demander for deployment to the data provider and is used to perform model operations on the data provider's source data. As shown in FIG. 4, the method includes: step 41, determining the result type of the output result of the data model, where the result type includes at least continuous values and discrete classification probabilities; step 43, when the result type is a continuous value, using a predetermined number of bits to represent the continuous value; and step 45, when the result type is a discrete classification probability, converting the discrete classification probability into a classification decision result.
In one embodiment, the method of FIG. 4 may be performed by the data provider. That is, after the data provider accepts deployment of the data model, in order to further reduce security risks, the data provider can add a computation component that performs the method of FIG. 4, intercepting the output result of the data model, limiting and adjusting it, and then returning the limited and adjusted output result to the model provider.
In one embodiment, the method of FIG. 4 may be performed by the model provider. That is, at the request of the data provider and in order to further reduce security risks, the model provider can add a computation component on top of the original data model to perform the method of FIG. 4. This computation component can be attached to the original data model as part of an optimized data model and deployed to the data provider together with the original data model. In this way, the model provider obtains only the limited and adjusted output result, reducing the security risk for the data provider.
The execution of each step in the flow of FIG. 4 is described below.
First, in step 41, the result type of the output result of the data model is determined. In general, for most data models, the result types can include continuous numerical results and discrete results. A continuous numerical result is, for example, a score of a user's credit value computed from user behavior data with a logistic regression model or a score card model; for example, the output result Y in Formula 1 can be a continuous numerical result. Discrete results include, for example, classification decision results, e.g. for an input picture, a decision tree model classifies it as a class-one picture, i.e. a picture containing the target object, or a class-two picture, i.e. a picture not containing the target object. Discrete results can also include discrete classification probabilities, for example the probability of classifying a picture as a class-one picture and the probability of classifying it as a class-two picture. Different result types are processed differently, as described below.
In one embodiment, in step 43, when the result type is a continuous value, the continuous value is represented with a predetermined number of bits. The purpose is to represent the value of the output result with as few bits as possible, thereby preventing source data information from being stolen through extra markers carried in redundant bits.
In one example, the predetermined number of bits can be set in advance based on an agreed range of the output result. For example, the model provider and the data provider can agree that the model's output is a score between 0 and 100. Then, in step 43, 7 bits can be used to represent the output result, since 7 bits are sufficient to represent values up to 127. If the conventional floating-point representation (64 bits) were used instead, there would be redundant bits that could be exploited for special markers, creating a security risk.
In one example, the output result is a decimal. In such a case, representing the output with a predetermined number of bits includes retaining only a predetermined number of decimal places, which can be set in advance based on the digit settings of the model parameters. For example, in the earlier example where x1, x2, ..., x5 take the values 0.310000, 0.101000, 0.800100, 0.300010 and 0.500001, the digits from the second decimal place onward actually act as marker digits; the output result can then be set to retain only 2 decimal places, preserving the true value digit while removing the marking effect on the source data. In one embodiment, after the decimals of the output result are truncated, the decimal result is converted to an integer so that the output is still represented with a predetermined number of bits.
On the other hand, in step 45, when the result type is a discrete classification probability, the discrete classification probability is converted into a classification decision result. To this end, in one embodiment, the classification boundary of the classification decision is obtained; the classification boundary can be set in advance by the model or specified at this step. By comparing the classification probability with the classification boundary, the discrete classification probability can be converted into a classification decision result.
For example, in one case the discrete classification probability is 65% for a class-one picture and 35% for a class-two picture, and the classification boundary is 50%; the discrete classification probability can then be directly converted into the classification decision result: a class-one picture.
In this way, the amount of information in the output result returned to the model provider is minimized, the difficulty of inferring the source data in reverse is increased, and the security risk of the data model is reduced.
According to an embodiment of another aspect, an apparatus for detecting the security of a data model is also provided. FIG. 5 shows a schematic block diagram of a model security detection apparatus according to one embodiment; the apparatus is used to detect the security of a data model that is provided by a data demander for deployment to a data provider and used to perform model operations on the data provider's source data. As shown in FIG. 5, the detection apparatus 500 includes: an obtaining unit 51, configured to obtain a plurality of model parameters included in the data model; a statistics determining unit 53, configured to determine difference statistical information of the plurality of model parameters, where the difference statistical information includes a first statistic related to differences in parameter value size and/or a second statistic related to differences in the number of parameter digits; and a security determining unit 55, configured to determine the security evaluation information of the data model according to the difference statistical information.
In a first embodiment, the apparatus 500 is provided on the data demander side.
In that case, in one example, the security determining unit 55 may be configured to determine the difference statistical information as the security evaluation information. Further, the apparatus 500 may also include a providing unit (not shown) configured to provide the security evaluation information to the data provider.
In another example, the security determining unit 55 may also be configured to determine the security evaluation information according to the difference statistical information and a predetermined difference threshold, and the providing unit is configured to provide such security evaluation information to the data provider.
In a second embodiment, the apparatus 500 is provided on the data provider side.
In that case, the obtaining unit 51 is configured to receive the plurality of model parameters from the data demander.
In one example, the security determining unit 55 is configured to determine the security evaluation information according to the difference statistical information and a predetermined difference threshold.
Further, the apparatus 500 also includes a deployment determining unit (not shown) configured to determine, according to the security evaluation information, whether to accept deployment of the data model.
According to one implementation, regardless of which side the apparatus 500 is provided on, the security determining unit 55 may be configured to: according to multiple difference thresholds preset for a certain difference statistic, divide that statistic into different ranges, with the different ranges corresponding to different security levels as the security evaluation information.
According to one implementation, the difference statistical information includes multiple statistics, and the security determining unit 55 may be configured to: for each of the multiple statistics, determine a safety score related to that statistic from its comparison with the corresponding difference threshold; and based on the safety scores related to the statistics and the weights preset for them, determine a total safety score as the security evaluation information.
In one embodiment, the first statistic includes at least one of the following: the ratio of the maximum parameter to the minimum parameter, the ratio of the difference between the maximum and minimum parameters to the maximum parameter, the ratio of the difference between the maximum and minimum parameters to the minimum parameter, and the ratio of the maximum parameter to the mean of the parameters.
In another embodiment, the first statistic includes at least one of the following: the variance of the parameters; among the pairwise combinations of the plurality of model parameters, the number of combinations whose parameter value ratio is higher than a preset ratio threshold, and the number of combinations whose parameter value difference is higher than a preset difference threshold.
In one embodiment, the second statistic includes at least one of the following: the difference between the maximum and minimum number of decimal places among the parameters, the number of consecutive significant zeros in the decimal part of each parameter, and the maximum number of consecutive significant zeros in the decimal parts of the parameters.
According to one implementation, the data model includes a logistic regression model, a decision tree model, a gradient boosted decision tree (GBDT) model, or a score card model.
According to an embodiment of yet another aspect, an apparatus for reducing the security risk of a data model is also provided. FIG. 6 shows an apparatus for reducing security risks according to one embodiment; the apparatus is used to reduce the security risk of a data model that is provided by a data demander for deployment to a data provider and used to perform model operations on the data provider's source data. As shown in FIG. 6, the apparatus 600 for reducing security risks includes: a type determining unit 61, configured to determine the result type of the output result of the data model, where the result type includes at least continuous values and discrete classification probabilities; a continuous value processing unit 63, configured to use a predetermined number of bits to represent the continuous value when the result type is a continuous value; and a discrete result processing unit 65, configured to convert the discrete classification probability into a classification decision result when the result type is a discrete classification probability.
In one embodiment, the above predetermined number of bits is set in advance based on an agreed range of the output result.
According to one embodiment, when the output continuous value is a decimal, the continuous value processing unit 63 is configured to retain a predetermined number of decimal places for the continuous value, the predetermined number of places being set in advance based on the digit settings of the model parameters of the data model.
According to one embodiment, the discrete result processing unit 65 is configured to obtain the classification boundary of the classification decision and convert the discrete classification probability into a classification decision result by comparing the discrete classification probability with the classification boundary.
Thus, through the above embodiments, the security evaluation information is determined based on the difference statistical information before the model is deployed. Such security evaluation information can be used by the data provider to evaluate the security of the data model and then decide whether to accept deployment of the data model or whether to require the model provider to modify the model. In this way, before the model is deployed, the security of the data model is evaluated through the security check of the model, improving the security of model computation in data cooperation.
Further, when the model runs predictions, by limiting and adjusting the output results, the amount of information in the output results returned to the model provider is minimized, the difficulty of inferring the source data in reverse is increased, and the security risk of the data model is reduced.
根据另一方面的实施例,还提供一种计算机可读存储介质,其上存储有计算机程序,当所述计算机程序在计算机中执行时,令计算机执行结合图2和图4所描述的方法。
根据再一方面的实施例,还提供一种计算设备,包括存储器和处理器,所述存储器中存储有可执行代码,所述处理器执行所述可执行代码时,实现结合图2和图4所述的方法。
本领域技术人员应该可以意识到,在上述一个或多个示例中,本发明所描述的功能可以用硬件、软件、固件或它们的任意组合来实现。当使用软件实现时,可以将这些功能存储在计算机可读介质中或者作为计算机可读介质上的一个或多个指令或代码进行传输。
以上所述的具体实施方式,对本发明的目的、技术方案和有益效果进行了进一步详细说明,所应理解的是,以上所述仅为本发明的具体实施方式而已,并不用于限定本发明的保护范围,凡在本发明的技术方案的基础之上,所做的任何修改、等同替换、改进等,均应包括在本发明的保护范围之内。

Claims (34)

  1. A method for detecting the security of a data model, the data model being provided by a data demander for deployment to a data provider and being used to perform model computation on the data provider's source data, the method comprising:
    obtaining a plurality of model parameters contained in the data model;
    determining difference statistics of the plurality of model parameters, the difference statistics comprising a first statistic related to differences in parameter magnitudes and/or a second statistic related to differences in parameter digits; and
    determining security evaluation information of the data model according to the difference statistics.
  2. The method according to claim 1, wherein the method is performed by the data demander, and determining the security evaluation information of the data model comprises:
    determining the difference statistics as the security evaluation information;
    the method further comprising providing the security evaluation information to the data provider.
  3. The method according to claim 1, wherein the method is performed by the data demander, and determining the security evaluation information of the data model comprises:
    determining the security evaluation information according to the difference statistics and a predetermined difference threshold;
    the method further comprising providing the security evaluation information to the data provider.
  4. The method according to claim 1, wherein the method is performed by the data provider;
    obtaining the plurality of model parameters contained in the data model comprises receiving the plurality of model parameters from the data demander.
  5. The method according to claim 4, wherein determining the security evaluation information of the data model comprises: determining the security evaluation information according to the difference statistics and a predetermined difference threshold.
  6. The method according to claim 4, further comprising determining, according to the security evaluation information, whether to accept deployment of the data model.
  7. The method according to claim 3 or 5, wherein determining the security evaluation information according to the difference statistics and the predetermined difference threshold comprises: dividing a given difference statistic in the difference statistics into different ranges according to a plurality of difference thresholds preset for that statistic, and mapping the different ranges to different security levels as the security evaluation information.
  8. The method according to claim 3 or 5, wherein the difference statistics comprise a plurality of statistics, and determining the security evaluation information according to the difference statistics and the predetermined difference threshold comprises:
    determining a security score related to each of the plurality of statistics by comparing each statistic with its corresponding difference threshold; and
    determining a total security score as the security evaluation information based on the security scores related to the individual statistics and weights preset for the individual statistics.
  9. The method according to claim 1, wherein the first statistic comprises at least one of the following: the ratio of the largest parameter to the smallest parameter, the ratio of the difference between the largest and smallest parameters to the largest parameter, the ratio of the difference between the largest and smallest parameters to the smallest parameter, and the ratio of the largest parameter to the mean of the parameters.
  10. The method according to claim 1, wherein the first statistic comprises at least one of the following: the variance of the parameters; among the pairwise combinations of the plurality of model parameters, the number of combinations whose parameter value ratio is above a preset ratio threshold, and the number of combinations whose parameter value difference is above a preset difference threshold.
  11. The method according to claim 1, wherein the second statistic comprises at least one of the following: the difference between the maximum and minimum numbers of decimal places among the parameters, the number of consecutive significant zeros in the decimal part of each parameter, and the maximum number of consecutive significant zeros in the decimal parts of the parameters.
  12. The method according to claim 1, wherein the data model comprises a logistic regression model, a decision tree model, a gradient boosting decision tree (GBDT) model, or a scorecard model.
  13. A method for reducing the security risk of a data model, the data model being provided by a data demander for deployment to a data provider and being used to perform model computation on the data provider's source data, the method comprising:
    determining the result type of the output result of the data model, the result type comprising at least a continuous value and a discrete classification probability;
    when the result type is a continuous value, representing the continuous value with a predetermined number of bits; and
    when the result type is a discrete classification probability, converting the discrete classification probability into a classification decision result.
  14. The method according to claim 13, wherein the predetermined number of bits is preset based on an agreed range of the output result.
  15. The method according to claim 13, wherein the continuous value is a decimal, and representing the continuous value with a predetermined number of bits comprises keeping a predetermined number of decimal places for the continuous value, the predetermined number being preset based on the digit settings of the model parameters of the data model.
  16. The method according to claim 13, wherein converting the discrete classification probability into a classification decision result comprises obtaining a classification boundary for the classification decision and converting the discrete classification probability into a classification decision result by comparing the discrete classification probability with the classification boundary.
  17. An apparatus for detecting the security of a data model, the data model being provided by a data demander for deployment to a data provider and being used to perform model computation on the data provider's source data, the apparatus comprising:
    an acquisition unit configured to obtain a plurality of model parameters contained in the data model;
    a statistics determination unit configured to determine difference statistics of the plurality of model parameters, the difference statistics comprising a first statistic related to differences in parameter magnitudes and/or a second statistic related to differences in parameter digits; and
    a security determination unit configured to determine security evaluation information of the data model according to the difference statistics.
  18. The apparatus according to claim 17, wherein the apparatus is provided at the data demander, and the security determination unit is configured to:
    determine the difference statistics as the security evaluation information;
    the apparatus further comprising a providing unit configured to provide the security evaluation information to the data provider.
  19. The apparatus according to claim 17, wherein the apparatus is provided at the data demander, and the security determination unit is configured to:
    determine the security evaluation information according to the difference statistics and a predetermined difference threshold;
    the apparatus further comprising a providing unit configured to provide the security evaluation information to the data provider.
  20. The apparatus according to claim 17, wherein the apparatus is provided at the data provider;
    the acquisition unit is configured to receive the plurality of model parameters from the data demander.
  21. The apparatus according to claim 20, wherein the security determination unit is configured to: determine the security evaluation information according to the difference statistics and a predetermined difference threshold.
  22. The apparatus according to claim 20, further comprising a deployment determination unit configured to determine, according to the security evaluation information, whether to accept deployment of the data model.
  23. The apparatus according to claim 19 or 21, wherein the security determination unit is configured to: divide a given difference statistic into different ranges according to a plurality of difference thresholds preset for that statistic, and map the different ranges to different security levels as the security evaluation information.
  24. The apparatus according to claim 19 or 21, wherein the difference statistics comprise a plurality of statistics, and the security determination unit is configured to:
    determine a security score related to each of the plurality of statistics by comparing each statistic with its corresponding difference threshold; and
    determine a total security score as the security evaluation information based on the security scores related to the individual statistics and weights preset for the individual statistics.
  25. The apparatus according to claim 17, wherein the first statistic comprises at least one of the following: the ratio of the largest parameter to the smallest parameter, the ratio of the difference between the largest and smallest parameters to the largest parameter, the ratio of the difference between the largest and smallest parameters to the smallest parameter, and the ratio of the largest parameter to the mean of the parameters.
  26. The apparatus according to claim 17, wherein the first statistic comprises at least one of the following: the variance of the parameters; among the pairwise combinations of the plurality of model parameters, the number of combinations whose parameter value ratio is above a preset ratio threshold, and the number of combinations whose parameter value difference is above a preset difference threshold.
  27. The apparatus according to claim 17, wherein the second statistic comprises at least one of the following: the difference between the maximum and minimum numbers of decimal places among the parameters, the number of consecutive significant zeros in the decimal part of each parameter, and the maximum number of consecutive significant zeros in the decimal parts of the parameters.
  28. The apparatus according to claim 17, wherein the data model comprises a logistic regression model, a decision tree model, a gradient boosting decision tree (GBDT) model, or a scorecard model.
  29. An apparatus for reducing the security risk of a data model, the data model being provided by a data demander for deployment to a data provider and being used to perform model computation on the data provider's source data, the apparatus comprising:
    a type determination unit configured to determine the result type of the output result of the data model, the result type comprising at least a continuous value and a discrete classification probability;
    a continuous value processing unit configured to represent the continuous value with a predetermined number of bits when the result type is a continuous value; and
    a discrete result processing unit configured to convert the discrete classification probability into a classification decision result when the result type is a discrete classification probability.
  30. The apparatus according to claim 29, wherein the predetermined number of bits is preset based on an agreed range of the output result.
  31. The apparatus according to claim 29, wherein the continuous value is a decimal, and the continuous value processing unit is configured to keep a predetermined number of decimal places for the continuous value, the predetermined number being preset based on the digit settings of the model parameters of the data model.
  32. The apparatus according to claim 29, wherein the discrete result processing unit is configured to obtain a classification boundary for the classification decision and convert the discrete classification probability into a classification decision result by comparing the discrete classification probability with the classification boundary.
  33. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1-16.
  34. A computing device comprising a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method of any one of claims 1-16 is implemented.
PCT/CN2019/090963 2018-07-17 2019-06-12 检测数据模型安全性的方法及装置 WO2020015480A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810785405.9A CN110728290B (zh) 2018-07-17 2018-07-17 检测数据模型安全性的方法及装置
CN201810785405.9 2018-07-17

Publications (1)

Publication Number Publication Date
WO2020015480A1 true WO2020015480A1 (zh) 2020-01-23

Family

ID=69164974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/090963 WO2020015480A1 (zh) 2018-07-17 2019-06-12 检测数据模型安全性的方法及装置

Country Status (3)

Country Link
CN (1) CN110728290B (zh)
TW (1) TWI712917B (zh)
WO (1) WO2020015480A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085369A (zh) * 2020-09-02 2020-12-15 支付宝(杭州)信息技术有限公司 规则模型的安全性检测方法、装置、设备及系统
CN112085370A (zh) * 2020-09-02 2020-12-15 支付宝(杭州)信息技术有限公司 规则模型的安全性检测方法、装置、设备及系统
CN112116028A (zh) * 2020-09-29 2020-12-22 联想(北京)有限公司 模型决策解释实现方法、装置及计算机设备
CN112560085A (zh) * 2020-12-10 2021-03-26 支付宝(杭州)信息技术有限公司 业务预测模型的隐私保护方法及装置
CN115598455A (zh) * 2022-11-15 2023-01-13 西安弘捷电子技术有限公司(Cn) 一种电子信息装备自动测试系统及测试方法

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489828A (zh) * 2020-04-27 2020-08-04 无锡市中医医院 一种筛选骨折后形成血栓因素的方法
CN112085590B (zh) * 2020-09-02 2023-03-14 支付宝(杭州)信息技术有限公司 规则模型的安全性的确定方法、装置和服务器
CN112085588B (zh) * 2020-09-02 2022-11-29 支付宝(杭州)信息技术有限公司 规则模型的安全性的确定方法、装置和数据处理方法
CN112085589B (zh) * 2020-09-02 2022-11-22 支付宝(杭州)信息技术有限公司 规则模型的安全性的确定方法、装置和服务器

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104052595A (zh) * 2014-05-23 2014-09-17 戴葵 密码算法定制方法
CN106372240A (zh) * 2016-09-14 2017-02-01 北京搜狐新动力信息技术有限公司 一种数据分析的方法和装置
CN107292174A (zh) * 2016-03-31 2017-10-24 中国电子科技集团公司电子科学研究院 一种云计算系统安全性评估方法及装置

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102457560B (zh) * 2010-10-29 2016-03-30 中兴通讯股份有限公司 一种云计算的安全管理方法和系统
CN102436489B (zh) * 2011-11-03 2013-08-21 北京数码大方科技股份有限公司 三维模型数据的处理方法、装置及系统
US9563771B2 (en) * 2014-01-22 2017-02-07 Object Security LTD Automated and adaptive model-driven security system and method for operating the same
US9444829B1 (en) * 2014-07-30 2016-09-13 Symantec Corporation Systems and methods for protecting computing resources based on logical data models
CN105808366B (zh) * 2016-03-14 2018-12-14 南京航空航天大学 一种基于四变量模型的系统安全分析方法
CN105808368B (zh) * 2016-03-15 2019-04-30 南京联成科技发展股份有限公司 一种基于随机概率分布的信息安全异常检测的方法及系统
CN106157132A (zh) * 2016-06-20 2016-11-23 中国工商银行股份有限公司 信用风险监控系统及方法
CN108076018A (zh) * 2016-11-16 2018-05-25 阿里巴巴集团控股有限公司 身份认证系统、方法、装置及账号认证方法

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104052595A (zh) * 2014-05-23 2014-09-17 戴葵 密码算法定制方法
CN107292174A (zh) * 2016-03-31 2017-10-24 中国电子科技集团公司电子科学研究院 一种云计算系统安全性评估方法及装置
CN106372240A (zh) * 2016-09-14 2017-02-01 北京搜狐新动力信息技术有限公司 一种数据分析的方法和装置

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085369A (zh) * 2020-09-02 2020-12-15 支付宝(杭州)信息技术有限公司 规则模型的安全性检测方法、装置、设备及系统
CN112085370A (zh) * 2020-09-02 2020-12-15 支付宝(杭州)信息技术有限公司 规则模型的安全性检测方法、装置、设备及系统
CN112085369B (zh) * 2020-09-02 2024-04-23 支付宝(杭州)信息技术有限公司 规则模型的安全性检测方法、装置、设备及系统
CN112085370B (zh) * 2020-09-02 2024-04-23 支付宝(杭州)信息技术有限公司 规则模型的安全性检测方法、装置、设备及系统
CN112116028A (zh) * 2020-09-29 2020-12-22 联想(北京)有限公司 模型决策解释实现方法、装置及计算机设备
CN112116028B (zh) * 2020-09-29 2024-04-26 联想(北京)有限公司 模型决策解释实现方法、装置及计算机设备
CN112560085A (zh) * 2020-12-10 2021-03-26 支付宝(杭州)信息技术有限公司 业务预测模型的隐私保护方法及装置
CN112560085B (zh) * 2020-12-10 2023-09-19 支付宝(杭州)信息技术有限公司 业务预测模型的隐私保护方法及装置
CN115598455A (zh) * 2022-11-15 2023-01-13 西安弘捷电子技术有限公司(Cn) 一种电子信息装备自动测试系统及测试方法
CN115598455B (zh) * 2022-11-15 2023-04-07 西安弘捷电子技术有限公司 一种电子信息装备自动测试系统及测试方法

Also Published As

Publication number Publication date
TW202006590A (zh) 2020-02-01
CN110728290B (zh) 2020-07-31
TWI712917B (zh) 2020-12-11
CN110728290A (zh) 2020-01-24

Similar Documents

Publication Publication Date Title
WO2020015480A1 (zh) 检测数据模型安全性的方法及装置
US11636380B2 (en) Method for protecting a machine learning model against extraction using an ensemble of a plurality of machine learning models
KR102061987B1 (ko) 위험 평가 방법 및 시스템
US20200293924A1 (en) Gbdt model feature interpretation method and apparatus
TW201933242A (zh) 訓練詐欺交易檢測模型的方法、檢測方法以及對應裝置
AU2016243106A1 (en) Optimizing neural networks for risk assessment
WO2022199185A1 (zh) 用户操作检测方法及程序产品
US20220277174A1 (en) Evaluation method, non-transitory computer-readable storage medium, and information processing device
CN112927072A (zh) 一种基于区块链的反洗钱仲裁方法、系统及相关装置
CN111062486A (zh) 一种评价数据的特征分布和置信度的方法及装置
CN109583731A (zh) 一种风险识别方法、装置及设备
Zhang et al. Proa: A probabilistic robustness assessment against functional perturbations
KR101368103B1 (ko) 리스크 관리 디바이스
CN111222583A (zh) 一种基于对抗训练与关键路径提取的图像隐写分析方法
CN111476668B (zh) 可信关系的识别方法、装置、存储介质和计算机设备
CN114548300B (zh) 解释业务处理模型的业务处理结果的方法和装置
CN112884480A (zh) 异常交易识别模型的构造方法、装置、计算机设备和介质
CN112766320B (zh) 一种分类模型训练方法及计算机设备
CN112749978B (zh) 检测方法、装置、设备、存储介质以及程序产品
JP2014206382A (ja) 目標類識別装置
CN111209567B (zh) 提高检测模型鲁棒性的可知性判断方法及装置
CN113222480A (zh) 对抗样本生成模型的训练方法及装置
WO2023062750A1 (ja) データ生成方法、データ生成プログラム及び情報処理装置
US11671258B1 (en) Apparatus and method for contingent assignment actions
US20230127927A1 (en) Systems and methods for protecting trainable model validation datasets

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19836984

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19836984

Country of ref document: EP

Kind code of ref document: A1