WO2021143478A1 - Method and apparatus for identifying adversarial sample to protect model security - Google Patents
- Publication number
- WO2021143478A1 (PCT/CN2020/138824)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Definitions
- One or more embodiments of this specification relate to the technical field of data computing security, and more particularly to a method and apparatus for identifying adversarial samples to protect model security.
- One or more embodiments of this specification describe a method and apparatus for identifying adversarial samples to protect model security, which can improve the training performance and prediction performance of a model.
- sampling multiple non-adversarial samples several times to obtain several control sample sets includes: using an enumeration method to sample the multiple non-adversarial samples multiple times to obtain multiple control sample sets; or using a stratified sampling method to sample the multiple non-adversarial samples several times to obtain the several control sample sets; or using a bootstrap sampling method to sample the multiple non-adversarial samples several times to obtain the several control sample sets.
- using several gain values determined based on the several control sample sets and the several experimental sample sets to determine whether the target sample is an adversarial sample includes: determining the mean of the several gain values and, when the mean is less than a set threshold, determining that the target sample is an adversarial sample; or determining the proportion of the several gain values that exceed the set threshold and, when that proportion is less than a first preset ratio, determining that the target sample is an adversarial sample.
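The ratio-based decision rule above can be sketched in Python; the function name and the default `first_preset_ratio` of 0.5 are illustrative assumptions, not values given in the text:

```python
def is_adversarial_by_ratio(gain_values, set_threshold=0.0, first_preset_ratio=0.5):
    """Judge the target sample adversarial when the share of gain values
    exceeding the set threshold falls below the first preset ratio."""
    above = sum(g > set_threshold for g in gain_values)
    ratio = above / len(gain_values)
    return ratio < first_preset_ratio

# Mostly negative gains -> the target sample degrades the model, flag it.
print(is_adversarial_by_ratio([-0.1, -0.2, 0.3]))  # True (ratio 1/3 < 0.5)
print(is_adversarial_by_ratio([0.1, 0.2, 0.3]))    # False (ratio 1.0)
```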
- determining whether the target sample is an adversarial sample further includes: averaging the control values of the several control sample sets for the preset evaluation index to obtain a control mean; and determining the product of the control mean and a second preset ratio as the set threshold.
- an apparatus for identifying adversarial samples to protect model security includes: a sampling unit configured to sample multiple non-adversarial samples several times to obtain several control sample sets; an adding unit configured to add the target sample to be detected to each of the several control sample sets to obtain several experimental sample sets; a first training unit configured to, for any first control sample set among the several control sample sets, train an initial machine learning model with the first control sample set to obtain a trained first control model; a first evaluation unit configured to evaluate the performance of the first control model with a test sample set to obtain a first control value for a preset evaluation index, the test sample set being determined based on the multiple non-adversarial samples; and a second training unit configured to train the initial machine learning model with the first experimental sample set obtained by adding the target sample to the first control sample set.
- a method for identifying adversarial privacy samples to protect privacy includes: sampling multiple non-adversarial privacy samples several times to obtain several control privacy sample sets; adding the target privacy sample to be detected to each of the several control privacy sample sets to obtain several experimental privacy sample sets; for any first control privacy sample set among the several control privacy sample sets, training an initial machine learning model with the first control privacy sample set to obtain a trained first control model; evaluating the performance of the first control model with a test privacy sample set to obtain a first control value for a preset evaluation index, the test privacy sample set being determined based on the multiple non-adversarial privacy samples; for the first experimental privacy sample set obtained by adding the target privacy sample to the first control privacy sample set, training the initial machine learning model with the first experimental privacy sample set to obtain a trained first experimental model; evaluating the performance of the first experimental model with the test privacy sample set to obtain a first experimental value for the preset evaluation index; and determining the difference between the first experimental value and the first control value as the first gain value.
- an apparatus for identifying adversarial privacy samples to protect privacy includes: a sampling unit configured to sample multiple non-adversarial privacy samples several times to obtain several control privacy sample sets; an adding unit configured to add the target privacy sample to be detected to each of the several control privacy sample sets to obtain several experimental privacy sample sets; a first training unit configured to, for any first control privacy sample set among the several control privacy sample sets, train an initial machine learning model with the first control privacy sample set to obtain a trained first control model; a first evaluation unit configured to evaluate the performance of the first control model with a test privacy sample set to obtain a first control value for a preset evaluation index, the test privacy sample set being determined based on the multiple non-adversarial privacy samples; and a second training unit configured to, for the first experimental privacy sample set obtained by adding the target privacy sample to the first control privacy sample set, train the initial machine learning model with the first experimental privacy sample set to obtain a trained first experimental model;
- a computing device including a memory and a processor, wherein the memory stores executable code and, when the processor executes the executable code, the method of the first aspect or the third aspect is implemented.
- Fig. 1 shows an implementation block diagram of a method for identifying adversarial samples according to an embodiment;
- Fig. 2 shows a flow chart of a method for identifying adversarial samples to protect model security according to an embodiment;
- Fig. 3 shows a sequence diagram of steps in identifying adversarial samples according to an embodiment;
- Fig. 4 shows a structural diagram of an apparatus for identifying adversarial samples to protect model security according to an embodiment;
- Fig. 6 shows a structural diagram of an apparatus for identifying adversarial privacy samples to protect privacy according to an embodiment.
- the training samples currently used for model training may come from different sources, such as manual labeling or crawling from websites and network platforms, into which adversarial samples can easily be mixed. As mentioned earlier, identifying adversarial samples is very important for ensuring model training performance and prediction performance, thereby protecting model security.
- these samples may be image samples, and accordingly, the initial machine learning model may be an image processing model.
- these samples may include face images, iris images, fingerprint images, etc., and the initial machine learning model may be an identity recognition model.
- these samples may be text samples, and accordingly, the initial machine learning model may be a text processing model.
- these samples may be speech samples, and accordingly, the initial machine learning model may be a speech processing model.
- the enumeration method may be used to perform multiple sampling to obtain multiple control sample sets.
- the enumeration method enumerates all possible subsets. Assuming that the multiple non-adversarial samples include 3 samples, denoted A, B, and C, the control sample sets obtained by enumeration include: {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, and {A,B,C}.
- a bootstrap sampling method can also be used to perform several samplings to obtain several control sample sets. Specifically, for one sampling, assuming that the number of non-adversarial samples is M and the number of samples to be collected is m, one sample can be randomly selected from the M non-adversarial samples each time, added to the m collected samples, and then put back among the M non-adversarial samples, so that it can still be selected in the next draw. After this process is repeated m times, a control sample set including m samples is obtained.
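The enumeration and bootstrap sampling schemes described above can be sketched in Python (the function names are illustrative, not from the patent):

```python
from itertools import combinations
import random

def enumerate_control_sets(samples):
    """Enumerate every non-empty subset of the non-adversarial samples."""
    sets = []
    for k in range(1, len(samples) + 1):
        sets.extend(list(c) for c in combinations(samples, k))
    return sets

def bootstrap_control_set(samples, m):
    """Draw m samples with replacement (bootstrap sampling): each draw is
    put back, so the same sample may appear more than once."""
    return [random.choice(samples) for _ in range(m)]

# For samples A, B, C the enumeration yields the 7 subsets listed above.
control_sets = enumerate_control_sets(["A", "B", "C"])
assert len(control_sets) == 7
```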
- the initial machine learning model may be an initialization model, that is, a model that has not undergone any training, whose parameters are those assigned when the model was initialized.
- the initial machine learning model may also be a model trained using some non-adversarial samples other than the aforementioned multiple non-adversarial samples.
- the initial machine learning model can be, without limitation, a classification model, a regression model, a neural network model, etc.
- the aforementioned preset evaluation indicators may include: error rate, accuracy, recall rate, precision rate, and so on.
- the error rate refers to the ratio of the number of test samples with prediction errors to the total number of test samples.
- Accuracy refers to the proportion of the number of test samples whose predictions are correct to the total number of test samples.
- the precision represents, among the test samples predicted to be positive, the proportion whose labels are truly positive;
- the recall represents, among the test samples whose labels are positive, the proportion that are correctly predicted to be positive.
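The four evaluation indices above can be computed directly from predicted and true labels; a minimal sketch for binary labels (1 = positive), with illustrative function and key names:

```python
def evaluation_indices(y_true, y_pred):
    """Error rate, accuracy, precision, and recall for binary labels (1 = positive)."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    pred_pos = sum(p == 1 for p in y_pred)                       # predicted positives
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)                     # labeled positives
    return {
        "error_rate": 1 - correct / n,
        "accuracy": correct / n,
        "precision": true_pos / pred_pos if pred_pos else 0.0,
        "recall": true_pos / actual_pos if actual_pos else 0.0,
    }
```

Note that error rate and accuracy always sum to 1, matching the two definitions above.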
- the preset evaluation index includes precision
- the first control value may include a precision of 0.88.
- the preset evaluation index includes an error rate
- the first control value may include an error rate of 0.16.
- step S250 and step S260 can refer to the above description of step S230 and step S240, and will not be repeated.
- the preset evaluation index includes precision
- the first experimental value may include a precision of 0.80 or 0.90.
- the preset evaluation index includes an error rate
- the first experimental value may include an error rate of 0.10 or 0.20.
- through steps S210 to S260, the first experimental value corresponding to any first experimental sample set can be obtained, and accordingly, several experimental values corresponding to the several experimental sample sets can be obtained.
- the execution order of steps S210 to S260 is not otherwise limited. Specifically, in one embodiment, step S210, step S230, step S220, step S250, step S240, and step S260 may be executed in that order. In another implementation, step S210, step S220, step S230, step S240, step S250, and step S260 may be executed in sequence.
- the gain value is used to characterize the optimization effect brought by the target sample to the model performance.
- the first gain value is the difference obtained by subtracting the first control value from the first experimental value.
- the preset evaluation index is the precision. If the first control value and the first experimental value are 0.88 and 0.80, respectively, the first gain value is -0.08; and if the first control value and the first experimental value are 0.88 and 0.90, respectively, the first gain value is 0.02.
- step S280 a number of gain values determined based on the number of control sample sets and the number of experimental sample sets are used to determine whether the target sample belongs to an adversarial sample.
- this step may include: determining the gain average of the several gain values; further, in the case that the gain average is less than a set threshold, determining that the target sample belongs to the adversarial sample, and In the case that the gain average value is not less than the set threshold, it is determined that the target sample does not belong to the adversarial sample.
- the set threshold may be a manually set threshold, such as 0 or 0.05.
- the set threshold may be determined based on the following steps: first, averaging the control values of the above control sample sets for the preset evaluation index to obtain a control mean; then, determining the product of the control mean and a second preset ratio as the set threshold.
- the second preset ratio can be set by business personnel based on expert experience or actual needs, for example, set to 0.05 or 0.02. In an example, assuming that the above-mentioned control mean value is 0.80 and the second preset ratio is 0.05, the set threshold may be determined to be 0.04.
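The mean-based decision rule, with the set threshold derived from the control mean as described above, can be sketched as follows (function and variable names are illustrative):

```python
def is_adversarial(gain_values, control_values, second_preset_ratio=0.05):
    """Flag the target sample as adversarial when the mean gain falls below
    a threshold computed as (mean of control values) * (second preset ratio)."""
    control_mean = sum(control_values) / len(control_values)
    threshold = control_mean * second_preset_ratio     # e.g. 0.80 * 0.05 = 0.04
    gain_mean = sum(gain_values) / len(gain_values)
    return gain_mean < threshold

# With the example numbers from the text: control mean 0.80 gives threshold 0.04,
# so negative average gains are judged adversarial.
print(is_adversarial([-0.08, -0.02], [0.80, 0.80]))  # True
print(is_adversarial([0.05, 0.07], [0.80, 0.80]))    # False
```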
- Fig. 3 shows a sequence diagram of steps in identifying adversarial samples according to an embodiment.
- the identification of adversarial samples includes the following steps: Step S31, sampling normal samples (that is, non-adversarial samples) to obtain a control sample set.
- Step S32 Use the control sample set to train the initial model, and use the test sample set to evaluate the performance of the trained model to obtain a control evaluation result.
- step S33 the sample to be tested is added to the control sample set to obtain an experimental sample set.
- Step S34 Use the experimental sample set to train the initial model, and use the test sample set to evaluate the performance of the trained model to obtain an experimental evaluation result.
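Steps S31 to S34, together with the gain computation, can be sketched end to end. This is a schematic only: `train` and `evaluate` are placeholders for the model-specific training and evaluation routines, and the bootstrap-style sampling, `m`, and `rounds` parameters are illustrative choices:

```python
import random

def gain_values_for_target(normal_samples, target, train, evaluate, test_set,
                           m=8, rounds=5):
    """One gain value per round: evaluation(experimental) - evaluation(control)."""
    gains = []
    for _ in range(rounds):
        # S31: sample a control set from the normal (non-adversarial) samples
        control_set = [random.choice(normal_samples) for _ in range(m)]
        # S32: train on the control set, evaluate on the test set
        control_value = evaluate(train(control_set), test_set)
        # S33: add the sample under test to form the experimental set
        experimental_set = control_set + [target]
        # S34: train on the experimental set, evaluate on the test set
        experimental_value = evaluate(train(experimental_set), test_set)
        gains.append(experimental_value - control_value)
    return gains
```

A harmful sample will tend to produce negative gains across rounds, which the downstream threshold test then detects.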
- the target samples to be detected are respectively added to the several control sample sets to obtain a number of experimental sample sets;
- the first training unit 430 is configured to, for any first control sample set among the several control sample sets, train an initial machine learning model with the first control sample set to obtain a trained first control model;
- the first evaluation unit 440 is configured to evaluate the performance of the first control model using a test sample set to obtain a first control value for a preset evaluation index, the test sample set being determined based on the multiple non-adversarial samples;
- the second training unit 450 is configured to, for the first experimental sample set obtained by adding the target sample to the first control sample set, train the initial machine learning model with the first experimental sample set to obtain a trained first experimental model;
- the second evaluation unit 460 is configured to evaluate the performance of the first experimental model using the test sample set to obtain a first experimental value for the preset evaluation index;
- the gain determining unit 470 is configured to determine the difference between the first experimental value and the first control value as a first gain value;
- the determination unit 480 is configured to determine the mean of several gain values and, when the mean is less than a set threshold, determine that the target sample is an adversarial sample; or to determine the proportion of the several gain values that exceed the set threshold and, when that proportion is less than a first preset ratio, determine that the target sample is an adversarial sample.
- the method includes the following steps: step S510, sampling multiple non-adversarial privacy samples several times to obtain several control privacy sample sets; step S520, adding the target privacy sample to be detected to each of the several control privacy sample sets to obtain several experimental privacy sample sets;
- step S530, for any first control privacy sample set among the several control privacy sample sets, training the initial machine learning model with the first control privacy sample set to obtain a trained first control model;
- step S540, evaluating the performance of the first control model with a test privacy sample set to obtain a first control value for a preset evaluation index, the test privacy sample set being determined based on the multiple non-adversarial privacy samples; step S550, for the first experimental privacy sample set obtained by adding the target privacy sample to the first control privacy sample set, training the initial machine learning model with the first experimental privacy sample set to obtain a trained first experimental model; step S560, evaluating the performance of the first experimental model with the test privacy sample set to obtain a first experimental value for the preset evaluation index; step S570, determining the difference between the first experimental value and the first control value as the first gain value.
- Fig. 6 shows a structural diagram of an apparatus for identifying adversarial privacy samples to protect privacy according to an embodiment.
- the apparatus 600 may include: a sampling unit 610 configured to sample multiple non-adversarial privacy samples several times to obtain several control privacy sample sets; an adding unit 620 configured to add the target privacy sample to be detected to each of the several control privacy sample sets to obtain several experimental privacy sample sets;
- the first training unit 630 is configured to, for any first control privacy sample set among the several control privacy sample sets, train the initial machine learning model with the first control privacy sample set to obtain a trained first control model;
- the first evaluation unit 640 is configured to evaluate the performance of the first control model using the test privacy sample set to obtain a first control value for the preset evaluation index, the test privacy sample set being determined based on the multiple non-adversarial privacy samples;
- the second training unit 650 is configured to, for the first experimental privacy sample set obtained by adding the target privacy sample to the first control privacy sample set, train the initial machine learning model with the first experimental privacy sample set to obtain a trained first experimental model;
- the second evaluation unit 660 is configured to evaluate the performance of the first experimental model using the test privacy sample set to obtain a first experimental value for the preset evaluation index; the gain determining unit 670 is configured to determine the difference between the first experimental value and the first control value as the first gain value;
- the determination unit 680 is configured to use several gain values determined based on the several control privacy sample sets and the several experimental privacy sample sets to determine whether the target privacy sample is an adversarial privacy sample.
- a computing device including a memory and a processor, wherein the memory stores executable code and, when the processor executes the executable code, the method described in conjunction with Fig. 1, Fig. 2, Fig. 3, or Fig. 5 is implemented.
Abstract
Description
Claims (16)
- A method for identifying adversarial samples to protect model security, comprising: sampling a plurality of non-adversarial samples several times to obtain several control sample sets; for any first control sample set among the several control sample sets, training an initial machine learning model with the first control sample set to obtain a trained first control model; adding a target sample to be detected to each of the several control sample sets to obtain several experimental sample sets; for a first experimental sample set obtained by adding the target sample to the first control sample set, training the initial machine learning model with the first experimental sample set to obtain a trained first experimental model; evaluating the performance of the first control model with a test sample set to obtain a first control value for a preset evaluation index, the test sample set being determined based on the plurality of non-adversarial samples; evaluating the performance of the first experimental model with the test sample set to obtain a first experimental value for the preset evaluation index; determining the difference between the first experimental value and the first control value as a first gain value; and determining, using several gain values determined based on the several control sample sets and the several experimental sample sets, whether the target sample is an adversarial sample.
- The method according to claim 1, wherein: the plurality of non-adversarial samples and the target sample are image samples, and the initial machine learning model is an image processing model; or the plurality of non-adversarial samples and the target sample are text samples, and the initial machine learning model is a text processing model; or the plurality of non-adversarial samples and the target sample are speech samples, and the initial machine learning model is a speech processing model.
- The method according to claim 1, wherein sampling a plurality of non-adversarial samples several times to obtain several control sample sets comprises: using an enumeration method to sample the plurality of non-adversarial samples multiple times to obtain multiple control sample sets; or using a stratified sampling method to sample the plurality of non-adversarial samples several times to obtain the several control sample sets; or using a bootstrap sampling method to sample the plurality of non-adversarial samples several times to obtain the several control sample sets.
- The method according to claim 1, wherein the preset evaluation index includes one or more of the following: error rate, accuracy, and recall.
- The method according to claim 1, wherein determining whether the target sample is an adversarial sample using several gain values determined based on the several control sample sets and the several experimental sample sets comprises: determining the mean of the several gain values and, when the mean is less than a set threshold, determining that the target sample is an adversarial sample; or determining the proportion of the several gain values that exceed the set threshold and, when that proportion is less than a first preset ratio, determining that the target sample is an adversarial sample.
- The method according to claim 5, wherein determining whether the target sample is an adversarial sample further comprises: averaging the control values of the several control sample sets for the preset evaluation index to obtain a control mean; and determining the product of the control mean and a second preset ratio as the set threshold.
- An apparatus for identifying adversarial samples to protect model security, comprising: a sampling unit configured to sample a plurality of non-adversarial samples several times to obtain several control sample sets; an adding unit configured to add a target sample to be detected to each of the several control sample sets to obtain several experimental sample sets; a first training unit configured to, for any first control sample set among the several control sample sets, train an initial machine learning model with the first control sample set to obtain a trained first control model; a first evaluation unit configured to evaluate the performance of the first control model with a test sample set to obtain a first control value for a preset evaluation index, the test sample set being determined based on the plurality of non-adversarial samples; a second training unit configured to, for the first experimental sample set obtained by adding the target sample to the first control sample set, train the initial machine learning model with the first experimental sample set to obtain a trained first experimental model; a second evaluation unit configured to evaluate the performance of the first experimental model with the test sample set to obtain a first experimental value for the preset evaluation index; a gain determining unit configured to determine the difference between the first experimental value and the first control value as a first gain value; and a determining unit configured to use several gain values determined based on the several control sample sets and the several experimental sample sets to determine whether the target sample is an adversarial sample.
- The apparatus according to claim 7, wherein: the plurality of non-adversarial samples and the target sample are image samples, and the initial machine learning model is an image processing model; or the plurality of non-adversarial samples and the target sample are text samples, and the initial machine learning model is a text processing model; or the plurality of non-adversarial samples and the target sample are speech samples, and the initial machine learning model is a speech processing model.
- The apparatus according to claim 7, wherein the sampling unit is configured to: use an enumeration method to sample the plurality of non-adversarial samples multiple times to obtain multiple control sample sets; or use a stratified sampling method to sample the plurality of non-adversarial samples several times to obtain the several control sample sets; or use a bootstrap sampling method to sample the plurality of non-adversarial samples several times to obtain the several control sample sets.
- The apparatus according to claim 7, wherein the preset evaluation index includes one or more of the following: error rate, accuracy, and recall.
- The apparatus according to claim 7, wherein the determining unit is configured to: determine the mean of the several gain values and, when the mean is less than a set threshold, determine that the target sample is an adversarial sample; or determine the proportion of the several gain values that exceed the set threshold and, when that proportion is less than a first preset ratio, determine that the target sample is an adversarial sample.
- The apparatus according to claim 11, wherein the determining unit is further configured to: average the control values of the several control sample sets for the preset evaluation index to obtain a control mean; and determine the product of the control mean and a second preset ratio as the set threshold.
- A method for identifying adversarial privacy samples to protect privacy, comprising: sampling a plurality of non-adversarial privacy samples several times to obtain several control privacy sample sets; adding a target privacy sample to be detected to each of the several control privacy sample sets to obtain several experimental privacy sample sets; for any first control privacy sample set among the several control privacy sample sets, training an initial machine learning model with the first control privacy sample set to obtain a trained first control model; evaluating the performance of the first control model with a test privacy sample set to obtain a first control value for a preset evaluation metric, the test privacy sample set being determined based on the plurality of non-adversarial privacy samples; for the first experimental privacy sample set obtained by adding the target privacy sample to the first control privacy sample set, training the initial machine learning model with the first experimental privacy sample set to obtain a trained first experimental model; evaluating the performance of the first experimental model with the test privacy sample set to obtain a first experimental value for the preset evaluation metric; determining the difference between the first experimental value and the first control value as a first gain value; and determining, based on several gain values determined from the several control privacy sample sets and the several experimental privacy sample sets, whether the target privacy sample is an adversarial privacy sample.
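The claimed procedure lends itself to a compact sketch. The following is a minimal illustration, with several assumptions not fixed by the claim text: a toy nearest-centroid classifier stands in for the "initial machine learning model", accuracy is used as the preset evaluation metric, and the final decision rule averages the gain values and compares the mean against a threshold (the claim leaves the exact rule open). The names `train_centroid_model`, `evaluate`, and `adversarial_gain_test` are illustrative, not from the patent.

```python
import numpy as np

def train_centroid_model(X, y):
    # Toy stand-in for the "initial machine learning model":
    # a nearest-centroid classifier, one centroid per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def evaluate(model, X_test, y_test):
    # Preset evaluation metric (assumed here): accuracy on the test sample set.
    preds = [min(model, key=lambda c: np.linalg.norm(x - model[c])) for x in X_test]
    return float(np.mean(np.array(preds) == y_test))

def adversarial_gain_test(X, y, x_target, y_target, X_test, y_test,
                          n_rounds=5, subset_frac=0.7, threshold=-0.05, seed=0):
    """Return (is_adversarial, mean_gain) for a target sample.

    Each round: sample a control set and train a control model; add the target
    sample to form the experimental set and train again; the gain value is the
    experimental evaluation value minus the control evaluation value.
    """
    rng = np.random.default_rng(seed)
    n, k = len(X), int(len(X) * subset_frac)
    gains = []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=k, replace=False)          # control sample set
        ctrl_score = evaluate(train_centroid_model(X[idx], y[idx]), X_test, y_test)
        X_exp = np.vstack([X[idx], x_target])               # experimental sample set
        y_exp = np.append(y[idx], y_target)
        exp_score = evaluate(train_centroid_model(X_exp, y_exp), X_test, y_test)
        gains.append(exp_score - ctrl_score)                # gain value
    mean_gain = float(np.mean(gains))
    # Assumed decision rule: a sample whose average gain is clearly negative
    # (it consistently degrades test performance) is flagged as adversarial.
    return mean_gain < threshold, mean_gain

# Demo on synthetic two-cluster data (illustrative only).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.8, (20, 2)), rng.normal(3.0, 0.8, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
X_test = np.vstack([rng.normal(0.0, 0.8, (10, 2)), rng.normal(3.0, 0.8, (10, 2))])
y_test = np.array([0] * 10 + [1] * 10)

benign = np.array([0.1, 0.1])       # plausible class-0 sample
poisoned = np.array([50.0, 50.0])   # outlier mislabeled as class 0

flag_b, gain_b = adversarial_gain_test(X, y, benign, 0, X_test, y_test)
flag_p, gain_p = adversarial_gain_test(X, y, poisoned, 0, X_test, y_test)
```

The repeated sampling is what makes the scheme robust: a single control/experimental pair could attribute a performance drop to an unlucky subset, whereas a consistently negative gain across several independently sampled control sets points at the target sample itself.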
- An apparatus for identifying adversarial privacy samples to protect privacy, comprising: a sampling unit configured to sample a plurality of non-adversarial privacy samples several times to obtain several control privacy sample sets; an adding unit configured to add a target privacy sample to be detected to each of the several control privacy sample sets to obtain several experimental privacy sample sets; a first training unit configured to, for any first control privacy sample set among the several control privacy sample sets, train an initial machine learning model with the first control privacy sample set to obtain a trained first control model; a first evaluation unit configured to evaluate the performance of the first control model with a test privacy sample set to obtain a first control value for a preset evaluation metric, the test privacy sample set being determined based on the plurality of non-adversarial privacy samples; a second training unit configured to, for the first experimental privacy sample set obtained by adding the target privacy sample to the first control privacy sample set, train the initial machine learning model with the first experimental privacy sample set to obtain a trained first experimental model; a second evaluation unit configured to evaluate the performance of the first experimental model with the test privacy sample set to obtain a first experimental value for the preset evaluation metric; a gain determination unit configured to determine the difference between the first experimental value and the first control value as a first gain value; and a determination unit configured to determine, based on several gain values determined from the several control privacy sample sets and the several experimental privacy sample sets, whether the target privacy sample is an adversarial privacy sample.
- A computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1-6 and 13.
- A computing device, comprising a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method of any one of claims 1-6 and 13 is implemented.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010040234.4A CN110852450B (en) | 2020-01-15 | 2020-01-15 | Method and device for identifying countermeasure sample to protect model security |
CN202010040234.4 | 2020-01-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021143478A1 (en) | 2021-07-22 |
Family
ID=69610734
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/138824 WO2021143478A1 (en) | 2020-01-15 | 2020-12-24 | Method and apparatus for identifying adversarial sample to protect model security |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110852450B (en) |
WO (1) | WO2021143478A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110852450B (en) * | 2020-01-15 | 2020-04-14 | 支付宝(杭州)信息技术有限公司 | Method and device for identifying countermeasure sample to protect model security |
CN113449097A (en) * | 2020-03-24 | 2021-09-28 | 百度在线网络技术(北京)有限公司 | Method and device for generating countermeasure sample, electronic equipment and storage medium |
CN111340008B (en) * | 2020-05-15 | 2021-02-19 | 支付宝(杭州)信息技术有限公司 | Method and system for generation of counterpatch, training of detection model and defense of counterpatch |
CN111860698B (en) * | 2020-08-05 | 2023-08-11 | 中国工商银行股份有限公司 | Method and device for determining stability of learning model |
CN113012153A (en) * | 2021-04-30 | 2021-06-22 | 武汉纺织大学 | Aluminum profile flaw detection method |
CN114140670A (en) * | 2021-11-25 | 2022-03-04 | 支付宝(杭州)信息技术有限公司 | Method and device for model ownership verification based on exogenous features |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543760A (en) * | 2018-11-28 | 2019-03-29 | 上海交通大学 | Confrontation sample testing method based on image filters algorithm |
US20190206057A1 (en) * | 2016-09-13 | 2019-07-04 | Ohio State Innovation Foundation | Systems and methods for modeling neural architecture |
CN110363243A (en) * | 2019-07-12 | 2019-10-22 | 腾讯科技(深圳)有限公司 | The appraisal procedure and device of disaggregated model |
CN110852450A (en) * | 2020-01-15 | 2020-02-28 | 支付宝(杭州)信息技术有限公司 | Method and device for identifying countermeasure sample to protect model security |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304858B (en) * | 2017-12-28 | 2022-01-04 | 中国银联股份有限公司 | Generation method, verification method and system of confrontation sample recognition model |
CN108710892B (en) * | 2018-04-04 | 2020-09-01 | 浙江工业大学 | Cooperative immune defense method for multiple anti-picture attacks |
CN109902798A (en) * | 2018-05-31 | 2019-06-18 | 华为技术有限公司 | The training method and device of deep neural network |
CN108932527A (en) * | 2018-06-06 | 2018-12-04 | 上海交通大学 | Using cross-training model inspection to the method for resisting sample |
CN110674856A (en) * | 2019-09-12 | 2020-01-10 | 阿里巴巴集团控股有限公司 | Method and device for machine learning |
2020
- 2020-01-15 CN CN202010040234.4A patent/CN110852450B/en active Active
- 2020-12-24 WO PCT/CN2020/138824 patent/WO2021143478A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
WANG, JIA: "Research on Adversarial Examples in Deep Learning based on Image Recognition Problems", COMPUTER KNOWLEDGE AND TECHNOLOGY, vol. 15, 31 October 2019 (2019-10-31), CN, pages 222 - 223, XP009529271, ISSN: 1009-3044, DOI: 10.14004/j.cnki.ckt.2019.3617 * |
Also Published As
Publication number | Publication date |
---|---|
CN110852450A (en) | 2020-02-28 |
CN110852450B (en) | 2020-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021143478A1 (en) | Method and apparatus for identifying adversarial sample to protect model security | |
CN109214360B (en) | Construction method and application of face recognition model based on Parasoft Max loss function | |
CN107609493B (en) | Method and device for optimizing human face image quality evaluation model | |
WO2021026805A1 (en) | Adversarial example detection method and apparatus, computing device, and computer storage medium | |
US8797140B2 (en) | Biometric authentication method and biometric authentication apparatus | |
WO2021027336A1 (en) | Authentication method and apparatus based on seal and signature, and computer device | |
WO2021056746A1 (en) | Image model testing method and apparatus, electronic device and storage medium | |
WO2021036014A1 (en) | Federated learning credit management method, apparatus and device, and readable storage medium | |
WO2019136990A1 (en) | Network data detection method, apparatus, computer device and storage medium | |
US11915311B2 (en) | User score model training and calculation | |
CN111783505A (en) | Method and device for identifying forged faces and computer-readable storage medium | |
WO2020082734A1 (en) | Text emotion recognition method and apparatus, electronic device, and computer non-volatile readable storage medium | |
CN111340144B (en) | Risk sample detection method and device, electronic equipment and storage medium | |
CN105335719A (en) | Living body detection method and device | |
US10423817B2 (en) | Latent fingerprint ridge flow map improvement | |
US20200210459A1 (en) | Method and apparatus for classifying samples | |
WO2017075913A1 (en) | Mouse behaviors based authentication method | |
WO2021190046A1 (en) | Training method for gesture recognition model, gesture recognition method, and apparatus | |
US11232182B2 (en) | Open data biometric identity validation | |
JP2020184331A (en) | Liveness detection method and apparatus, face authentication method and apparatus | |
CN114817933A (en) | Method and device for evaluating robustness of business prediction model and computing equipment | |
CN111803956B (en) | Method and device for determining game plug-in behavior, electronic equipment and storage medium | |
US20220215271A1 (en) | Detection device, detection method and detection program | |
CN111368644B (en) | Image processing method, device, electronic equipment and storage medium | |
CN117275076B (en) | Method for constructing face quality assessment model based on characteristics and application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20913811 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20913811 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.05.2023) |
|