CN113408558B - Method, apparatus, device and medium for model verification - Google Patents


Info

Publication number
CN113408558B
Authority
CN
China
Prior art keywords
classification model
image
training samples
model
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010186738.7A
Other languages
Chinese (zh)
Other versions
CN113408558A (en)
Inventor
吴月升
熊俊峰
刘焱
郝新
王洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010186738.7A
Publication of CN113408558A
Application granted
Publication of CN113408558B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

Embodiments of the present disclosure provide methods, apparatus, devices, and media for model verification, relating to the field of artificial intelligence. A method for supporting model verification includes selecting a training sample from a training sample set for a classification model, the training sample set being classified into a first class of a plurality of classes, the classification model being configured to effect classification of the plurality of classes. The method further includes modifying the selected training sample to obtain a modified training sample, the difference between the modified training sample and the training sample not exceeding a threshold difference. The method further includes training the classification model based at least on the modified training sample, such that the trained classification model is capable of classifying the modified training sample into a second class of the plurality of classes, the second class being different from the first class. By performing model training with slightly modified training samples, efficient model verification can be supported, helping to determine whether model theft has occurred.

Description

Method, apparatus, device and medium for model verification
Technical Field
Embodiments of the present disclosure relate generally to computer technology and, more particularly, to the field of artificial intelligence.
Background
Machine learning technology is now widely applied in fields such as computer vision, human-machine interaction, recommendation systems, and security protection. A machine learning model is a very valuable intellectual property asset, because building one may involve privacy-sensitive training data as well as substantial innovation and resource overhead in model training and model structure design. One of the main risks faced by machine learning models is model theft, in which an offender attempts to recover a machine learning model by various methods. Model theft can lead to data leakage, intellectual property loss, economic loss, and other harms to the model owner. It is therefore desirable to be able to verify ownership of a model when it is suspected of having been stolen.
Disclosure of Invention
According to an embodiment of the present disclosure, a scheme for model protection and model verification is provided.
In a first aspect of the present disclosure, a method for supporting model verification is provided. The method includes selecting a training sample from a training sample set for a classification model, the training sample set being classified into a first class of a plurality of classes, the classification model being configured to effect classification of the plurality of classes. The method further includes modifying the selected training samples to obtain modified training samples, the difference between the modified training samples and the training samples not exceeding a threshold difference. The method further includes training a classification model based at least on the modified training samples such that the trained classification model is capable of classifying the modified training samples into a second class of the plurality of classes, the second class being different from the first class.
In a second aspect of the present disclosure, a method for model verification is provided. The method includes obtaining a target input for model verification, the target input being a modified version of a source input, the source input being classified by a target classification model into a first class of a plurality of classes and the target input being classified by the target classification model into a second class of the plurality of classes, the second class being different from the first class. The method further includes applying the target input to a classification model to be validated to obtain a classification result of the classification model on the target input. The method further includes determining that the classification model is a duplicate version of the target classification model based on determining that the classification result indicates the second class.
In a third aspect of the present disclosure, an apparatus for supporting model verification is provided. The apparatus includes a sample selection module configured to select a training sample from a training sample set for a classification model, the training sample set classified into a first class of a plurality of classes, the classification model configured to implement classification of the plurality of classes. The apparatus further includes a sample modification module configured to modify the selected training samples to obtain modified training samples, the modified training samples differing from the training samples by no more than a threshold difference. The apparatus further includes a model training module configured to train a classification model based at least on the modified training samples, such that the trained classification model is capable of classifying the modified training samples into a second class of the plurality of classes, the second class being different from the first class.
In a fourth aspect of the present disclosure, an apparatus for model verification is provided. The apparatus includes an input obtaining module configured to obtain a target input for model verification, the target input being a modified version of a source input, the source input being classified by a target classification model into a first class of a plurality of classes and the target input being classified by the target classification model into a second class of the plurality of classes, the second class being different from the first class. The apparatus further includes an input application module configured to apply the target input to the classification model to be validated to obtain a classification result of the classification model on the target input. The apparatus further includes a determination module configured to determine that the classification model is a duplicate version of the target classification model based on determining that the classification result indicates the second class.
In a fifth aspect of the present disclosure, an electronic device is provided that includes one or more processors; and storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a method according to the first aspect of the present disclosure.
In a sixth aspect of the present disclosure, an electronic device is provided that includes one or more processors; and storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a method according to the second aspect of the present disclosure.
In a seventh aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method according to the first aspect of the present disclosure.
In an eighth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method according to the second aspect of the present disclosure.
In a ninth aspect of the present disclosure, a computer program product is provided. The computer program product comprises a computer program which, when executed by a processor, implements a method according to the first aspect of the present disclosure.
In a tenth aspect of the present disclosure, a computer program product is provided. The computer program product comprises a computer program which, when executed by a processor, implements a method according to the second aspect of the present disclosure.
It should be understood that what is described in this summary is not intended to limit the critical or essential features of the embodiments of the disclosure nor to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, wherein like or similar reference numerals designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic diagram of a system for model verification in accordance with various embodiments of the present disclosure;
FIG. 2A illustrates an example of training sample modification according to some embodiments of the present disclosure;
FIG. 2B illustrates an example of model input modification in accordance with some embodiments of the present disclosure;
FIG. 3 is a flow chart of a method for supporting model verification according to some embodiments of the present disclosure;
FIG. 4 is a flow chart of a method for model verification according to some embodiments of the present disclosure;
FIG. 5 is a block diagram of an apparatus for supporting model verification according to some embodiments of the present disclosure;
FIG. 6 is a block diagram of an apparatus for model verification according to some embodiments of the present disclosure; and
fig. 7 illustrates a block diagram of an apparatus capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In describing embodiments of the present disclosure, the term "comprising" and its variants should be taken as open-ended, i.e., "including, but not limited to." The term "based on" should be understood as "based at least in part on." The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment." The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As used herein, the term "model" refers to a construct that can learn, from training data, the association between inputs and corresponding outputs, so that after training is completed it can generate a corresponding output for a given input. The generation of the model may be based on machine learning techniques. A "model" may also be referred to herein as a "learning model" or a "learning network," and these terms are used interchangeably herein.
Herein, the term "classification model" refers to a model that is capable of performing classification tasks. The classification model is capable of classifying the input into one of two or more predetermined classes. Many practical problems can be modeled as classification problems. For example, classification models may be applied to problems in the context of object recognition, pattern recognition, data anomaly detection, and the like.
In general, machine learning may include three phases, namely a training phase, a testing phase, and an application phase (also referred to as an inference phase). In the training phase, a given model may be trained iteratively using a large amount of training data until the model can consistently draw, from the training data, inferences similar to those a human would make. Through training, the model may be considered to have learned the association between input and output (also referred to as an input-to-output mapping) from the training data, and the parameter values of the trained model are thereby determined. In the testing phase, test inputs are applied to the trained model to test whether the model can provide the correct outputs, thereby determining the performance of the model. In the application phase, the model may be used, based on the trained parameter values, to process an actual input and determine the corresponding output.
As mentioned above, it is desirable to be able to verify ownership of a machine learning model when it is suspected of having been stolen. Because machine learning models have complex structures and large numbers of parameters, it is difficult to verify that two models are identical simply by comparing parameter values or model structures. In particular, since the innovation of many machine learning models lies in the training data, training methods, and the like that were used, it is also difficult to determine whether two models are identical by parsing their configuration files. This makes it difficult for a model owner to assert ownership of a model. There is currently no effective means available for verifying models.
According to an embodiment of the present disclosure, a scheme for model verification is presented. According to the scheme, in the model training stage, some of the training samples in a training sample set classified into a first class are modified, such that the difference between each modified training sample and the original training sample does not exceed a threshold difference. The classification model is trained based at least on the modified training samples, such that the trained classification model is capable of classifying the modified training samples into a second class of the plurality of classes, the second class being different from the first class. A classification model trained in this manner can support model verification.
Specifically, if model verification is to be performed, a modified version of a source input is used as the target input for model verification, wherein the source input is classified into the first class by the trained classification model and the target input is classified into the second class by the trained classification model. The target input is applied to the classification model to be validated. If the classification model to be validated classifies the target input into the second class instead of the first class, it is determined that the classification model to be validated is a duplicate version of the trained classification model described above.
According to the scheme, model training is performed with slightly modified training samples, which supports efficient model verification and makes it possible to judge whether model theft has occurred. In addition, because only a slight change introduced into the input of the classification model triggers the differential classification result, the mechanism is highly covert and not easily perceived, preventing a malicious thief from evading detection through targeted modification of the model.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of a system 100 for model verification in accordance with various embodiments of the present disclosure. In the example of fig. 1, computing device 110 is configured to train classification model 112. Computing device 120 is configured to verify whether classification model 122 is the classification model 112 trained by computing device 110, producing verification result 130.
Computing device 110 or computing device 120 may be any electronic device having computing capabilities, including a mobile device, a stationary device, or a portable device. Examples of computing device 110 or computing device 120 include, but are not limited to, servers, mainframe computers, mini-computers, edge computing nodes, personal computers, server computers, hand-held or laptop devices, mobile devices such as mobile phones, personal digital assistants (PDAs), media players, etc., multiprocessor systems, or distributed computing systems including any of the above systems or devices, etc. Although shown as separate devices, in some implementations, the computing devices performing model training and model verification may be the same device or system.
Classification model 112 includes any type of model capable of performing classification tasks. The configuration and training method of the classification model 112 may vary depending on the particular classification task. Some examples of classification models 112 include, but are not limited to, support vector machine (SVM) models, Bayesian models, random forest models, and various deep learning/neural network models, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and the like.
During the model training phase, computing device 110 is configured to perform training of classification model 112 based on training data 102. The classification model 112 is configured to enable classification of a plurality of classes. Specifically, the input of the classification model 112 is the data to be classified, which may be images, text, or other types of information. The output of the classification model 112 is a classification result indicating to which of the plurality of classes the input is classified. Thus, the training goal of the classification model 112 is to enable it to learn to distinguish inputs of the multiple classes.
To achieve the training goals of the classification model 112, the training data 102 generally includes training sample sets corresponding to the multiple classes. In general, the differences between training samples of different classes are significant and can thus be perceived visually by humans. For example, if the classification model 112 is a handwritten character recognition model used to determine which of a plurality of predetermined characters is presented in an input image, the plurality of classes correspond to the plurality of predetermined characters, respectively. Accordingly, the training data 102 includes images presenting the plurality of predetermined characters, and each image presenting a predetermined character is referred to as a training sample. Typically, in training, the training sample set for each class includes a number of training samples, so that the classification model 112 can sufficiently learn the characteristics of the corresponding class.
The training data 102 may be obtained from various training databases or may be collected by other means. For example, the MNIST dataset may be utilized to train a classification model that recognizes handwritten characters. Note that handwritten character recognition is mentioned here and below as one example of a classification task for purposes of explanation, but it does not limit embodiments of the present disclosure in any way. In some implementations, to enable supervised training, each training sample is also labeled with a corresponding classification label indicating the class to which the training sample corresponds.
In an embodiment of the present disclosure, to facilitate subsequent model verification to detect whether a model is stolen, the classification model 112 is trained during a training phase by specifically adjusting training samples in the training data 102.
Specifically, computing device 110 selects training samples for modification from a training sample set used for training classification model 112, to obtain modified training samples. The training sample set includes training samples of one class of the plurality of classes output by the classification model 112 (referred to herein as the "first class" for ease of discussion). In other words, based on human perception, the training samples in the training sample set should be classified into the first class.
In some embodiments, computing device 110 may randomly select training samples from the training sample set. In some embodiments, the number of training samples selected may be small relative to the number of training samples remaining in the training sample set. For example, about 5% of the training samples in the training sample set may be selected for modification. Of course, this is just one specific example, and other proportions of training sample selection are possible.
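By way of non-limiting illustration only, the selection step may be sketched in Python as follows; the roughly 5% ratio, the fixed random seed, and the function name are assumptions of this example rather than requirements of the present disclosure:

    import numpy as np

    def select_samples_for_modification(sample_indices, ratio=0.05, seed=0):
        # Randomly pick a small fraction of the first-class sample indices.
        # The ~5% ratio mirrors the example above; other proportions work too.
        rng = np.random.default_rng(seed)
        n_selected = max(1, int(len(sample_indices) * ratio))
        return rng.choice(sample_indices, size=n_selected, replace=False)

    # Example: select about 5% of 6,000 first-class training samples.
    selected = select_samples_for_modification(np.arange(6000))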
Fig. 2A schematically illustrates an example of training sample modification. In the example of fig. 2A, it is assumed that training data 102 includes at least training sample set 102-1 classified into first class 201 and training sample set 102-2 classified into second class 202. The first class 201 is different from the second class 202. In an example of handwritten character recognition, the first class 201 may correspond to one character and the second class 202 may correspond to another character. If the classification model 112 is designed to distinguish among a greater number of classes, the training data 102 may also include training sample sets corresponding to those additional classes. The scope of the embodiments of the present disclosure is not limited in this respect.
To support model verification, computing device 110 selects some training samples 212 from training sample set 102-1 and modifies training samples 212 to obtain modified training samples 214. In embodiments of the present disclosure, modifications to the training samples 212 originally classified into the first class 201 are desirably minor and imperceptible. In particular, when training sample modification is performed, the difference between the modified training samples 214 and the training samples 212 does not exceed a threshold difference. For example, if the training sample 212 is data in the form of an image, only a limited number of pixels of the training sample 212 may be modified. For other forms of training samples, similarly limited modifications are performed.
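One concrete way to make the threshold difference checkable for image samples is a pixel-level budget, as in the following sketch; the specific limits (at most 16 changed pixels, each changed by at most 0.05 on a [0, 1] intensity scale) are illustrative assumptions, since the disclosure does not fix a particular difference metric:

    import numpy as np

    def within_threshold(original, modified, max_pixels=16, max_delta=0.05):
        # Assumed difference budget: at most max_pixels pixels changed,
        # and no pixel changed by more than max_delta.
        delta = np.abs(modified.astype(np.float32) - original.astype(np.float32))
        return int((delta > 1e-6).sum()) <= max_pixels and float(delta.max()) <= max_delta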
The training sample modification is not intended to alter the training sample 212 so significantly that humans would perceive it as belonging to the second class 202. In some embodiments, the goal of the training sample modification is to enable the classification model 112 to perceive differences between the training samples 212 and the modified training samples 214, so that the modified training samples 214 can be classified into the second class 202 as intended. In this way, if the classification model 112 is trained based on the modified training samples 214, the trained classification model 112 is able to classify the modified training samples 214 into the second class 202 instead of the first class 201. On the other hand, the difference between a training sample 212 and its modified counterpart 214 is small and difficult for humans to perceive, particularly visually. In some examples, the probability that this difference is visually perceived is below a predetermined threshold probability.
As will be described below, such processing of training samples 212 facilitates model verification while remaining covert: the model modification is imperceptible to a thief, so the thief cannot detect it and evade discovery of the model theft behavior.
In some embodiments, modification of the training samples 212 may include adding disturbance data to each training sample 212. The computing device 110 may generate disturbance data for each training sample 212 based on predetermined rules. The predetermined rules may indicate the algorithm or technique according to which the disturbance data is generated. The generated disturbance data may, for example, comprise random noise, interference, or the like serving the purpose of training sample modification. The predetermined rules for generating disturbance data may be recorded for use in subsequent model verification, as will be described further below. By adding the disturbance data, a training sample 212 is modified into a training sample 214; such modified training samples 214 are sometimes referred to as adversarial samples.
In some embodiments, the computing device 110 may assign a classification label to the modified training samples 214, the classification label indicating the second class 202. Thus, in a supervised training implementation, the classification model 112 is trained to learn to associate the features of the modified training samples 214 with the second class 202. In some embodiments, the remaining training samples in training sample set 102-1 may still have classification labels indicating the first class 201, while the training samples in training sample set 102-2 still have classification labels indicating the second class 202. In a supervised-training-based implementation, classification labels indicate the true classification results of individual training samples, thereby guiding the adjustment of model parameters and the assessment of how well the training fits.
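A minimal sketch of the modification and relabeling steps is given below, assuming grayscale image samples with pixel values in [0, 1]; the seeded 3x3 patch is one hypothetical example of a "predetermined rule," chosen so that the added disturbance stays within the difference budget sketched earlier:

    import numpy as np

    def generate_disturbance(shape, seed=42, patch=3, delta=0.05):
        # Hypothetical predetermined rule: a faint patch x patch additive
        # pattern at a seed-derived position. Recording the rule (seed,
        # patch size, amplitude) allows the identical disturbance to be
        # regenerated later for model verification.
        rng = np.random.default_rng(seed)
        noise = np.zeros(shape, dtype=np.float32)
        row = int(rng.integers(0, shape[0] - patch))
        col = int(rng.integers(0, shape[1] - patch))
        noise[row:row + patch, col:col + patch] = delta * rng.random((patch, patch))
        return noise

    def modify_and_relabel(images, labels, selected, second_class, seed=42):
        # Add the disturbance to each selected first-class image and assign
        # the classification label indicating the second class.
        images = images.astype(np.float32).copy()
        labels = labels.copy()
        for idx in selected:
            noise = generate_disturbance(images[idx].shape, seed=seed)
            images[idx] = np.clip(images[idx] + noise, 0.0, 1.0)
            labels[idx] = second_class
        return images, labels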
After the training sample modification is complete, computing device 110 may train classification model 112. The training process of classification model 112 may employ any desired training method. The training data for model training includes at least the modified training samples 214 and possibly classification labels corresponding to the training samples 214. The training aims to enable the trained classification model 112 to classify the modified training samples 214 into the second class 202 instead of the first class 201.
In addition to the training samples 214, during normal training, computing device 110 also trains classification model 112 with the rest of the training samples in training sample set 102-1, enabling the trained classification model 112 to correctly classify those remaining samples into the first class 201, which is the classification capability the classification model 112 is expected to possess. Similarly, computing device 110 also trains classification model 112 with the training sample sets of other classes, including training sample set 102-2 of the second class 202, enabling the trained classification model 112 to correctly classify the training samples in training sample set 102-2 into the second class 202, which is likewise a classification capability that the classification model 112 is expected to possess. In the example of supervised training, the classification labels corresponding to the training samples in training sample set 102-1 and the classification labels corresponding to the training samples in training sample set 102-2 are also utilized in the model training process.
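A minimal supervised training loop over the combined data might then look as follows; this is a sketch using PyTorch, with the optimizer, batch size, and epoch count as illustrative assumptions and labels assumed to be integer class indices:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    def train_classifier(model, images, labels, epochs=5, lr=1e-3):
        # Ordinary supervised training over the full training data, which
        # now contains the modified samples carrying second-class labels
        # alongside the unmodified sample sets of every class.
        data = TensorDataset(torch.as_tensor(images), torch.as_tensor(labels))
        loader = DataLoader(data, batch_size=64, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        model.train()
        for _ in range(epochs):
            for x, y in loader:
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()
                optimizer.step()
        return model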
Through the above model training process, the classification model 112 obtained after training can normally implement classification of the plurality of predetermined classes. For example, if an input that humans would perceive as belonging to a given class is provided to classification model 112, the model can correctly determine the class to which the input belongs. In addition, the special input will trigger the classification model 112 to give a special output. This property is preserved when the model is migrated or replicated, so it can be exploited to enable model verification.
Referring back to fig. 1. In some cases, it may be desirable to verify whether a classification model 122 is a duplicate version of the trained classification model 112, and further to determine whether the classification model 122 was stolen from the classification model 112. The classification model 112 may be referred to herein as the target classification model, while the classification model 122 is referred to as the classification model to be validated.
During the model verification process, the computing device 120 obtains a target input 124 for model verification. The target input 124 is a modified version of a source input, determined by modifying that source input. As shown in fig. 2B, the source input 230 is known to be classified by the classification model 112 into the first class 201, while the target input 124 is known to be classified by the classification model 112 into the second class 202.
In some embodiments, the target input 124 may be determined by adding disturbance data to the source input 230. The disturbance data for the source input 230 may be generated, for example, based on predetermined rules. The predetermined rules used here may be the same as the predetermined rules used in the training of the classification model 112. As mentioned above, those rules were used in the training process to generate the disturbance data added to the training samples 212 to produce the modified training samples 214. Thus, apart from the added disturbance data, the target input 124 is identical to the source input 230; their difference can be perceived by the classification model 112 but is hard to perceive visually. For example, the probability that the difference between the target input 124 and the source input 230 is visually perceived is below the threshold probability.
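Reusing the generate_disturbance sketch from the training discussion above, the target input 124 could be constructed as follows, with the shared seed standing in for the recorded predetermined rule:

    import numpy as np

    def make_target_input(source_input, seed=42):
        # Regenerate the disturbance under the same predetermined rule
        # (the recorded seed) used at training time and add it to the
        # source input; generate_disturbance is the earlier sketch.
        noise = generate_disturbance(source_input.shape, seed=seed)
        return np.clip(source_input.astype(np.float32) + noise, 0.0, 1.0)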
As an alternative to modifying a source input, the target input 124 for model verification may be derived directly from the training data of the classification model 112, i.e., it may be selected from the modified training samples 214. This omits the process of regenerating the disturbance data.
The computing device 120 applies the target input 124 to the classification model 122 to be validated to obtain the classification result of the classification model 122 on the target input 124. If the classification result indicates that the target input 124 is classified into the second class, the computing device 120 determines a verification result 130 indicating that the classification model 122 is a duplicate version of the classification model 112. Such a verification result can be determined because the two classification models 112 and 122 classify the special target input 124 identically, which implies that the classification model 122 underwent a training process similar to that of the classification model 112, and thus may be the same or similar in model configuration, training data used, training means, and the like.
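The verification decision itself then reduces to a single comparison, as in the following sketch, where classify is a hypothetical callable wrapping the suspect model's inference interface rather than an API defined by this disclosure:

    def is_duplicate_model(classify, target_input, second_class):
        # The suspect model is flagged as a likely duplicate if the special
        # target input lands in the second class rather than the first
        # class a human observer would expect.
        return classify(target_input) == second_class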
In some cases, the thief may perform some fine-tuning on the classification model 112 to obtain the classification model 122; however, such fine-tuning is insufficient to constitute model innovation and generally does not alter the specific classification result of the classification model 122 on the special target input 124. Thus, it may still be determined by the example process of an embodiment of the present disclosure that classification model 122 is a duplicate version of classification model 112.
According to embodiments of the present disclosure, efficient and covert model theft detection is achieved by introducing a special "trigger" during model training. Since the classification model 112 is designed to classify the special target input 124 into the second class 202 rather than the first class 201, and the difference between the target input 124 and the source input 230 (which human perception would place in the first class 201) is small, a model thief is generally unaware of this special model behavior and therefore does not specifically alter the model in this regard to prevent detection. This ensures that the model owner can more easily verify whether the model has been stolen.
Fig. 3 illustrates a flow chart of a method 300 for supporting model verification according to some embodiments of the present disclosure. The method 300 may be implemented by the computing device 110 of fig. 1.
At block 310, computing device 110 selects a training sample from a training sample set for a classification model, the training sample set classified into a first class of a plurality of classes, the classification model configured to implement classification of the plurality of classes. At block 320, computing device 110 modifies the selected training samples to obtain modified training samples, the difference between the modified training samples and the training samples not exceeding a threshold difference. At block 330, the computing device 110 trains a classification model based at least on the modified training samples, such that the trained classification model is capable of classifying the modified training samples into a second class of the plurality of classes, the second class being different from the first class.
In some embodiments, modifying the selected training samples includes: generating disturbance data for the training samples based on a predetermined rule; and modifying the selected training samples by adding disturbance data to the training samples.
In some embodiments, the difference between the modified training samples and the training samples can be perceived by the classification model, with the probability that the difference is visually perceived being below a threshold probability.
In some embodiments, training the classification model includes: assigning a classification label to the modified training sample, the classification label indicating the second class; and training the classification model based on the modified training sample and the classification label.
In some embodiments, training the classification model further comprises: the classification model is also trained based on the remaining samples in the training sample set, excluding the training samples, such that the trained classification model is capable of classifying the remaining samples into the first class.
Fig. 4 illustrates a flow chart of a method 400 for model verification according to some embodiments of the present disclosure. The method 400 may be implemented by the computing device 120 of fig. 1.
At block 410, the computing device 120 obtains a target input for model verification, the target input being a modified version of a source input, the source input being classified by a target classification model into a first class of a plurality of classes and the target input being classified by the target classification model into a second class of the plurality of classes, the second class being different from the first class. At block 420, the computing device 120 applies the target input to the classification model to be validated to obtain a classification result of the classification model on the target input. At block 430, the computing device 120 determines that the classification model is a duplicate version of the target classification model based on determining that the classification result indicates the second class.
In some embodiments, obtaining the target input includes: generating disturbance data for the source input based on a predetermined rule, the predetermined rule being used in training of the target classification model to generate further disturbance data for training samples, such that training samples classified into the first class are classified into the second class by the target classification model after the further disturbance data is added to them; and modifying the source input by adding the disturbance data to the source input to obtain the target input.
In some embodiments, the perturbation data is generated such that a difference between the target input and the source input is perceivable by the target classification model, and the probability that the difference is visually perceived is below a threshold probability.
In some embodiments, the target classification model is trained by the method 300 of fig. 3.
Fig. 5 illustrates a schematic block diagram of an apparatus 500 for supporting model verification according to some embodiments of the present disclosure. The apparatus 500 may be included in the computing device 110 of fig. 1 or implemented as the computing device 110.
As shown in fig. 5, the apparatus 500 includes a sample selection module 510 configured to select a training sample from a training sample set for a classification model, the training sample set being classified into a first class of a plurality of classes, the classification model being configured to implement classification of the plurality of classes. The apparatus 500 further comprises a sample modification module 520 configured to modify the selected training samples to obtain modified training samples, the difference between the modified training samples and the training samples not exceeding a threshold difference. The apparatus 500 further comprises a model training module 530 configured to train the classification model based at least on the modified training samples, such that the trained classification model is capable of classifying the modified training samples into a second class of the plurality of classes, the second class being different from the first class.
In some embodiments, the sample modification module 520 includes: a disturbance data generation module configured to generate disturbance data for the training samples based on a predetermined rule; and a disturbance data addition module configured to modify the selected training samples by adding disturbance data to the training samples.
In some embodiments, the difference between the modified training samples and the training samples can be perceived by the classification model, with the probability that the difference is visually perceived being below a threshold probability.
In some embodiments, model training module 530 includes: a label assignment module configured to assign a classification label to the modified training samples, the classification label indicating the second class; and a first training module configured to train the classification model based on the modified training samples and the classification labels.
In some embodiments, model training module 530 further comprises: the second training module is configured to train the classification model based on other samples in the training sample set than the training samples, so that the trained classification model can classify the other samples into the first class.
Fig. 6 illustrates a schematic block diagram of an apparatus 600 for model verification according to some embodiments of the present disclosure. The apparatus 600 may be included in the computing device 120 of fig. 1 or implemented as the computing device 120.
As shown in fig. 6, the apparatus 600 includes an input obtaining module 610 configured to obtain a target input for model verification, the target input being a modified version of a source input, the source input being classified into a first class of the plurality of classes by a target classification model and the target input being classified into a second class of the plurality of classes by the target classification model, the second class being different from the first class. The apparatus 600 further comprises an input application module 620 configured to apply the target input to the classification model to be validated to obtain a classification result of the classification model on the target input. The apparatus 600 further comprises a determining module 630 configured to determine that the classification model is a duplicate version of the target classification model, based on the determination that the classification result indicates the second class.
In some embodiments, the input obtaining module 610 includes: a disturbance data generation module configured to generate disturbance data for the source input based on a predetermined rule, the predetermined rule being used in training of the target classification model to generate further disturbance data for training samples, such that training samples classified into the first class are classified into the second class by the target classification model after the further disturbance data is added to them; and a disturbance data addition module configured to modify the source input by adding the disturbance data to the source input to obtain the target input.
In some embodiments, the perturbation data is generated such that a difference between the target input and the source input is perceivable by the target classification model, and the probability that the difference is visually perceived is below a threshold probability.
In some embodiments, the target classification model is trained by the apparatus 500 of fig. 5.
Fig. 7 shows a schematic block diagram of an example device 700 that may be used to implement embodiments of the present disclosure. Device 700 may be used to implement computing device 110 or computing device 120 of fig. 1, or may be included in computing device 110 or computing device 120.
As shown, the device 700 includes a computing unit 701 that can perform various suitable actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 702 or loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as method 300 and/or method 400. For example, in some embodiments, method 300 and/or method 400 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of method 300 and/or method 400 described above may be performed. Alternatively, in other embodiments, computing unit 701 may be configured to perform method 300 and/or method 400 by any other suitable means (e.g., by means of firmware).
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on a Chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (19)

1. A method for image model verification, comprising:
Obtaining a target input for image model verification, the target input being a modified version of a source input, the source input being classified by a target image classification model into a first one of a plurality of classes and the target input being classified by the target image classification model into a second one of the plurality of classes, the second one being different from the first one;
applying the target input to an image classification model to be verified to obtain a classification result of the image classification model on the target input; and
in accordance with a determination that the classification result indicates the second class, determining that the image classification model is a duplicate version of the target image classification model.
2. The method of claim 1, wherein obtaining the target input comprises:
generating disturbance data for the source input based on a predetermined rule, the predetermined rule being used in training of the target image classification model to generate further disturbance data for image training samples, such that image training samples classified into the first class are classified into the second class by the target image classification model after the further disturbance data is added to them; and
modifying the source input by adding the disturbance data to the source input to obtain the target input.
3. The method of claim 2, wherein the perturbation data is generated such that a difference between the target input and the source input is perceivable by the target image classification model, and a probability that the difference is visually perceived is below a threshold probability.
4. A method according to any one of claims 1 to 3, wherein the target image classification model is trained by:
selecting an image training sample from a set of image training samples for a target image classification model, the set of image training samples being classified into a first class of a plurality of classes, the target image classification model being configured to effect classification of the plurality of classes;
modifying the selected image training samples to obtain modified image training samples, wherein the difference between the modified image training samples and the image training samples does not exceed a threshold difference; and
training the target image classification model based at least on the modified image training samples, such that the trained target image classification model is capable of classifying the modified image training samples into a second class of the plurality of classes, the second class being different from the first class.
5. The method of claim 4, wherein modifying the selected image training samples comprises:
generating disturbance data for the image training samples based on a predetermined rule; and
modifying the selected image training samples by adding disturbance data to the image training samples.
6. The method of claim 4, wherein the difference between the modified image training sample and the image training sample is perceivable by the target image classification model and the probability that the difference is visually perceived is below a threshold probability.
7. The method of claim 4, wherein training the target image classification model comprises:
assigning a classification label to the modified image training sample, the classification label indicating the second class; and
training the target image classification model based on the modified image training samples and the classification label.
8. The method of claim 4, wherein training the target image classification model further comprises:
training the target image classification model also based on the remaining samples in the image training sample set other than the selected image training samples, such that the trained target image classification model is capable of classifying the remaining samples into the first class.
9. An apparatus for image model verification, comprising:
an input obtaining module configured to obtain a target input for image model verification, the target input being a modified version of a source input, the source input being classified by a target image classification model into a first one of a plurality of classes and the target input being classified by the target image classification model into a second one of the plurality of classes, the second one being different from the first one;
an input application module configured to apply the target input to an image classification model to be verified to obtain a classification result of the image classification model on the target input; and
a determination module configured to determine that the image classification model is a duplicate version of the target image classification model in accordance with a determination that the classification result indicates the second class.
10. The apparatus of claim 9, wherein the input obtaining module comprises:
a disturbance data generation module configured to generate disturbance data for the source input based on a predetermined rule, the predetermined rule being used in training of the target image classification model to generate further disturbance data for image training samples, such that image training samples classified into the first class are classified into the second class by the target image classification model after the further disturbance data is added to them; and
A disturbance data adding module configured to modify the source input by adding the disturbance data to the source input to obtain the target input.
11. The apparatus of claim 10, wherein the perturbation data is generated such that a difference between the target input and the source input is perceivable by the target image classification model, and a probability that the difference is visually perceived is below a threshold probability.
12. The apparatus of any of claims 9 to 11, wherein the target image classification model is trained by:
an image sample selection module configured to select an image training sample from a set of image training samples for a target image classification model, the set of image training samples being classified into a first class of a plurality of classes, the target image classification model being configured to effect classification of the plurality of classes;
an image sample modification module configured to modify the selected image training samples to obtain modified image training samples, the difference between the modified image training samples and the image training samples not exceeding a threshold difference; and
a target image model training module configured to train the target image classification model based at least on the modified image training samples, such that the trained target image classification model is capable of classifying the modified image training samples into a second class of the plurality of classes, the second class being different from the first class.
13. The device of claim 12, wherein the image sample modification module comprises:
a disturbance data generation module configured to generate disturbance data for the image training samples based on a predetermined rule; and
a disturbance data addition module configured to modify the selected image training samples by adding disturbance data to the image training samples.
14. The apparatus of claim 12, wherein the difference between the modified image training sample and the image training sample is perceivable by the target image classification model and the probability that the difference is visually perceived is below a threshold probability.
15. The apparatus of claim 12, wherein the target image model training module comprises:
a label assignment module configured to assign a classification label to the modified image training samples, the classification label indicating the second class; and
a first training module configured to train the target image classification model based on the modified image training samples and the classification labels.
16. The apparatus of claim 12, wherein the target image model training module further comprises:
a second training module configured to train the target image classification model also on remaining samples in the set of image training samples other than the selected image training samples, to enable the trained target image classification model to classify the remaining samples into the first class.
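A non-limiting sketch of how the second training module of claim 16 might assemble its data, continuing the conventions of the previous sketches (the whole sample set belongs to the first class, and `modified` holds the disturbed samples):

```python
import torch

def build_training_set(modified: torch.Tensor,
                       remaining: torch.Tensor,
                       first_class: int,
                       second_class: int):
    """Combine relabeled trigger samples with the unmodified remainder so
    that, after training, the modified samples fall into the second class
    while the remaining samples stay in the first class."""
    trigger_labels = torch.full((modified.shape[0],), second_class, dtype=torch.long)
    clean_labels = torch.full((remaining.shape[0],), first_class, dtype=torch.long)
    inputs = torch.cat([modified, remaining], dim=0)
    labels = torch.cat([trigger_labels, clean_labels], dim=0)
    perm = torch.randperm(inputs.shape[0])  # shuffle so batches mix both kinds
    return inputs[perm], labels[perm]
```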
17. An electronic device, comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 8.
18. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1 to 8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
CN202010186738.7A 2020-03-17 2020-03-17 Method, apparatus, device and medium for model verification Active CN113408558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010186738.7A CN113408558B (en) 2020-03-17 2020-03-17 Method, apparatus, device and medium for model verification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010186738.7A CN113408558B (en) 2020-03-17 2020-03-17 Method, apparatus, device and medium for model verification

Publications (2)

Publication Number Publication Date
CN113408558A (en) 2021-09-17
CN113408558B (en) 2024-03-08

Family

ID=77677299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010186738.7A Active CN113408558B (en) 2020-03-17 2020-03-17 Method, apparatus, device and medium for model verification

Country Status (1)

Country Link
CN (1) CN113408558B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140670A (en) * 2021-11-25 2022-03-04 支付宝(杭州)信息技术有限公司 Method and device for model ownership verification based on exogenous features
CN114240101A (en) * 2021-12-02 2022-03-25 支付宝(杭州)信息技术有限公司 Risk identification model verification method, device and equipment
WO2023135682A1 (en) * 2022-01-12 2023-07-20 日本電信電話株式会社 Authentication device, communication system, authentication method, and program
CN115659182B (en) * 2022-11-11 2023-08-15 中国电子科技集团公司第十五研究所 Model updating method, device and equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615020A (en) * 2018-12-25 2019-04-12 深圳前海微众银行股份有限公司 Characteristic analysis method, device, equipment and medium based on machine learning model
CN109635110A (en) * 2018-11-30 2019-04-16 北京百度网讯科技有限公司 Data processing method, device, equipment and computer readable storage medium
US10296848B1 (en) * 2018-03-05 2019-05-21 Clinc, Inc. Systems and method for automatically configuring machine learning models
CN109800807A (en) * 2019-01-18 2019-05-24 北京市商汤科技开发有限公司 Training method and classification method of a classification network, apparatus, and electronic device
CN109902617A (en) * 2019-02-25 2019-06-18 百度在线网络技术(北京)有限公司 Image identification method, device, computer equipment and medium
CN109902705A (en) * 2018-10-30 2019-06-18 华为技术有限公司 Method and device for generating a disturbance-resistant object detection model
US10339423B1 (en) * 2017-06-13 2019-07-02 Symantec Corporation Systems and methods for generating training documents used by classification algorithms
CN110222831A (en) * 2019-06-13 2019-09-10 百度在线网络技术(北京)有限公司 Robustness evaluation method, device and storage medium for a deep learning model
CN110443367A (en) * 2019-07-30 2019-11-12 电子科技大学 Method for strengthening the robustness of a neural network model
CN110472672A (en) * 2019-07-25 2019-11-19 阿里巴巴集团控股有限公司 Method and apparatus for training machine learning model
CN110502976A (en) * 2019-07-10 2019-11-26 深圳追一科技有限公司 Training method for a text recognition model and related product
CN110741388A (en) * 2019-08-14 2020-01-31 东莞理工学院 Adversarial sample detection method and device, computing device and computer storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308790A1 (en) * 2016-04-21 2017-10-26 International Business Machines Corporation Text classification by ranking with convolutional neural networks
US11023593B2 (en) * 2017-09-25 2021-06-01 International Business Machines Corporation Protecting cognitive systems from model stealing attacks
US10657259B2 (en) * 2017-11-01 2020-05-19 International Business Machines Corporation Protecting cognitive systems from gradient based attacks through the use of deceiving gradients
US11443178B2 (en) * 2017-12-15 2022-09-13 Interntional Business Machines Corporation Deep neural network hardening framework
US10733292B2 (en) * 2018-07-10 2020-08-04 International Business Machines Corporation Defending against model inversion attacks on neural networks
US11144581B2 (en) * 2018-07-26 2021-10-12 International Business Machines Corporation Verifying and correcting training data for text classification

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Advbox: a toolbox to generate adversarial examples that fool neural networks; Dou Goodman et al.; arXiv:2001.05574v1 [cs.LG]; 1-6 *
Transferability of Adversarial Examples to Attack Cloud-based Image Classifier Service; Dou Goodman; arXiv:2001.03460v1 [cs.CV]; 1-9 *
Transferability of Adversarial Examples to Attack Cloud-based Image Classifier Service; Dou Goodman; arXiv:2001.03460v3 [cs.CV]; 1-6 *
Research on Security Attacks Faced by Machine Learning Systems and Their Defense Technologies; Yu Yingchao; Ding Lin; Chen Zuoning; Netinfo Security (No. 09); 10-18 *

Also Published As

Publication number Publication date
CN113408558A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
CN113408558B (en) Method, apparatus, device and medium for model verification
CN111373403B (en) Learning method and testing method for confusion network for hiding original data to protect personal information, learning device and testing device thereof
CN111898758B (en) User abnormal behavior identification method and device and computer readable storage medium
EP3812988A1 (en) Method for training and testing adaption network corresponding to obfuscation network capable of processing data to be concealed for privacy, and training device and testing device using the same
EP3812970A1 (en) Method for learning and testing user learning network to be used for recognizing obfuscated data created by concealing original data to protect personal information and learning device and testing device using the same
CN111652290B (en) Method and device for detecting countermeasure sample
US20230274003A1 (en) Identifying and correcting vulnerabilities in machine learning models
US11308359B1 (en) Methods for training universal discriminator capable of determining degrees of de-identification for images and obfuscation network capable of obfuscating images and training devices using the same
US11397891B2 (en) Interpretability-aware adversarial attack and defense method for deep learnings
CN110874471B (en) Privacy and safety protection neural network model training method and device
EP3812937A1 (en) System and method for protection and detection of adversarial attacks against a classifier
EP3916597B1 (en) Detecting malware with deep generative models
KR102395452B1 (en) Method for learning and testing user learning network to be used for recognizing obfuscated data created by concealing original data to protect personal information and learning device and testing device using the same
CN111612037A (en) Abnormal user detection method, device, medium and electronic equipment
Anandhi et al. Malware visualization and detection using DenseNets
JP2023012311A (en) Information processing device, information processing method and program
US20210365771A1 (en) Out-of-distribution (ood) detection by perturbation
US10984283B2 (en) Recognition of biases in data and models
CN116340752A (en) Predictive analysis result-oriented data story generation method and system
CN116305103A (en) Neural network model backdoor detection method based on confidence coefficient difference
CN116957036A (en) Training method, training device and computing equipment for fake multimedia detection model
CN115438747A (en) Abnormal account recognition model training method, device, equipment and medium
Hashemi et al. Runtime monitoring for out-of-distribution detection in object detection neural networks
Lim et al. Metamorphic testing-based adversarial attack to fool deepfake detectors
Liu et al. Time-series data augmentation method for small samples based on an optimized generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant