CN112613580B - Method, device, system and medium for defending machine learning model from attack


Info

Publication number
CN112613580B
CN112613580B
Authority
CN
China
Prior art keywords
training
data set
data
sub
client
Prior art date
Legal status
Active
Application number
CN202011643072.XA
Other languages
Chinese (zh)
Other versions
CN112613580A (en)
Inventor
张�诚
吕博良
程佩哲
周京
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202011643072.XA priority Critical patent/CN112613580B/en
Publication of CN112613580A publication Critical patent/CN112613580A/en
Application granted granted Critical
Publication of CN112613580B publication Critical patent/CN112613580B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present disclosure provides a method, device, system and medium for defending a machine learning model against attack, and belongs to the field of artificial intelligence. The method comprises the following steps: performing data segmentation and reconstruction on a first training data set acquired for training the machine learning model to obtain a second training data set, wherein the data volume of the second training data set is greater than that of the first training data set and the data in the second training data set coincide with a random portion of the data in the first training data set; independently training G individual models using the second training data set, wherein the machine learning model comprises the G individual models and the algorithms of the G individual models differ from one another; in a prediction stage, processing the first prediction results output by the G individual models according to a predetermined rule to form a second prediction result output by the machine learning model; and outputting the second prediction result to the client.

Description

Method, device, system and medium for defending machine learning model from attack
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a method, apparatus, system, and medium for defending a machine learning model from attack.
Background
Machine learning has great potential to promote social development. However, if a machine learning model is attacked, its judgment results can be distorted, which at best causes property loss and at worst threatens personal safety.
Attacks on machine learning models can take a variety of forms. For example, at the data level, an attacker can mount a poisoning attack that changes the model's judgment results by injecting a small amount of malicious data or a small amount of carefully chosen noise during the training stage. As another example, at the model level, an attacker can, through repeated queries, reverse-engineer a copy of the very model deployed by the service provider. As yet another example, in service scenarios where the training data involves users, an attacker can obtain the users' private information by repeatedly querying the trained model. Improving the attack-defending capability of machine learning models is therefore very important to the development of information security and artificial intelligence technology.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method, apparatus, system, and medium for defending a machine learning model from attacks.
In a first aspect of embodiments of the present disclosure, a method of defending a machine learning model against attacks is provided. The method comprises the following steps: performing data segmentation and reconstruction on a first training data set acquired for training the machine learning model to obtain a second training data set, wherein the data volume of the second training data set is greater than that of the first training data set, and the data in the second training data set coincide with a random portion of the data in the first training data set; independently training G individual models using the second training data set, wherein the machine learning model comprises the G individual models, the algorithms of the G individual models differ from one another, and G is an integer greater than or equal to 2; in a prediction stage, processing the first prediction results output by the G individual models according to a predetermined rule to form a second prediction result output by the machine learning model; and outputting the second prediction result to the client.
According to an embodiment of the disclosure, performing data segmentation and reconstruction on the first training data set acquired for training the machine learning model to obtain the second training data set comprises: dividing the first training data set into N mutually exclusive first sub-data sets, wherein N is an integer greater than or equal to 2; randomly sampling one first sub-data set with replacement over S rounds to obtain S second sub-data sets, so that N×S second sub-data sets are obtained corresponding to the N first sub-data sets, wherein S is an integer greater than or equal to 2; and obtaining the second training data set based on the N×S second sub-data sets.
According to an embodiment of the present disclosure, obtaining the second training data set based on the N×S second sub-data sets comprises: randomly deleting L second sub-data sets from the S second sub-data sets sampled from one first sub-data set, leaving S-L second sub-data sets, so that N×(S-L) second sub-data sets remain corresponding to the N first sub-data sets, wherein L is an integer, and L is greater than or equal to 1 and less than S; and mixing the remaining N×(S-L) second sub-data sets as the second training data set.
According to an embodiment of the present disclosure, randomly deleting L second sub-data sets from the S second sub-data sets sampled from one first sub-data set comprises: among the S second sub-data sets sampled from one first sub-data set, taking each second sub-data set together with the other second sub-data set having the highest similarity to it as a data set pair; and randomly deleting one of the second sub-data sets in each data set pair.
According to an embodiment of the present disclosure, the weights of the N first sub-data sets are the same.
According to an embodiment of the present disclosure, the processing, in the prediction stage, the first prediction result output by each of the G individual models according to a predetermined rule, and forming the second prediction result output by the machine learning model includes: summarizing G first prediction results in a weighted voting mode to obtain an intermediate prediction result; and obtaining the second prediction result based on the intermediate prediction result.
According to an embodiment of the disclosure, the obtaining the second prediction result based on the intermediate prediction result includes: and taking the prediction result obtained by adding random noise to the intermediate prediction result as the second prediction result.
According to an embodiment of the disclosure, the outputting the second prediction result to the client includes: and outputting the second prediction result to the client when the second prediction result meets a threshold condition, wherein the threshold condition is one condition selected randomly from a plurality of preset conditions.
According to an embodiment of the disclosure, the machine learning model further includes a client model, the client model being disposed at the client; wherein after the outputting the second prediction result to the client, the method further comprises: marking partial data in a third training data set for training a client model by taking the second prediction result as a marking basis to obtain a fourth training data set; semi-supervised training the client model using the fourth training dataset; and predicting the user task by using the trained client model.
In another aspect of the disclosed embodiments, an apparatus for defending a machine learning model against attacks is provided. The device comprises a data segmentation module and a server integration model module. The data segmentation module is used for carrying out data segmentation and reconstruction on the acquired first training data set for training the machine learning model to obtain a second training data set; wherein the data volume of the second training data set is greater than the data volume of the first training data set, and the data in the second training data set coincides with the random partial data in the first training data set. The server integrated model module is used for: independently training G individual models using the second training data set, respectively, wherein the machine learning model comprises the G individual models, wherein algorithms of the G individual models are different from each other, wherein G is an integer greater than or equal to 2; processing the first prediction results output by the G individual models respectively according to a preset rule in a prediction stage to form second prediction results output by the machine learning model; and outputting the second prediction result to the client.
According to an embodiment of the disclosure, the apparatus further comprises a client training output module. The client training output module is used for marking partial data in a third training data set for training a client model by taking the second prediction result as a marking basis to obtain a fourth training data set; semi-supervised training the client model using the fourth training dataset; and predicting the user task by using the trained client model.
In another aspect of the disclosed embodiments, a system for defending a machine learning model against attacks is provided. The system includes one or more memories, and one or more processors. The memory stores executable instructions. The processor executes the executable instructions to implement the method as described above.
Another aspect of the disclosed embodiments provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a method as described above.
Another aspect of the disclosed embodiments provides a computer program comprising computer executable instructions which, when executed, are for implementing a method as described above.
One or more of the above embodiments have the following advantages or benefits: at the data level, the training data is segmented and reconstructed, which reduces the influence of potentially malicious data on the data as a whole; at the algorithm-model level, the model is trained in an ensemble-learning manner with G different individual models, and the prediction outputs of the G individual models are aggregated before being output, which further reduces the possibility of the model being reverse-engineered and stolen.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an application scenario of a method, apparatus, system, and medium for defending a machine learning model against attacks in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of defending a machine learning model against attacks in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method of defending a machine learning model against attacks in accordance with another embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method of defending a machine learning model against attacks in accordance with another embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of an apparatus for defending a machine learning model against attacks in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a logical block diagram of a data partitioning module according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a logical block diagram of a server integration model module according to an embodiment of the disclosure;
FIG. 8 schematically illustrates a logical block diagram of a client training output module according to an embodiment of the present disclosure; and
Fig. 9 schematically illustrates a block diagram of a computer system adapted to implement a method of defending a machine learning model against attacks in accordance with an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B and C, etc." is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where an expression like "at least one of A, B or C, etc." is used, it should likewise be interpreted in accordance with the ordinary understanding of one skilled in the art (e.g., "a system having at least one of A, B or C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Embodiments of the present disclosure provide a method, apparatus, system, and medium for defending a machine learning model against attacks. The method first performs data segmentation and reconstruction on a first training data set acquired for training the machine learning model to obtain a second training data set. The data volume of the second training data set may be greater than that of the first training data set, and the data distribution of the second training data set bears a random, uncorrelated relationship to that of the first training data set, with no regular pattern to follow. G individual models are then trained independently using the second training data set, wherein the machine learning model comprises the G individual models, the algorithms of the G individual models differ from one another, and G is an integer greater than or equal to 2. In a prediction stage, the first prediction results output by the G individual models are then processed according to a predetermined rule to form a second prediction result output by the machine learning model, and the second prediction result is output to the client.
According to the embodiments of the present disclosure, the training data is segmented and reconstructed at the data level, which reduces the influence of potentially malicious data on the data as a whole. At the algorithm-model level, the model is trained in an ensemble-learning manner with G different individual models, and the prediction outputs of the G individual models are aggregated before being output, which further reduces the possibility of the model being reverse-engineered and stolen.
It should be noted that the method, device, system and medium for defending against attack of a machine learning model according to the embodiments of the present disclosure may be used in the financial field, and may also be used in any field other than the financial field (for example, logistics, military industry, medical treatment, aerospace, etc.), and the present disclosure does not limit the application field.
Fig. 1 schematically illustrates an application scenario 100 of a method, apparatus, system, and medium for defending a machine learning model against attacks in accordance with an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 may include a server 11 and a client 12. Wherein the server 11 and the client 12 may communicate by wire or wirelessly.
The server 11 is provided with a plurality of individual models (G is illustrated in the figure, where G is an integer greater than 1). The algorithms of the G individual models are different.
The server 11 may perform the method of the embodiments of the present disclosure, perform integrated training on the G individual models after performing data segmentation and reconstruction on the training data set that is originally acquired, predict the user task that is sent by the client 12 by using the G individual models, and output the prediction result of the G individual models to the client 12 after summarizing the prediction result according to the embodiments of the present disclosure.
According to other embodiments of the present disclosure, a client model is also provided in the client 12. The G individual models in the server 11 may be used to partially tag the training data of the client model, assisting in semi-supervised training of the client model.
The method of defending against attack of the machine learning model of the embodiments of the present disclosure may be performed by the server 11, and accordingly the apparatus, system, and medium of defending against attack of the machine learning model of the embodiments of the present disclosure may be provided in the server 11. The method for defending against attack of the machine learning model according to the embodiments of the present disclosure may also be partially executed by the server 11 and partially executed by the client 12, and accordingly, the apparatus, system and medium for defending against attack of the machine learning model according to the embodiments of the present disclosure may be partially disposed in the server 11 and partially disposed in the client 12. The method of defending against attack by the machine learning model of the embodiments of the present disclosure may also be performed in whole or in part by a server or server cluster in communication with at least one of the server 11 and the client 12, and accordingly, the apparatus, system, and medium of defending against attack by the machine learning model of the embodiments of the present disclosure may each also be disposed in whole or in part in a server or server cluster in communication with at least one of the server 11 and the client 12.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
Fig. 2 schematically illustrates a flow chart of a method of defending a machine learning model against attacks in accordance with an embodiment of the present disclosure.
As shown in fig. 2, the method of defending against attack of the machine learning model according to the embodiment may include operations S210 to S240.
First, in operation S210, data segmentation and reconstruction are performed on a first training data set obtained for training a machine learning model, resulting in a second training data set. The second training data set is used for training a machine learning model.
The amount of data in the second training data set may be greater than the amount of data in the first training data set, and the data in the second training data set come from the first training data set. However, the distribution of the data in the second training data set is not related to the data distribution in the first training data set and is completely random. For example, a piece of data may appear multiple times in the second training data set while appearing only once in the first training data set. As another example, some data that exists in the first training data set may be deleted from the second training data set. In this way, even if the originally acquired first training data set contains poisoning data, the potential poisoning data is diluted in the second training data set after segmentation and reconstruction, so that its weight and influence are reduced.
According to an embodiment of the present disclosure, the first training data set may be divided into N mutually exclusive first sub-data sets in operation S210, and each first sub-data set is then randomly sampled with replacement over S rounds to obtain S second sub-data sets, so that N×S second sub-data sets are obtained corresponding to the N first sub-data sets. N and S are integers greater than or equal to 2. In this way, multiple rounds of random sampling with replacement randomly expand the volume of the training data, and controlling the data volume dilutes the potential poisoning data.
In one embodiment, the weights of the N mutually exclusive first sub-data sets are the same; for example, the first training data set is divided into N shares of equal size.
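For illustration only, the following Python sketch shows one possible implementation of the splitting and multi-round sampling with replacement described above; the function name, the use of NumPy, and the index-based representation of the sub-data sets are assumptions and are not part of the disclosure.

```python
import numpy as np

def split_and_resample(n_samples, n_splits, s_rounds, seed=0):
    """Split the first training data set (represented by sample indices) into
    N mutually exclusive, equally sized first sub-data sets, then draw S
    bootstrap samples (random sampling with replacement) from each, yielding
    N*S second sub-data sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)               # shuffle before splitting
    first_subsets = np.array_split(order, n_splits)  # N mutually exclusive subsets
    second_subsets = []
    for idx in first_subsets:
        for _ in range(s_rounds):                    # S rounds per first sub-data set
            sample = rng.choice(idx, size=len(idx), replace=True)
            second_subsets.append(sample)            # indices into the first data set
    return first_subsets, second_subsets             # len(second_subsets) == N*S
```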
In one embodiment, the N×S second sub-data sets may be mixed to obtain the second training data set. In another embodiment, the N×S second sub-data sets may be further processed to obtain the second training data set.
The further processing may be, for example, deleting some second sub-data sets from N x S second sub-data sets according to a certain policy.
For example, from the S second sub-data sets sampled from one first sub-data set, L second sub-data sets may be randomly deleted, leaving S-L second sub-data sets. After each first sub-data set has been processed in this way, N×(S-L) second sub-data sets remain corresponding to the N first sub-data sets, wherein L is an integer, and L is greater than or equal to 1 and less than S.
Alternatively or additionally, a portion of the second sub-data sets having relatively high similarity to one another may be deleted. For example, among the S second sub-data sets sampled from one first sub-data set, each second sub-data set and the other second sub-data set having the highest similarity to it may be regarded as one data set pair, and one second sub-data set in each data set pair may then be randomly deleted.
Thus, by randomly inactivating portions of the training data, the impact of potentially malicious data on the entirety of the data is further reduced.
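A minimal sketch of the similarity-based pair-and-delete step described above, continuing from the index-based representation assumed in the previous snippet; using the Jaccard similarity of the sampled index sets as the similarity measure is an assumption, since the disclosure does not fix a particular metric.

```python
import numpy as np

def prune_similar(second_subsets, seed=0):
    """Within the S bootstrap samples drawn from one first sub-data set, pair
    each sample with its most similar remaining peer (Jaccard similarity of
    the sampled indices) and randomly delete one member of each pair."""
    rng = np.random.default_rng(seed)

    def jaccard(a, b):
        sa, sb = set(a.tolist()), set(b.tolist())
        return len(sa & sb) / len(sa | sb)

    remaining = list(range(len(second_subsets)))
    kept = []
    while len(remaining) > 1:
        i = remaining.pop(0)
        j = max(remaining, key=lambda k: jaccard(second_subsets[i], second_subsets[k]))
        remaining.remove(j)
        survivor = i if rng.random() < 0.5 else j    # random deletion within the pair
        kept.append(second_subsets[survivor])
    kept.extend(second_subsets[i] for i in remaining)  # odd leftover, if any
    return kept                                        # roughly S/2 samples remain
```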
Take the MNIST handwritten digit recognition data set as an example of the first training data set. Suppose an attacker poisons part of the digit-0 data in order to make the machine learning model misrecognize the digit 0 as 1. Through the process of operation S210, data division distributes the poisoning data into different first sub-data sets, reducing its proportion in the entire training data set. If random extraction is performed many times, the overall proportion of the poisoning data in the data set can be greatly reduced. If part of the data is additionally deleted at random, the influence of the poisoning data on the whole training data set is reduced again, and after such repeated processing the poisoning data has little influence on the final judgment result obtained from the training data set.
Then, in operation S220, G individual models are independently trained using the second training data sets, respectively, wherein the machine learning model includes G individual models, wherein algorithms of the G individual models are different from each other, wherein G is an integer greater than or equal to 2. The probability of a model being hacked can be reduced through the integrated training of a plurality of individual models with different algorithms.
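As a sketch of operation S220, under the assumption that scikit-learn is used (the disclosure does not name specific algorithms), G = 4 individual models with mutually different algorithms could be trained independently on the reconstructed data as follows; all names and parameter choices are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def train_individual_models(X, y, second_subsets, seed=0):
    """Independently train G individual models whose algorithms differ from
    each other; each model is fitted on the second training data set formed
    by the (index-based) second sub-data sets, visited in a random order."""
    rng = np.random.default_rng(seed)
    models = [
        LogisticRegression(max_iter=1000),
        RandomForestClassifier(n_estimators=100, random_state=seed),
        GaussianNB(),
        KNeighborsClassifier(n_neighbors=5),
    ]                                                # G = 4 here; G >= 2 is required
    for model in models:
        idx = np.concatenate([second_subsets[i]
                              for i in rng.permutation(len(second_subsets))])
        model.fit(X[idx], y[idx])                    # independent training
    return models
```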
Next, in operation S230, the first prediction results output by each of the G individual models are processed according to a predetermined rule in a prediction stage, forming second prediction results output by the machine learning model. The second prediction result is a prediction result that can be provided to the client user.
According to the disclosed embodiment, the G first prediction results may be summarized in a weighted voting manner to obtain an intermediate prediction result, and then a second prediction result may be obtained based on the intermediate prediction result in operation S230.
In one embodiment, the intermediate predictor may be directly used as the second predictor. In another embodiment, the prediction result after adding random noise to the intermediate prediction result is used as the second prediction result.
For example, the G individual models are classification models with K classification labels. Each individual model outputs probabilities over the K classification labels, and during summarization the probabilities predicted by the G individual models are combined with corresponding weights to obtain the summarized probabilities of the K classification labels (i.e., the intermediate prediction result). The summarized probabilities of the K classification labels may then be taken directly as the second prediction result, or noise may be added to them to further prevent model theft; the resulting second prediction result is output to the user.
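Continuing the sketch, the weighted summarization of operation S230 and the optional noise step might look as follows; equal model weights and the Laplace noise scale are assumptions, not values fixed by the disclosure.

```python
import numpy as np

def ensemble_predict(models, X, weights=None, noise_scale=0.05, seed=0):
    """Aggregate the G first prediction results by weighted voting over the K
    class probabilities (the intermediate prediction result), then add Laplace
    noise to form the second prediction result."""
    rng = np.random.default_rng(seed)
    G = len(models)
    weights = np.full(G, 1.0 / G) if weights is None else np.asarray(weights)
    probs = np.stack([m.predict_proba(X) for m in models])   # shape (G, n, K)
    intermediate = np.tensordot(weights, probs, axes=1)      # weighted vote -> (n, K)
    noisy = intermediate + rng.laplace(0.0, noise_scale, intermediate.shape)
    noisy = np.clip(noisy, 1e-9, None)
    return noisy / noisy.sum(axis=1, keepdims=True)          # second prediction result
```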
Next, in operation S240, the second prediction result is output to the client.
In order to further reduce the possibility of reverse theft of the server model, according to one embodiment of the present disclosure, a plurality of threshold conditions may be preset in operation S240, and the second prediction result is output to the client when the second prediction result meets a certain threshold condition that is currently randomly selected.
This means that by the time the user obtains, via the client, a prediction result transmitted by the server, the server may have performed several rounds of prediction before outputting it. This further strengthens the privacy protection of the individual model modules at the server end and further defends against model theft; the threshold condition thus provides an additional guarantee against model-stealing attacks.
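A minimal sketch of the randomly selected threshold condition in operation S240; the candidate thresholds and the use of the top-class probability as the gated quantity are assumptions.

```python
import numpy as np

def gate_output(second_prediction, rng, thresholds=(0.6, 0.7, 0.8)):
    """Release the second prediction result only when its top class
    probability exceeds a threshold condition drawn at random from a preset
    pool; otherwise refuse output (return None)."""
    threshold = rng.choice(thresholds)       # randomly selected threshold condition
    return second_prediction if float(second_prediction.max()) > threshold else None
```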
According to the embodiments of the present disclosure, data segmentation and reconstruction reduce the weight and influence of potential poisoning data on the data as a whole. Combining random deactivation of part of the data with ensemble training of multiple machine learning models can effectively defend against data poisoning attacks. At the same time, under this ensemble training of multiple machine learning models, the prediction result can be noise-processed and the prediction labels can be transmitted selectively at random, which prevents the machine learning model from being attacked by reverse stealing and significantly improves its defensive capability.
Fig. 3 schematically illustrates a flow chart of a method of defending a machine learning model against attacks according to another embodiment of the present disclosure.
As shown in fig. 3, the method of defending against attack by the machine learning model according to the embodiment may include operations S350 to S370 in addition to operations S210 to S240.
In operation S350, part of the data in the third training data set for training the client model is marked with the second prediction result as a marking basis, so as to obtain a fourth training data set.
In operation S360, semi-supervised training of the client model is performed using the fourth training dataset.
In operation S370, a user task is predicted using the trained client model.
According to the embodiments of the present disclosure, when the client is provided with a client model, the client 12 can use the prediction results of the ensemble-trained model in the server 11 to mark part of the client model's training data and then perform semi-supervised training independently. This further strengthens the defense against model theft and ensures that the privacy of user data is not leaked.
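A hedged sketch of operations S350 to S370 using scikit-learn's self-training wrapper; representing unlabeled samples by -1 and the choice of base estimator are assumptions rather than requirements of the disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

def train_client_model(X_third, server_labels, labeled_idx, seed=0):
    """Mark part of the third training data set with the server's second
    prediction results (operation S350), leave the rest unlabeled (-1), run
    semi-supervised training (S360), and return the client model used to
    predict user tasks (S370)."""
    y = np.full(len(X_third), -1)                 # -1 marks unlabeled samples
    y[labeled_idx] = server_labels                # labels derived from the server output
    base = RandomForestClassifier(n_estimators=100, random_state=seed)
    client_model = SelfTrainingClassifier(base)   # iterative pseudo-labeling
    client_model.fit(X_third, y)
    return client_model
```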
Fig. 4 schematically illustrates a flow chart of a method of defending a machine learning model against attacks according to another embodiment of the present disclosure.
As shown in fig. 4, the method flow of this embodiment may include steps S41 to S47.
Step S41: and (5) preprocessing data. Noise and inconsistent data in the input data are removed through the modes of eliminating abnormal data, smoothing noise data and the like, and the data are converted into a data set suitable for model training through normalization or standardization.
Step S42: data segmentation. The data set is divided into N data sets of the same size according to a mutual-exclusion principle, each data set is then sampled with replacement, for example over 20 rounds, and 20N data sets are finally obtained; the resulting data sets reduce the weight of the influence of poisoning data on the data as a whole. In the process of transferring the data to the G individual model modules of the server, 10N of the data sets are randomly deactivated, further reducing the influence of the poisoning data on the model.
Step S43: training of the server-side integrated model. The 10N data sets are transferred, in random order, to the G individual models of the server-side integrated model. All the individual models solve the same task, but the algorithms and structures they adopt differ from one another and are not disclosed. The training prediction labels of the individual models are aggregated by weighted voting.
Step S44: prediction noise processing. Random noise sampled from a Laplace or Gaussian distribution is added to the summarized prediction result of the individual models, preventing the true prediction labels learned by the individual models from being stolen through reverse engineering.
Step S45: and judging whether the voting result is larger than a set threshold value. In the training process of the client training output module, an output threshold value is set randomly, when the voting result value of the noise processing unit is larger than the set threshold value, a prediction label is transmitted to the client, and otherwise, output is refused.
Step S46: client data marking. The prediction labels output by the server are used to mark part of the data actually required by the client. This data set may be selected from the data obtained by the data segmentation, or a public data set may be used; the choice is determined by the user.
Step S47: the client semi-supervised model trains learning. Semi-supervised training learning is carried out on the client model, and a prediction result is output to a user.
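Assuming the helper sketches given earlier in this description (split_and_resample, prune_similar, train_individual_models, ensemble_predict, gate_output, train_client_model) are in scope, steps S41 to S47 could be orchestrated roughly as follows; the concrete parameter values are illustrative only.

```python
import numpy as np

def defend_pipeline(X, y, X_client, candidate_idx, n_splits=5, s_rounds=20, seed=0):
    """Rough end-to-end orchestration of steps S41-S47 using the earlier
    sketches; X and y are assumed to be preprocessed already (step S41)."""
    rng = np.random.default_rng(seed)
    # S42: segmentation, 20 rounds of sampling with replacement, random deactivation
    _, second_subsets = split_and_resample(len(X), n_splits, s_rounds, seed)
    second_subsets = prune_similar(second_subsets, seed)
    # S43: integrated training of G individual models with different algorithms
    models = train_individual_models(X, y, second_subsets, seed)
    # S44: weighted voting plus Laplace noise on the summarized predictions
    noisy = ensemble_predict(models, X_client[candidate_idx])
    # S45: release a label only when the randomly chosen threshold condition holds
    kept_idx, labels = [], []
    for i, p in zip(candidate_idx, noisy):
        released = gate_output(p, rng)
        if released is not None:                 # otherwise output is refused
            kept_idx.append(i)
            labels.append(int(np.argmax(released)))
    # S46-S47: mark part of the client data and train the client model
    return train_client_model(X_client, np.array(labels), np.array(kept_idx), seed)
```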
Fig. 5 schematically illustrates a block diagram of an apparatus 500 for defending a machine learning model against attacks in accordance with an embodiment of the present disclosure.
As shown in fig. 5, the apparatus 500 may include a data segmentation module 1 and a server integration model module 2 according to an embodiment of the present disclosure. According to another embodiment of the present disclosure, the apparatus 500 may further include a client training output module 3. The apparatus 500 may be used to implement the method of defending against attack by a machine learning model described with reference to fig. 2-4.
The data segmentation module 1 is used for carrying out data segmentation and reconstruction on the acquired first training data set for training the machine learning model to obtain a second training data set; wherein the data volume of the second training data set is greater than the data volume of the first training data set, and the data in the second training data set coincides with the random partial data in the first training data set.
The server integrated model module 2 is configured to independently train G individual models using the second training data set, wherein the machine learning model includes G individual models, wherein algorithms of the G individual models are different from each other, wherein G is an integer greater than or equal to 2. The server integrated model module 2 is further configured to process the first prediction results output by each of the G individual models according to a predetermined rule in a prediction stage, form a second prediction result output by the machine learning model, and output the second prediction result to the client.
The client training output module 3 is configured to use the second prediction result as a marking basis to mark a portion of data in the third training data set for training the client model, so as to obtain a fourth training data set. Then, the client training output module 3 is further configured to perform semi-supervised training on the client model using the fourth training data set, and predict the user task using the trained client model.
Fig. 6 schematically illustrates a logical block diagram of the data splitting module according to an embodiment of the present disclosure.
As shown in fig. 6, according to an embodiment of the present disclosure, the data dividing module 1 may include a data preprocessing unit 101, a preprocessing data dividing unit 102, and a divided data transfer unit 103. The data segmentation module 1 is responsible for segmenting the processed data, reducing the weight influence of potential poisoning data and sending the data to the server integrated model module 2.
The data preprocessing unit 101 may receive the first training data set and preprocess its data: missing values are filled, abnormal data is eliminated, noisy data is smoothed, and inconsistent data is corrected, after which the data is converted by normalization or standardization into a form suitable for training the machine learning model. If the machine learning model is a query model, the first training data set is treated as private data provided by the service provider and is not opened to users of the client. If the model serves other application functions, the first training data set may be user-provided data.
The preprocessing data dividing unit 102 is first responsible for dividing the preprocessed first training data set into N mutually exclusive data sets with the same weight, for example dividing the data in the first training data set into N first sub-data sets of the same size. Each first sub-data set is then randomly sampled with replacement over S rounds to obtain N×S second sub-data sets.
For example, 20 rounds of random extraction with replacement can be performed on each first sub-data set, yielding 20N second sub-data sets and completing the data segmentation. Here, random extraction with replacement means that each round of extraction yields one second sub-data set, and the extracted data is returned to the first sub-data set before the next round of extraction is performed. In this way, 20N second sub-data sets can finally be obtained.
By means of segmentation and random extraction, the preprocessing data dividing unit 102 can expand the volume of the training data, dilute the proportion and influence of potential poisoning data in the training data set, and thereby further reduce the influence of the potential poisoning data on the prediction result.
The division data transfer unit 103 receives n×s (e.g., 20N) second sub-data sets obtained by data division by the preprocessing data division unit 102. Then, the split data transfer unit 103 may calculate the similarity between every two of the S second sub-data sets from each of the first data sets, and randomly delete one of the two most similar second sub-data sets. For example, when 20 second sub-data sets are obtained from a first sub-data set, 10 of the 20 second sub-data sets may be deleted based on the similarity determination, and finally, the remaining 10N data sets are transferred to the server integrated model module 2.
Fig. 7 schematically illustrates a logical block diagram of the server integrated model module 2 according to an embodiment of the present disclosure.
As shown in fig. 7, the server integrated model module 2 according to this embodiment may include an integrated machine learning unit 201, a predictive label combining unit 202, a predictive noise processing unit 203, and a predictive label output unit 204.
The integrated machine learning unit 201 receives the second training data set output by the data segmentation module 1 and randomly passes the data in the second training data set into the G individual models. Different algorithms are applied to different individual models among the G individual models. The algorithms employed by the G individual models are not disclosed to the client user, in order to defend the machine learning model against theft. All individual models solve the same task, but their training processes can be performed independently. The diversity of the individual model algorithms, together with the fact that they are unknown to the user, provides an important guarantee against model-stealing attacks.
The prediction tag combining unit 202 receives the prediction label results from the integrated machine learning unit 201 and aggregates the first prediction results of the G individual models by weighted voting, forming K prediction labels and their respective probability distributions.
The prediction noise processing unit 203 receives the summarized voting result including K prediction tags output from the prediction tag combining unit 202. The probability distribution condition of K prediction labels in the prediction results can be changed by adding noise, so that a second prediction result is obtained, and leakage of privacy information is further prevented.
Specifically, the prediction noise processing unit 203 can protect the privacy of the prediction result by adding random noise sampled from a Laplace or Gaussian distribution to the summarized voting result. When the probability differences among the K prediction labels in the summarized voting result are clear, adding noise does not change the prediction conclusion. When the probabilities of the K prediction labels are the same or very close, adding random noise can change their probability distribution, and the prediction conclusion changes accordingly.
Take the task of patient disease diagnosis as an example. Suppose 8 of 10 individual models judge that the patient has a cold and 2 judge that the patient is healthy; after noise processing, the output may become 6 models judging a cold and 4 judging healthy, and the voting result is the same as before the noise was added. If 5 models judge a cold and 5 judge healthy, one output, for example "healthy", is selected at random; after noise is added the counts may become 4 judging a cold and 6 judging healthy, so the voting result is still "healthy" and the outcome remains unaffected by the noise processing. At the same time, the noise processing hides the models' real training prediction results: even if an individual prediction is stolen, it is a falsified, noise-processed result, while the overall voting output remains unaffected. Noise processing of the individual models' predictions therefore provides an important guarantee against model-stealing attacks.
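The diagnosis example above can be checked numerically with a small sketch; the noise scale and the binary label encoding are assumptions used only for illustration.

```python
import numpy as np

def noisy_vote(votes, scale=1.0, seed=0):
    """Add Laplace noise to per-class vote counts; with a clear margin
    (8 'cold' votes vs 2 'healthy' votes) the winning label normally stays
    the same, while the published counts no longer reveal the true
    individual predictions."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(votes, minlength=2).astype(float)   # e.g. [2., 8.]
    noisy = counts + rng.laplace(0.0, scale, counts.shape)
    return int(np.argmax(noisy))                             # 0 = healthy, 1 = cold

# 10 individual models: 8 vote "cold" (1), 2 vote "healthy" (0)
print(noisy_vote(np.array([1] * 8 + [0] * 2)))               # typically prints 1
```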
The second prediction result obtained after adding the noise is obtained by carrying out weighted voting on the first prediction results respectively output by the G individual models and adding the noise, so that the probability that an attacker attacks the models reversely through the model prediction results and acquires privacy information can be reduced to a large extent, and the model stealing attack is further defended.
The prediction tag output unit 204 may receive the second prediction result delivered by the prediction noise processing unit 203. One or more threshold conditions may be set in the prediction tag output unit 204: for example, when the probability or proportion of a certain label in the data transferred from the prediction noise processing unit 203 is greater than a set threshold, transmission of the second prediction result to the client is allowed; when the probability or proportion of the label is smaller than the set threshold, transmission of the second prediction result to the client is refused. The setting and use of the threshold conditions may be random. This means that by the time the user obtains, via the client, a prediction result transmitted by the server, the server may have performed several rounds of prediction before outputting it, which further strengthens the privacy protection of the individual model modules at the server end and further defends against model theft. The threshold condition thus provides an additional guarantee against model-stealing attacks.
The server integrated model module 2 according to this embodiment may train the G individual models by using the training data output by the data segmentation module 1, obtain a prediction tag by summarizing the output of the G individual models, and transmit the prediction tag to the client training output module 3 after noise processing.
According to the embodiments of the present disclosure, the server integrated model module 2 transmits only the prediction labels to the client and prohibits users from directly accessing the server, which further defends against model-stealing attacks and protects user privacy.
Fig. 8 schematically illustrates a logical block diagram of the client training output module according to an embodiment of the present disclosure.
As shown in fig. 8, the client training output module 3 according to this embodiment may include a data marking unit 301, a semi-supervised model training unit 302, and a prediction output unit 303.
When the client model is set in the client, the client training output module 3 may be responsible for completing a training prediction task of the client model and outputting a model training result. The client training output module 3 may be responsible for accomplishing various query prediction functional requirements of the user.
The data tagging unit 301 receives the predictive tags of the server integrated model module 2 for part of the data in the third training data set for training the client model and tags these data according to the predictive tags, while the rest of the data in the third training data set may not be tagged. These labeled and unlabeled data are then sent to the semi-supervised model training unit 302 simultaneously as training data (i.e., a fourth training data set).
When the client model provides a query service (e.g., querying health status, etc.), the data in the second training data set output by the data segmentation module 1 may not be disclosed to the user. The training data used in the client training output module 3 can be provided by the user, so that the privacy of the data used by the server side can be protected. When the client model provides other types of tasks such as: classification, clustering, etc., the training data of the client model may be from the second training data set output by the data segmentation module 1.
The semi-supervised model training unit 302 performs semi-supervised training of the client model using the fourth training data set delivered by the data tagging unit 301. The client model may adopt a variety of different algorithmic models depending on the particular prediction task; for example, a generative adversarial network, a deep belief network, or other model algorithms may be selected according to the specific problem. In the semi-supervised training process, the labels provided by the server integrated model module 2 help to ensure the accuracy of the prediction results.
The prediction output unit 303 receives the client model trained by the semi-supervised model training unit 302, and provides the client model for the user to use, completes various prediction tasks of the user, and outputs a prediction result to the user.
Any number of modules, sub-modules, units, sub-units, or at least some of the functionality of any number of the sub-units according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented as split into multiple modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or in any other reasonable manner of hardware or firmware that integrates or encapsulates the circuit, or in any one of or a suitable combination of three of software, hardware, and firmware. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be at least partially implemented as computer program modules, which when executed, may perform the corresponding functions.
For example, any of the data segmentation module 1, the server integrated model module 2, the client training output module 3, the data preprocessing unit 101, the preprocessed data segmentation unit 102, the segmented data transfer unit 103, the integrated machine learning unit 201, the predictive label combining unit 202, the predictive noise processing unit 203, the predictive label output unit 204, the data tagging unit 301, the semi-supervised model training unit 302, and the predictive output unit 303 may be combined in one module to be implemented, or any of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the data splitting module 1, the server integrated model module 2, the client training output module 3, the data preprocessing unit 101, the preprocessing data splitting unit 102, the splitting data transfer unit 103, the integrated machine learning unit 201, the predictive label combining unit 202, the predictive noise processing unit 203, the predictive label output unit 204, the data tagging unit 301, the semi-supervised model training unit 302, the predictive output unit 303 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or any other reasonable way of integrating or packaging the circuitry, or any other reasonable way of hardware or firmware, or any one of or a suitable combination of three ways of software, hardware and firmware. Alternatively, at least one of the data segmentation module 1, the server integrated model module 2, the client training output module 3, the data preprocessing unit 101, the preprocessed data segmentation unit 102, the segmented data transfer unit 103, the integrated machine learning unit 201, the predictive label combining unit 202, the predictive noise processing unit 203, the predictive label output unit 204, the data tagging unit 301, the semi-supervised model training unit 302, the predictive output unit 303 may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
Fig. 9 schematically illustrates a block diagram of a computer system 900 adapted to implement a method of defending a machine learning model against attacks in accordance with an embodiment of the present disclosure. The computer system 900 illustrated in fig. 9 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 9, a computer system 900 according to an embodiment of the present disclosure includes a processor 901, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. The processor 901 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 901 may also include on-board memory for caching purposes. Processor 901 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the computer system 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. The processor 901 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the program may be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the disclosure, the computer system 900 may also include an input/output (I/O) interface 905, the input/output (I/O) interface 905 also being connected to the bus 904. The computer system 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.
According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 902 and/or RAM 903 and/or one or more memories other than ROM 902 and RAM 903 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program, the computer program comprising program code for performing the methods provided by the embodiments of the present disclosure; when the computer program product is run on an electronic device, the program code causes the electronic device to implement the method of defending a machine learning model against attacks provided by the embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed via the communication section 909, and/or installed from the removable medium 911. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, the program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages and/or in assembly/machine languages. Such programming languages include, but are not limited to, Java, C++, Python, the "C" language, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or sub-combined in various ways, even if such combinations or sub-combinations are not explicitly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or sub-combined in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the respective embodiments cannot be used advantageously in combination. The scope of the present disclosure is defined by the appended claims and their equivalents. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to fall within the scope of the present disclosure.
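As an editorial illustration only, and not part of the original disclosure, the following Python sketch shows one possible realization of the data segmentation and reconstruction recited in claims 1 and 2 below. The function name and the default values chosen for N, S, and L are assumptions introduced here for illustration.

```python
# Illustrative sketch of the second training data set construction:
# split into N mutually exclusive parts, bootstrap each part S times,
# randomly drop L bootstrap samples per part, and mix the remainder.
import numpy as np


def build_second_training_set(first_data, n_splits=4, s_rounds=5, l_drop=1, seed=None):
    """Build the second training data set from the first training data set."""
    rng = np.random.default_rng(seed)
    first_data = np.asarray(first_data)

    # Divide the first training data set into N mutually exclusive first sub-data sets.
    shuffled = rng.permutation(len(first_data))
    first_subsets = np.array_split(shuffled, n_splits)

    kept_subsets = []
    for subset in first_subsets:
        # S rounds of random sampling with replacement -> S second sub-data sets.
        samples = [rng.choice(subset, size=len(subset), replace=True)
                   for _ in range(s_rounds)]
        # Randomly delete L of the S second sub-data sets (1 <= L < S).
        keep = rng.choice(s_rounds, size=s_rounds - l_drop, replace=False)
        kept_subsets.extend(samples[i] for i in keep)

    # Mix the remaining N*(S-L) second sub-data sets; the mixed pool is larger
    # than the first training data set while each record overlaps only a
    # random portion of it.
    mixed_indices = rng.permutation(np.concatenate(kept_subsets))
    return first_data[mixed_indices]
```

With the defaults assumed above, the mixed pool contains roughly (S−L) times as many records as the first training data set, each drawn from a bootstrap sample of one of the N mutually exclusive splits.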

Claims (12)

1. A method of defending a machine learning model against attacks, comprising:
performing data segmentation and reconstruction on the acquired first training data set for training the machine learning model to obtain a second training data set; wherein the data volume of the second training data set is greater than the data volume of the first training data set, and the data in the second training data set coincides with a random portion of the data in the first training data set;
independently training G individual models using the second training data set, respectively, wherein the machine learning model comprises the G individual models, wherein algorithms of the G individual models are different from each other, wherein G is an integer greater than or equal to 2;
processing the first prediction results output by the G individual models respectively according to a preset rule in a prediction stage to form second prediction results output by the machine learning model; and
outputting the second prediction result to a client;
the data segmentation and reconstruction are performed on the first training data set obtained for training the machine learning model to obtain a second training data set, which comprises the following steps:
dividing the first training data set into N mutually exclusive first sub data sets, wherein N is an integer greater than or equal to 2;
carrying out S rounds of place-back random sampling on one first sub-data set to obtain S second sub-data sets; obtaining n×s second sub-data sets corresponding to N first sub-data sets, where S is an integer greater than or equal to 2;
and obtaining the second training data set based on the N.S second sub-data sets.
2. The method of claim 1, wherein the obtaining the second training data set based on the N×S second sub-data sets comprises:
randomly deleting L second sub-data sets from the S second sub-data sets obtained by sampling one first sub-data set, so that S−L second sub-data sets remain for that first sub-data set and N×(S−L) second sub-data sets remain corresponding to the N first sub-data sets, wherein L is an integer greater than or equal to 1 and less than S; and
mixing the remaining N×(S−L) second sub-data sets to form the second training data set.
3. The method of claim 2, wherein the randomly deleting L second sub-data sets from the S second sub-data sets obtained by sampling one first sub-data set comprises:
among the S second sub-data sets obtained by sampling one first sub-data set, taking each second sub-data set together with the other second sub-data set having the highest similarity to it as a data set pair; and
randomly deleting one of the two second sub-data sets in each of the data set pairs.
4. The method of claim 1, wherein the weights of the N first sub-data sets are the same.
5. The method according to any one of claims 1 to 4, wherein the processing, in the prediction stage, the first prediction results output by each of the G individual models according to a preset rule to form a second prediction result output by the machine learning model comprises:
aggregating the G first prediction results by weighted voting to obtain an intermediate prediction result; and
obtaining the second prediction result based on the intermediate prediction result.
6. The method of claim 5, wherein the obtaining the second prediction result based on the intermediate prediction result comprises:
taking, as the second prediction result, a prediction result obtained by adding random noise to the intermediate prediction result.
7. The method of any one of claims 1-4, wherein outputting the second prediction result to a client comprises:
outputting the second prediction result to the client when the second prediction result meets a threshold condition, wherein the threshold condition is one condition randomly selected from a plurality of preset conditions.
8. The method of any of claims 1-4, wherein the machine learning model further comprises a client model, the client model being disposed at the client; wherein, after the outputting the second prediction result to the client, the method further comprises:
labeling part of the data in a third training data set for training the client model, with the second prediction result as the labeling basis, to obtain a fourth training data set;
performing semi-supervised training of the client model using the fourth training data set; and
predicting a user task by using the trained client model.
9. An apparatus for defending a machine learning model against attacks, comprising:
the data segmentation module is used for carrying out data segmentation and reconstruction on the acquired first training data set for training the machine learning model to obtain a second training data set; wherein the data volume of the second training data set is greater than the data volume of the first training data set, and the data in the second training data set coincides with a random portion of the data in the first training data set;
the server integrated model module is used for:
independently training G individual models using the second training data set, respectively, wherein the machine learning model comprises the G individual models, wherein algorithms of the G individual models are different from each other, wherein G is an integer greater than or equal to 2;
processing the first prediction results output by the G individual models respectively according to a preset rule in a prediction stage to form second prediction results output by the machine learning model; and
outputting the second prediction result to a client;
the data segmentation module is specifically configured to:
divide the first training data set into N mutually exclusive first sub-data sets, wherein N is an integer greater than or equal to 2;
perform S rounds of random sampling with replacement on each first sub-data set to obtain S second sub-data sets, so that N×S second sub-data sets are obtained corresponding to the N first sub-data sets, wherein S is an integer greater than or equal to 2; and
obtain the second training data set based on the N×S second sub-data sets.
10. The apparatus of claim 9, wherein the apparatus further comprises:
the client training output module is used for labeling part of the data in a third training data set for training a client model, with the second prediction result as the labeling basis, to obtain a fourth training data set; performing semi-supervised training of the client model using the fourth training data set; and predicting a user task by using the trained client model.
11. A system for defending a machine learning model against attacks, comprising:
one or more memories storing executable instructions; and
one or more processors executing the executable instructions to implement the method of any of claims 1-8.
12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-8.
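As a further editorial illustration, and not part of the claims, the sketch below shows one way the prediction-stage processing of claims 5 to 7 and the client-side labeling of claim 8 could be realized. The voting weights, noise scale, preset thresholds, and the scikit-learn style predict_proba() interface are assumptions introduced here, not taken from the disclosure.

```python
# Illustrative sketch: weighted voting over G heterogeneous models, random
# noise on the intermediate result, a randomly selected threshold condition,
# and client-side pseudo-labeling from the released second prediction results.
import numpy as np


def second_prediction(models, weights, x, noise_scale=0.05,
                      preset_thresholds=(0.6, 0.7, 0.8), seed=None):
    """Aggregate the G first prediction results for a single sample x
    (2-D, shape (1, d)) and release or withhold the second prediction result."""
    rng = np.random.default_rng(seed)

    # Weighted voting over the G first prediction results (class probabilities).
    first_results = np.array([m.predict_proba(x)[0] for m in models])
    intermediate = np.average(first_results, axis=0, weights=weights)

    # Add random noise to the intermediate prediction result.
    noisy = np.clip(intermediate + rng.normal(0.0, noise_scale, intermediate.shape), 0.0, None)
    noisy /= noisy.sum()

    # Threshold condition randomly selected from a plurality of preset conditions.
    if noisy.max() >= rng.choice(preset_thresholds):
        return int(noisy.argmax())
    return None  # withhold the output when the condition is not met


def pseudo_label(third_data, server_predict):
    """Label part of a third (client-side) data set with the server's second
    prediction results; -1 marks data left unlabeled for semi-supervised training."""
    labels = np.full(len(third_data), -1, dtype=int)
    for i in range(len(third_data)):
        pred = server_predict(third_data[i:i + 1])
        if pred is not None:
            labels[i] = pred
    return third_data, labels  # the fourth training data set
```

Under these assumptions, server_predict would be a thin wrapper around second_prediction, and the client could feed the partially labeled fourth data set to a semi-supervised learner that treats -1 as "unlabeled" (for example, scikit-learn's SelfTrainingClassifier) before using the trained client model for its own prediction tasks.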