WO2021159819A1

WO2021159819A1 - Machine learning model protection method and device

Info

Publication number: WO2021159819A1
Application number: PCT/CN2020/132839
Authority: WO
Inventors: 刘永超 (Liu Yongchao); 金跃 (Jin Yue); 陈勇 (Chen Yong); 张尧 (Zhang Yao); 滕腾 (Teng Teng); 欧航 (Ou Hang)
Original assignee: Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Priority date: 2020-02-13
Filing date: 2020-11-30
Publication date: 2021-08-19
Also published as: CN113254885A

Abstract

A domain-specific language compiler-based machine learning model protection method and protection device. The protection method (100) comprises: for each of one or more protection policies of the machine learning model, receiving an instruction from a user so as to invoke corresponding functions (110); receiving input parameter values of the functions; and on the basis of one or more functions respectively for the one or more protection policies and the corresponding input parameter values, automatically generating a machine-executable code for protecting the machine learning model (130).

Description

Machine learning model protection method and device

Technical field

This specification relates to the field of computer technology, and in particular to the field of information security.

Background

With the advent of the intelligent IoT era, more and more artificial intelligence algorithms are deployed in the cloud or in terminal-device applications. The AI algorithms used in services such as face-scan payment, face-scan login, unmanned supermarkets, and unmanned banks may be attacked, which can create financial risk. Attackers usually do not know the specific structure of a machine learning model or the data features used to train it, so they typically mount black-box attacks: they try different inputs, observe the corresponding outputs, and use these observations to infer how the model works and to discover vulnerabilities in the system.

Currently, strategies such as encryption, private model formats, computational graph obfuscation, and weight data obfuscation are commonly used to protect machine learning models. However, more reliable protection for machine learning models is still needed.

Summary of the invention

It is desirable to provide a machine learning model protection method and device based on a domain-specific language compiler, which can provide more reliable protection for the machine learning model.

According to one aspect, a method for protecting a machine learning model based on a domain-specific language compiler is provided, including: for each of one or more protection strategies of the machine learning model, receiving a user's instruction to call a corresponding function, and receiving input parameter values of the function; and, based on the one or more functions for the one or more protection strategies and the corresponding input parameter values, automatically generating machine-executable code for protecting the machine learning model.

According to another aspect, a machine learning model protection device based on a domain-specific language compiler is provided, including: a receiving unit configured to, for each of one or more protection strategies of the machine learning model, receive a user's instruction to call a corresponding function and receive input parameter values of the function; and a code generation unit configured to automatically generate machine-executable code for protecting the machine learning model based on the one or more functions for the one or more protection strategies and the corresponding input parameter values.

According to still another aspect, a system for generating a machine learning model is provided, including: a machine learning model generation device for generating a machine learning model; and the domain-specific language compiler-based machine learning model protection device according to any of the embodiments of this specification, for generating machine-executable code for protecting the machine learning model.

According to the embodiments of each aspect of this specification, the domain-specific language (DSL) compiler makes each protection strategy of the machine learning model parameterizable. By setting different input parameters, different machine-executable code can be generated automatically for each protection strategy, so that each machine learning model receives its own specific protection. Even if an attacker cracks one machine learning model, the executable code implementing the protection strategies differs from model to model, so the work done to crack that model does not lower the cost of cracking other models. This provides more reliable protection for machine learning models.

Description of the drawings

Figure 1 shows a protection architecture diagram of a machine learning model in one case;

Figure 2 shows a machine learning model protection method based on a domain-specific language compiler according to an embodiment;

Figure 3a shows a predefined function according to an embodiment;

Figure 3b shows function call and fusion according to one embodiment;

Figure 4 shows a system for generating a machine learning model according to one embodiment.

The various aspects and features of this specification are described with reference to the above-mentioned drawings. The same or similar reference numerals are usually used to denote the same parts. The above drawings are only schematic and not restrictive. Without departing from the gist of the specification, the size, shape, label, or appearance of the various elements in the above-mentioned drawings may be changed, and are not limited to only those shown in the drawings of the specification.

Detailed description

One or more protection strategies can be used to protect the machine learning model, including encryption, computational graph obfuscation, and/or weight data obfuscation. Preferably, the user can select one or more protection strategies from the group consisting of encryption, computational graph obfuscation, and weight data obfuscation to form their own protection logic for the machine learning model. The machine learning model protection device described below can display the currently available protection strategies to the user, who selects specific strategies among them to protect the current machine learning model.

Figure 1 shows a protection architecture for a machine learning model in one case. The machine learning model produced by artificial intelligence techniques, that is, the machine-executable program that implements the model, can be protected by specific protection logic composed of computational graph obfuscation, weight data obfuscation, and encryption; the final output is machine-executable code for protecting the model, together with a custom model format. Different combinations of protection strategies can also be chosen to form other protection logic.

Fig. 2 shows a machine learning model protection method 100 based on a DSL compiler according to an embodiment. The protection method 100 can perform the following processing.

At 110, for each of one or more protection strategies of the current machine learning model, a user's instruction to call a corresponding function is received. The one or more protection strategies can be selected by the user specifically for the current machine learning model, in particular from the group consisting of encryption, computational graph obfuscation, and weight data obfuscation. This enables users to specify their own protection logic for each machine learning model. The functions are predefined in the DSL compiler for the various protection strategies. For example, function A represents encryption, function B represents computational graph obfuscation, and function C represents weight data obfuscation. Thus, referring to the example shown in FIG. 1, the user can input instructions that call functions B, C, and A in sequence as the protection logic for the machine learning model. Figure 3a shows an example of a predefined function in the DSL compiler.
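To make the call sequence concrete, here is a minimal Python sketch, assuming the predefined functions can be modelled as a table from instruction names to callables. The function bodies are hypothetical placeholders (a flag for encryption, a dummy node for graph obfuscation, negated weights for weight obfuscation), not the patent's algorithms, and the toy dict-based model is likewise an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for functions predefined in the DSL compiler.
# Each takes a toy "model" dict and returns a transformed copy.
def func_a_encrypt(model: dict) -> dict:
    # Placeholder for encryption: only flags the model as encrypted.
    return {**model, "encrypted": True}

def func_b_obfuscate_graph(model: dict) -> dict:
    # Placeholder for graph obfuscation: append a dummy identity node.
    return {**model, "graph": model["graph"] + ["identity"]}

def func_c_obfuscate_weights(model: dict) -> dict:
    # Placeholder for weight obfuscation: negate every weight.
    return {**model, "weights": [-w for w in model["weights"]]}

# Instruction name -> predefined function (A/B/C as in the text).
PREDEFINED: Dict[str, Callable[[dict], dict]] = {
    "A": func_a_encrypt,
    "B": func_b_obfuscate_graph,
    "C": func_c_obfuscate_weights,
}

def apply_protection_logic(model: dict, instructions: List[str]) -> dict:
    """Call the predefined functions in the order the user listed them."""
    for name in instructions:
        if name not in PREDEFINED:
            raise KeyError(f"unknown protection function: {name}")
        model = PREDEFINED[name](model)
    return model

model = {"graph": ["conv", "relu"], "weights": [0.5, -1.0], "encrypted": False}
protected = apply_protection_logic(model, ["B", "C", "A"])  # order from Figure 1
print(protected)
# {'graph': ['conv', 'relu', 'identity'], 'weights': [-0.5, 1.0], 'encrypted': True}
```

Keeping the order as a plain list means each model can be given its own sequence without changing the functions themselves.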

At 120, the input parameter values of the function for each protection strategy are received. The input parameter values can be set by the user as needed; in particular, they can differ between machine learning models. In the example shown in Figure 1, input parameter values are received for functions B, C, and A respectively.

In one embodiment, the input parameter value for each protection strategy can be randomly generated, and then the randomly generated input parameter value can be received.
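A minimal sketch of random parameter generation, using Python's standard `secrets` module. The parameter names (`key`, `dummy_nodes`, `mask_seed`) and their ranges are hypothetical; the specification does not say which parameters each protection function accepts.

```python
import secrets
from typing import Dict

# Hypothetical parameter names and ranges for the functions A, B, C.
def random_parameters() -> Dict[str, Dict[str, object]]:
    """Randomly generate input parameter values for each protection strategy."""
    return {
        "A": {"key": secrets.token_bytes(16)},           # 128-bit encryption key
        "B": {"dummy_nodes": 1 + secrets.randbelow(8)},  # 1..8 dummy graph nodes
        "C": {"mask_seed": secrets.randbits(32)},        # seed for a weight mask
    }

# Each model gets its own parameters, so its generated protection code differs.
params_model_1 = random_parameters()
params_model_2 = random_parameters()
print(params_model_1["B"], params_model_2["B"])
```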

Although receiving the function call and receiving the function's input parameter values are described as separate processes 110 and 120, it can be understood that they can be performed in the same process. For example, the input parameter values of a protection strategy's function are preferably received at the same time as the call to that function. In this case, the user's instruction can include the specified input parameter values.
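One way to receive a call and its parameters in a single instruction is to parse a call expression. The sketch below assumes a hypothetical keyword-argument syntax such as `B(dummy_nodes=3)` and uses Python's `ast` module for parsing; it is not the DSL syntax of the patent, which the text does not disclose.

```python
import ast
from typing import Dict, Tuple

def parse_instruction(text: str) -> Tuple[str, Dict[str, object]]:
    """Split an instruction such as 'B(dummy_nodes=3)' into name and parameters.

    The call syntax is a hypothetical illustration, not the patent's DSL.
    """
    node = ast.parse(text, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a function call: {text!r}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("parameters must be given as name=value")
    # literal_eval accepts only Python literals, so no user code is executed.
    params = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, params

print(parse_instruction("B(dummy_nodes=3)"))          # ('B', {'dummy_nodes': 3})
print(parse_instruction("C(scale=2.0, offset=0.5)"))  # ('C', {'scale': 2.0, 'offset': 0.5})
```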

At 130, based on one or more functions for one or more protection strategies and corresponding input parameter values, machine executable codes for protecting the current machine learning model are automatically generated.

In one embodiment, for each of the one or more protection strategies, machine-executable code for that strategy can be automatically generated based on the corresponding function and input parameter values. Referring to the example shown in FIG. 1, machine-executable code implementing the corresponding operations (i.e., computational graph obfuscation, weight data obfuscation, and encryption) is automatically generated from functions B, C, and A and their input parameter values. Together, this code constitutes the protection code for the machine learning model, which can be provided to users together with the model.
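The per-strategy generation step can be sketched with string templates. In this hypothetical Python illustration, generated Python source stands in for machine-executable code, the template contents (`protect_graph`, `protect_weights`, an affine weight mask) are assumptions, and the parameter values are baked into the generated text, so the same strategy yields different code for different parameters.

```python
from typing import Callable, Dict

# Hypothetical templates, one per protection function. Generated Python
# source stands in for machine-executable code; parameter values are
# baked into the generated text as constants.
TEMPLATES: Dict[str, str] = {
    "B": (
        "def protect_graph(graph):\n"
        "    return graph + ['identity'] * {dummy_nodes}\n"
    ),
    "C": (
        "def protect_weights(weights):\n"
        "    return [w * {scale} + {offset} for w in weights]\n"
    ),
}

def generate_code(func_name: str, params: Dict[str, object]) -> str:
    """Generate source code for one strategy from its function and parameters."""
    return TEMPLATES[func_name].format(**params)

def load(source: str, entry: str) -> Callable:
    """Compile generated source and return the named function."""
    namespace: Dict[str, object] = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace[entry]

# Same strategy, different parameters -> different generated code.
src_1 = generate_code("C", {"scale": 2.0, "offset": 0.25})
src_2 = generate_code("C", {"scale": 3.0, "offset": -1.0})
print(src_1 != src_2)                               # True
print(load(src_1, "protect_weights")([1.0, -0.5]))  # [2.25, -0.75]
```

A real DSL compiler would lower to native code rather than run Python source through `exec`; the template approach only shows how parameter values change the emitted code.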

In another embodiment, when multiple protection strategies are used for the current machine learning model, the functions corresponding to those strategies can be selectively fused before the machine-executable code is generated, and the machine-executable code is then generated from the fused function, further increasing the difficulty of understanding the code logic.

Specifically, at least two of the functions corresponding to the multiple protection strategies can be fused to generate a fused function, and the corresponding machine-executable code can then be automatically generated based on the fused function and the corresponding input parameter values. For functions that are not fused, the corresponding machine-executable code can still be automatically generated based on each function and its input parameter values.

Preferably, all of the functions corresponding to the multiple protection strategies are fused into a single fused function, and the corresponding machine-executable code is then automatically generated based on that fused function and the corresponding input parameter values.

It is also conceivable to divide the functions corresponding to the multiple protection strategies into groups and fuse each group, generating multiple fused functions, and then to generate the corresponding machine-executable code based on these fused functions and the corresponding input parameter values.
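Fusing at least two functions, all functions, or separate groups of functions can be sketched for the simple case of element-wise operations. This Python sketch assumes each hypothetical function (`P`, `Q`, `R`) is a single integer expression in a variable `v`; fusing a group nests the expressions so the group becomes one loop with one combined expression, and grouping yields one fused function per group.

```python
from typing import Dict, List, Sequence

# Hypothetical element-wise protection functions, each written as an
# expression over the element variable `v` (integers for simplicity).
EXPRS: Dict[str, str] = {
    "P": "(v * 2)",
    "Q": "(v + 7)",
    "R": "(v ^ 5)",
}

def fuse_source(names: Sequence[str], fname: str) -> str:
    """Fuse the named functions into one loop by nesting their expressions."""
    expr = "v"
    for name in names:  # later functions wrap the result of earlier ones
        expr = EXPRS[name].replace("v", expr)
    return f"def {fname}(xs):\n    return [{expr} for v in xs]\n"

def fuse_groups(groups: Sequence[Sequence[str]]) -> List[str]:
    """Fuse each group separately, yielding one fused function per group."""
    return [fuse_source(g, f"fused_{i}") for i, g in enumerate(groups)]

sources = fuse_groups([["P", "Q"], ["R"]])
print(sources[0])
# def fused_0(xs):
#     return [((v * 2) + 7) for v in xs]
```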

Figure 3a shows functions E and F predefined in the DSL compiler according to one embodiment, and Figure 3b shows the calling and fusion of functions E and F according to one embodiment. Functions E and F may be predefined functions in the DSL compiler corresponding to different protection strategies. In the general embodiment, the user can input the instructions E_func(x,len) and F_func(x,len) to call functions E and F and supply the corresponding parameter values, and the DSL compiler then automatically generates the corresponding machine-executable code. In the function fusion embodiment described above, the DSL compiler can first fuse functions E and F into the fused function shown in Figure 3b and then generate machine-executable code based on that fused function.
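The bodies of E and F appear only in the figures, which are not reproduced here, so the following Python sketch assumes hypothetical bodies: E XORs each byte with a key and F adds a constant modulo 256, both over the first `len` elements of `x`. The parameter is named `length` to avoid shadowing Python's built-in `len`. The fused version is an ordinary loop fusion, offered as one plausible reading of Figure 3b.

```python
from typing import List

KEY_E = 0x5A  # hypothetical constant used by E
ADD_F = 0x13  # hypothetical constant used by F

def E_func(x: List[int], length: int) -> None:
    """Hypothetical body for E: XOR each byte with KEY_E, in place."""
    for i in range(length):
        x[i] ^= KEY_E

def F_func(x: List[int], length: int) -> None:
    """Hypothetical body for F: add ADD_F modulo 256, in place."""
    for i in range(length):
        x[i] = (x[i] + ADD_F) & 0xFF

def EF_fused(x: List[int], length: int) -> None:
    """Loop fusion of E then F: one pass applies both steps per element.

    Valid because iteration i of each loop reads and writes only x[i].
    """
    for i in range(length):
        x[i] = ((x[i] ^ KEY_E) + ADD_F) & 0xFF

data = [0, 1, 255]
a = data.copy()
E_func(a, len(a))
F_func(a, len(a))
b = data.copy()
EF_fused(b, len(b))
print(a, b)  # [109, 110, 184] [109, 110, 184]
```

Fusion is safe here because neither loop carries a dependency between iterations; functions with cross-element dependencies would need a different fusion strategy.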

The various embodiments are described above with reference to a method for protecting a machine learning model based on a DSL compiler. It can be understood that the various processes of the various methods can be split, reorganized, or combined to achieve corresponding functions.

Figure 4 shows a system 10 for generating a machine learning model according to one embodiment. The system 10 includes a machine learning model generation device 11, which is used to generate a machine learning model, and a DSL compiler-based machine learning model protection device 12, which generates machine executable code for protecting the machine learning model according to different protection strategies. The protection device 12 includes a receiving unit 121 and a code generating unit 122. The receiving unit 121 receives an instruction from a user to call a corresponding function for each protection strategy of one or more protection strategies of the current machine learning model, and receives input parameter values of the function. The called function is predefined and can be stored in the memory 13. It is also conceivable that the memory is part of the protection device 12. The code generation unit 122 automatically generates machine executable code for protecting the current machine learning model based on one or more functions for one or more protection strategies and corresponding input parameter values.
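The division of labour among protection device 12, receiving unit 121, code generation unit 122, and memory 13 can be mirrored in a small Python sketch. The class and method names and the stored template are hypothetical, and Python source again stands in for machine-executable code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Call = Tuple[str, Dict[str, object]]  # (function name, input parameter values)

@dataclass
class Memory:
    """Memory 13: stores the predefined functions (here, hypothetical templates)."""
    templates: Dict[str, str] = field(default_factory=lambda: {
        "C": "def protect_weights(ws):\n    return [w * {scale} + {offset} for w in ws]\n",
    })

@dataclass
class ReceivingUnit:
    """Receiving unit 121: records each function call with its parameter values."""
    calls: List[Call] = field(default_factory=list)

    def receive(self, func_name: str, params: Dict[str, object]) -> None:
        self.calls.append((func_name, params))

@dataclass
class CodeGenerationUnit:
    """Code generation unit 122: turns received calls into source code."""
    memory: Memory

    def generate(self, calls: List[Call]) -> List[str]:
        return [self.memory.templates[name].format(**params) for name, params in calls]

@dataclass
class ProtectionDevice:
    """Protection device 12: wires the memory and the two units together."""
    memory: Memory = field(default_factory=Memory)
    receiver: ReceivingUnit = field(default_factory=ReceivingUnit)

    def generate_protection_code(self) -> List[str]:
        return CodeGenerationUnit(self.memory).generate(self.receiver.calls)

device = ProtectionDevice()
device.receiver.receive("C", {"scale": 2.0, "offset": 0.5})
print(device.generate_protection_code()[0])
```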

In one embodiment, for each of the one or more protection strategies, the code generation unit 122 automatically generates machine-executable code for that strategy based on the corresponding function and input parameter values.

In another embodiment, the code generation unit 122 fuses at least two of the functions corresponding to the multiple protection strategies to generate a fused function, and then automatically generates the corresponding machine-executable code based on the fused function and the corresponding input parameter values.

In another embodiment, the system may further include a random number generating unit (not shown) configured to randomly generate input parameter values for each protection strategy, and the receiving unit 121 receives the randomly generated input parameter values. The random number generating unit may also be part of the protection device 12.

The code generation unit 122 may also perform the function fusion described above and the code generation for the resulting fused functions. The functional units or modules of the protection device of this specification may be added on top of an existing DSL compiler; for example, the receiving unit 121 and the code generation unit 122 described above may be implemented by the DSL compiler as DSL compiler modules.

Although the DSL compiler-based machine learning model protection device 12 is described above as part of the system 10 for generating machine learning models, the protection device 12 may also be used as a standalone device.

It is conceivable that the receiving unit of the protection device 12 also receives the user's selection of protection strategies. In one embodiment, the protection device 12 can include a display unit that displays the currently selectable protection strategies and the corresponding instructions to the user, and the user can input instructions according to their chosen protection strategies to call the corresponding functions. Further, the display unit can prompt the user to input the corresponding parameter values for each protection strategy the user selects.
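A display unit of this kind might list the instructions and then prompt for each parameter. The Python sketch below assumes a hypothetical catalogue of strategies, instruction letters, and parameter names; the `ask` argument defaults to the built-in `input` but can be replaced, for example in tests.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical catalogue: strategy -> (instruction, parameter names).
CATALOGUE: Dict[str, Tuple[str, List[str]]] = {
    "encryption": ("A", ["key"]),
    "computational graph obfuscation": ("B", ["dummy_nodes"]),
    "weight data obfuscation": ("C", ["scale", "offset"]),
}

def show_strategies() -> str:
    """Text the display unit could show: each instruction and its strategy."""
    return "\n".join(f"{instr}: {name}" for name, (instr, _) in CATALOGUE.items())

def prompt_parameters(instr: str, ask: Callable[[str], str] = input) -> Dict[str, str]:
    """Prompt for every parameter of the strategy selected via `instr`."""
    for name, (code, params) in CATALOGUE.items():
        if code == instr:
            return {p: ask(f"{name}: enter {p}> ") for p in params}
    raise KeyError(f"unknown instruction: {instr}")

print(show_strategies())
# A: encryption
# B: computational graph obfuscation
# C: weight data obfuscation
```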

It can be understood that the methods and devices of the embodiments of this specification can be implemented by computer programs/software. This software includes computer program instructions that can be loaded into the working memory of a data processor and that, when run, execute the methods according to the embodiments of this specification.

The exemplary embodiments of this specification cover both of the following: a computer program/software that uses this specification from the outset, and an existing program/software that is converted by means of an update into a computer program/software that uses this specification.

According to another embodiment of this specification, a machine-readable (e.g., computer-readable) medium, such as a CD-ROM, is provided, on which computer program code is stored; when executed, the computer program code causes a computer or processor to perform the method according to the embodiments of this specification. The machine-readable medium is, for example, an optical storage medium or a solid-state medium supplied together with or as part of other hardware.

The computer program for executing the method according to the various embodiments of the present specification may also be distributed in other forms, for example, via the Internet or other wired or wireless telecommunication systems.

The computer program can also be provided on a network such as the World Wide Web and can be downloaded from such a network to the working computer of the data processor.

It can also be understood that the flow of each unit and method in the system of each embodiment of this specification can also be implemented by hardware or a combination of hardware and software.

In one embodiment, the system according to this specification can be implemented by a memory and a processor. The memory can store computer program codes for running the method procedures according to the various embodiments of this specification; when running the program codes from the memory, the processor executes the procedures according to the various embodiments of this specification.

The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than in the embodiments and still achieve desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order or sequential order shown in order to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

It must be pointed out that the embodiments of this specification are described with reference to different subject matter. In particular, some embodiments are described with reference to method-type claims, while others are described with reference to device-type claims. However, from the above and the following description, those skilled in the art will understand that, unless otherwise specified, in addition to any combination of features belonging to one type of subject matter, any combination of features relating to different subject matter is also considered to be disclosed in this specification. In addition, all features can be combined to provide synergistic effects that are greater than the simple sum of the features.

The specification is described above with reference to specific embodiments, and those skilled in the art should understand that the technical solutions of the specification can be implemented in various ways without departing from the spirit and basic characteristics of the specification. The specific embodiments are merely illustrative and not restrictive. In addition, these embodiments can be arbitrarily combined to achieve the purpose of this specification. The protection scope of this specification is defined by the appended claims.

The word "comprising" in the description and claims does not exclude the presence of other elements or steps. The function of each element described in the specification or described in the claims can also be divided or combined, and implemented by a plurality of corresponding elements or a single element.

Claims

1. A machine learning model protection method based on a domain-specific language compiler, comprising:

For each of the one or more protection strategies of the machine learning model, receiving a user's instruction to call the corresponding function, and receiving the input parameter value of the function; and

Automatically generating, based on the one or more functions for the one or more protection strategies and the corresponding input parameter values, machine-executable code for protecting the machine learning model.

2. The machine learning model protection method according to claim 1, wherein automatically generating machine-executable code for protecting the machine learning model comprises:

For each of the one or more protection strategies, automatically generating machine-executable code for that protection strategy based on the corresponding function and input parameter value.

3. The machine learning model protection method according to claim 1, wherein automatically generating machine-executable code for protecting the machine learning model comprises:

Fusing at least two of the multiple functions corresponding to multiple protection strategies to generate a fused function; and

Automatically generating the corresponding machine-executable code based on the fused function and the corresponding input parameter values.

4. The machine learning model protection method according to any one of claims 1-3, further comprising:

Randomly generating the input parameter value for each protection strategy;

Wherein receiving the input parameter value of the function comprises:

Receiving the input parameter value randomly generated for the protection strategy.

5. The machine learning model protection method according to any one of claims 1-3, wherein:

The one or more protection strategies are selected by the user.

6. The machine learning model protection method according to claim 5, wherein the one or more protection strategies are selected from the group consisting of encryption, computational graph obfuscation, and weight data obfuscation.

7. A machine learning model protection device based on a domain-specific language compiler, comprising:

A receiving unit, which is configured to receive a user's instruction to call a corresponding function for each of the one or more protection strategies of the machine learning model, and to receive input parameter values of the function; and

A code generation unit configured to automatically generate machine-executable code for protecting the machine learning model based on the one or more functions for the one or more protection strategies and the corresponding input parameter values.

8. The machine learning model protection device according to claim 7, wherein the code generation unit is further configured for:

For each of the one or more protection strategies, automatically generating machine-executable code for that protection strategy based on the corresponding function and input parameter value.

9. The machine learning model protection device according to claim 7, wherein the code generation unit is further configured for:

Fusing at least two of the multiple functions corresponding to multiple protection strategies to generate a fused function; and

Automatically generating the corresponding machine-executable code based on the fused function and the corresponding input parameter values.

10. The machine learning model protection device according to any one of claims 7-9, further comprising:

A random number generating unit configured to randomly generate the input parameter value for each protection strategy;

Wherein the receiving unit receives, from the random number generating unit, the input parameter value randomly generated for the protection strategy.

11. The machine learning model protection device according to any one of claims 7-9, wherein:

The one or more protection strategies are selected by the user.

12. The machine learning model protection device according to claim 11, wherein the one or more protection strategies are selected from the group consisting of encryption, computational graph obfuscation, and weight data obfuscation.

13. A system for generating machine learning models, comprising:

A machine learning model generation device configured to generate machine learning models; and

The domain-specific language compiler-based machine learning model protection device according to any one of claims 7-12, configured to generate machine-executable code for protecting the machine learning model.