CN114329357A - Method and device for protecting code security - Google Patents

Info

Publication number
CN114329357A
Authority
CN
China
Prior art keywords
code
data
file
user
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111603927.0A
Other languages
Chinese (zh)
Inventor
于子淇
林立翔
游亮
龙欣
张尉东
卓钧亮
戚余航
刘思超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202111603927.0A
Publication of CN114329357A
Legal status: Pending

Abstract

An embodiment of the present specification provides a method for protecting code security, including: a server, in response to a request sent by a first user through the first user's client, obtains original code corresponding to the request; the server performs encryption processing based on the original code by using identification information of the first user to obtain encrypted data, and sends the encrypted data to the client; the client calls an interpreter to decrypt the encrypted data by using the identification information of the first user to obtain decrypted data; and the client determines a code file corresponding to the original code based on the decrypted data, and calls the interpreter to execute the code file.

Description

Method and device for protecting code security
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for protecting code security.
Background
Developing and writing code often takes a great deal of time, whereas copying and migrating code is easy. Thus, if the developed original code is delivered directly to a customer, the customer can easily migrate it, for example to other systems or machines, which undermines the virtuous cycle between code development and paid use.
However, existing code protection approaches are limited and can hardly meet the higher requirements of practical applications. Therefore, a solution is needed that allows code products to be delivered to customers safely and easily.
Disclosure of Invention
One or more embodiments of the present disclosure describe a method and an apparatus for protecting code security, in which a customized interpreter is designed, and only the customized interpreter can decrypt and use a code product, so that a malicious user cannot migrate and use the code product.
According to a first aspect, a method of securing code is provided. The method comprises the following steps: the server side responds to a request sent by a first user based on a client side of the first user, and obtains an original code corresponding to the request; the server side carries out encryption processing based on the original code by using the identification information of the first user to obtain encrypted data, and sends the encrypted data to the client side; the client calls an interpreter to decrypt the encrypted data by using the identification information of the first user to obtain decrypted data; and the client determines a code file corresponding to the original code based on the decrypted data, and calls the interpreter to execute the code file.
In one embodiment, the server performs encryption processing based on the original code by using the identification information of the first user, including: the server side obtains identification information of the first user and generates a first key according to the identification information; performing the encryption processing using the first key; the client calls an interpreter to decrypt the encrypted data by using the identification information of the first user, and the decryption process comprises the following steps: the interpreter acquires the identification information and generates a second key according to the identification information; and performing the decryption processing by using the second key.
In a specific embodiment, the obtaining, by the interpreter, the identification information includes: and the interpreter acquires the identification information under the condition that the first user passes identity authentication.
In one embodiment, the encrypting, by the server, based on the original code by using the identification information of the first user to obtain encrypted data includes: inserting encapsulation code into the original code and encapsulating the original code into a function to obtain function implementation code; compiling the function implementation code to obtain a compiled file; and encrypting the encapsulation code by using the identification information to obtain the encrypted data. Before the client determines a code file corresponding to the original code based on the decrypted data, the method further includes: the client receives the compiled file from the server. The client determining a code file corresponding to the original code based on the decrypted data includes: the client calls the function based on the encapsulation code, thereby determining the compiled file corresponding to the function as the code file.
In a specific embodiment, the method further comprises: the client displays the encapsulation code to the first user so that the first user can develop an algorithm based on the encapsulation code.
In a specific embodiment, inserting encapsulation code into the original code and encapsulating the original code into a function to obtain function implementation code includes: performing the encapsulation when the original code is determined not to be pre-marked complete code.
In a specific embodiment, performing compilation processing based on the function implementation code to obtain a compiled file includes: performing code obfuscation processing on the function implementation code to obtain an obfuscated file; and compiling the obfuscated file to obtain the compiled file.
In one embodiment, the encrypting, by the server, based on the original code by using the identification information of the first user to obtain encrypted data includes: constructing a syntax tree corresponding to the original code; performing the encryption processing based on the syntax tree to obtain encrypted data; the client determines a code file corresponding to the original code based on the decryption data, including: the syntax tree is determined based on the decryption data and rendered into executable code as the code file.
In a specific embodiment, the encryption processing based on the syntax tree includes: randomizing the syntax tree by using a random seed to obtain randomized data, and storing a mapping relation between the random seed and the randomized data; and performing the encryption processing based on the randomized data. Determining the syntax tree based on the decrypted data includes: determining the randomized data based on the decrypted data; acquiring the corresponding random seed from the server based on the randomized data; and performing anti-randomization processing on the randomized data by using the random seed to obtain the syntax tree.
In a more specific embodiment, the determining of the random seed includes: and the server generates the random seed according to the identification information and the current timestamp.
In another more specific embodiment, performing the encryption processing based on the randomized data includes: serializing the randomized data to obtain serialized data; performing compression encoding on the serialized data to obtain encoded data; and performing the encryption processing on the encoded data. Determining the randomized data based on the decrypted data includes: decoding the decrypted data to obtain the serialized data; and deserializing the serialized data to obtain the randomized data.
In another specific embodiment, constructing a syntax tree corresponding to the original code includes: constructing the syntax tree when the original code is determined to be pre-marked complete code.
In one embodiment, before the client determines the code file corresponding to the original code based on the decryption data, the method further comprises: the client acquires hardware information of a hardware platform where the client is located by executing a hardware permission detection operator received from the server, and judges whether the hardware information is legal or not; the client determines a code file corresponding to the original code based on the decryption data, including: and determining the code file under the condition that the hardware information is judged to be legal.
In one embodiment, the original code is written based on the python language; calling the interpreter to execute the code file, wherein the calling the interpreter to execute the code file comprises the following steps: the interpreter generates an intermediate file in a pyc format corresponding to the code file; and the interpreter removes the intermediate file after the execution of the interpretation of the intermediate file is finished.
According to a second aspect, a system for securing code is provided. The system comprises: the server is used for responding to a request sent by a first user based on a client side of the first user and acquiring an original code corresponding to the request; the server is further configured to: encrypting based on the original code by using the identification information of the first user to obtain encrypted data, and sending the encrypted data to the client; the client is used for calling an interpreter to decrypt the encrypted data by using the identification information of the first user to obtain decrypted data; the client is further configured to: and determining a code file corresponding to the original code based on the decrypted data, and calling the interpreter to execute the code file.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor which, when executing the executable code, implements the method of the first aspect.
With the method and the device provided by the embodiments of the present specification, a code product delivered to a user can be decrypted and used only when a legitimate user uses the customized interpreter. An illegitimate user therefore cannot decrypt the code product, and the difficulty of reverse-engineering the code product is raised to that of reverse-engineering the customized interpreter, which effectively protects code security and prevents the code from being illegally migrated and used.
Further, protection can be applied across the full lifecycle of code execution, including: a code layer (for example, encapsulating, obfuscating, randomizing, serializing, compressing, and encrypting the source code), a compiler layer (for example, compilation processing), an interpreter layer (including the customized interpreter), a runtime layer (for example, hardware permission detection), a scheduling layer (for example, requiring a user to supply account verification information when using authorized code), and a physical layer (for example, collecting hardware information of the current hardware). A malicious user must therefore break every layer for reverse engineering to succeed, so the difficulty grows exponentially; code security is fully and effectively guaranteed, and the core algorithm logic is not leaked when a code product is delivered externally.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 illustrates a diagram of method steps for securing code disclosed by embodiments of the present specification;
FIG. 2 illustrates an implementation flow diagram for determining encrypted data, according to one embodiment;
FIG. 3 is a schematic flow chart illustrating an implementation of determining encrypted data according to another embodiment;
FIG. 4 is a schematic flow chart illustrating an implementation of determining encrypted data according to yet another embodiment;
FIG. 5 illustrates a code protection scheme structure diagram for multi-layer encryption according to one embodiment;
FIG. 6 shows a schematic structural diagram of a system for protecting code security disclosed in an embodiment of the present specification.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As mentioned earlier, writing code tends to require a significant amount of time. Especially in the field of artificial intelligence, the innovative development of deep learning model algorithms usually consumes a large amount of resources such as manpower and time. Model algorithms are mainly implemented in the python language, which is a dynamic, interpreted language whose running code is entirely in plaintext, so security protection for python code is particularly urgent. However, when code is protected by existing methods, the algorithm logic is easily exposed by reverse engineering or static analysis, and higher security requirements are difficult to meet.
Based on this, the inventors propose a method for protecting code security in which a code product can be decrypted only when a legitimate user uses the customized interpreter. An illegitimate user therefore cannot decrypt the code product, and the difficulty of reverse-engineering the code product is raised to that of reverse-engineering the customized interpreter, which effectively protects code security and prevents the code from being illegally migrated and used.
Fig. 1 shows a method step diagram for protecting code security disclosed in the embodiment of the present specification, the method comprising the following steps:
Step S110: the server, in response to a request sent by a first user through the first user's client, obtains original code corresponding to the request.
Step S120: the server performs encryption processing based on the original code by using the identification information of the first user to obtain encrypted data.
Step S130: the server sends the encrypted data to the client.
Step S140: the client calls an interpreter to decrypt the encrypted data by using the identification information of the first user to obtain decrypted data.
Step S150: the client determines a code file corresponding to the original code based on the decrypted data.
Step S160: the client calls the interpreter to execute the code file.
The above steps are described in detail as follows:
First, in step S110, the server, in response to a request issued by the first user through the client, acquires original code corresponding to the request. It should be understood that the first user may be any user, including an individual user or an enterprise user, and that the first user is logged in to a registered account on the client; in addition, the original code may be source code, or plaintext code, written in any programming language, such as the python language or the C/C++ language.
In one embodiment, the client displays description information, such as function information, price, and the like of each code, provided by the server to the first user, so as to receive a selection instruction of the first user for a certain code, generate a code use request, or simply a request, based on the selection instruction, and send the request to the server. In one embodiment, the request includes a code identifier, and accordingly, the server may obtain the corresponding original code based on the code identifier.
Thus, the server can obtain the original code, so that in step S120, the encrypted data is obtained by performing encryption processing based on the original code by using the identification information of the first user. It should be understood that the identification information is used to uniquely identify the identity of the first user, and specifically may be a registered account number, an identity card number, a mobile phone number of the first user, a character string identifier allocated by the server to the first user, and the like.
In this step, the data to be encrypted may be determined based on the original code, and the data to be encrypted may then be encrypted by using the identification information, so as to obtain the encrypted data.
For the determination of the data to be encrypted, in embodiment A, as shown in fig. 2, encapsulation code (also referred to below as the package code) is inserted into the original code, and the original code is encapsulated into a function, so as to obtain function implementation code. In this case, the inserted encapsulation code serves as the data to be encrypted. In one embodiment, the generation and insertion of the encapsulation code can be performed automatically by a wrapper decorator; in another embodiment, the encapsulation code may be written by a worker and then inserted manually.
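As an illustration only, the encapsulation described in embodiment A might be sketched as follows. The specification does not give the form of the encapsulation code, so the function name `run_protected`, the module name `protected_module`, and the helper `encapsulate` are hypothetical.

```python
import textwrap


def encapsulate(original_code: str, func_name: str = "run_protected"):
    """Wrap the original code in a function and build the small calling stub.

    Returns (function_implementation_code, encapsulation_code). Both names
    and the module name below are hypothetical, chosen for illustration.
    Assumes original_code is non-empty, valid module-level code.
    """
    # Indent the original code so that it becomes the body of one function.
    body = textwrap.indent(original_code, "    ")
    function_impl = f"def {func_name}():\n{body}\n"
    # The encapsulation code only names the function; it carries no logic.
    encapsulation_code = (
        f"from protected_module import {func_name}\n{func_name}()\n"
    )
    return function_impl, encapsulation_code
```

In this sketch the encapsulation code is only an entry point: it names the function but contains none of the original logic, which is consistent with the later statement that it does not expose the algorithm.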
On the other hand, in embodiment a, as shown in fig. 2, the compiling process is also performed based on the function implementation code to obtain a compiled file.
In one embodiment, the function implementation code may be compiled directly to obtain the compiled file. It is understood that the compilation may use a compiler corresponding to the programming language of the original code; for example, python code may be compiled with the Cython compiler to obtain a compiled file in .so or .pyd form, which is a binary file. A compiled file obtained in this way is difficult to crack, and subsequent execution performance can also be effectively improved.
In another embodiment, the function implementation code may first be subjected to code obfuscation to obtain an obfuscated file, and the obfuscated file is then compiled to obtain the compiled file. It should be understood that code obfuscation refers to converting the source code of a computer program into a form that is functionally equivalent but difficult to read and understand. Obfuscation techniques include removing comments and documentation, changing indentation, renaming functions, classes, and variables, and inserting invalid code in blank lines. Illustratively, code obfuscation may be implemented using an obfuscator library or the like. Compiling the code after obfuscation increases the difficulty of decompilation, and even if decompilation succeeds, the real semantics of the program remain difficult to analyze.
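One of the obfuscation techniques listed above, removing comments and documentation, can be sketched with the python standard library. This is a minimal illustration rather than the obfuscator used by the embodiments; the class and function names are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


class _StripDocstrings(ast.NodeTransformer):
    """Remove docstrings from modules, functions, and classes."""

    def _strip(self, node):
        self.generic_visit(node)  # handle nested definitions first
        body = node.body
        if (body and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)):
            # Keep the body non-empty so the result stays valid code.
            node.body = body[1:] or [ast.Pass()]
        return node

    visit_Module = visit_FunctionDef = visit_ClassDef = _strip


def strip_comments_and_docstrings(source: str) -> str:
    # Parsing discards comments; the transformer removes docstrings.
    tree = _StripDocstrings().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

The output is functionally equivalent to the input but carries no comments or docstrings; renaming and invalid-code insertion would be further passes of the same kind.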
Thus, in the embodiment a, the compiled file and the data to be encrypted, that is, the package code, can be obtained.
In embodiment B, as shown in fig. 3, a syntax tree (or abstract syntax tree, AST for short) corresponding to the original code is first constructed, and the data to be encrypted is determined based on the syntax tree. It should be understood that the construction of the AST syntax tree generally includes lexical analysis and syntax analysis, so as to obtain data of the tree structure, which may be implemented in an existing manner and is not described in detail.
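For python code, the lexical and syntax analysis mentioned above is available through the standard `ast` module. The following minimal sketch builds a syntax tree from a small, made-up source string and lists the types of its top-level nodes.

```python
import ast

# A made-up stand-in for the original code.
source = "def double(x):\n    return 2 * x\n\nanswer = double(21)\n"

# ast.parse performs lexical and syntax analysis and returns the tree root.
tree = ast.parse(source)

# Each top-level statement becomes one child node of the Module root.
top_level_nodes = [type(node).__name__ for node in tree.body]
```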
In one embodiment, the AST syntax tree may be directly used as the data to be encrypted.
In another embodiment, a random seed may be used to randomize the AST syntax tree to obtain randomized data, and the data to be encrypted is determined based on the randomized data. Meanwhile, the server can also store the mapping relation between the random seed and the randomized data. In a specific embodiment, the server may generate the random seed according to the identification information of the first user and the current timestamp. In one example, a hash value obtained by hashing the identification information and the current timestamp may be used as the random seed.
Further, in a specific embodiment, the randomized data can be directly determined as the data to be encrypted. In another specific embodiment, the randomized data may be serialized to obtain serialized data, and the serialized data may be further subjected to compression encoding to obtain encoded data as the data to be encrypted. It is understood that serialization is the process of converting the state information of an object into a form that can be stored or transmitted; illustratively, the serialization can be implemented using the pickle, json, or shelve libraries. The compression encoding may be implemented by base64 encoding or base32 encoding.
It should be noted that the randomization, serialization, and compression encoding described above also serve to protect the AST syntax tree. The more layers of protection are applied, the harder it becomes for a malicious user to recover the AST syntax tree, and the difficulty grows exponentially.
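The specification does not state how the syntax tree is randomized. As a minimal sketch under that caveat, the example below assumes the randomization is a seed-driven permutation of the top-level statements, and derives the seed by hashing the identification information together with a timestamp. All function names are hypothetical.

```python
import ast
import hashlib
import random


def make_seed(identification: str, timestamp: float) -> int:
    """Hash the identification information and timestamp into an integer seed."""
    digest = hashlib.sha256(f"{identification}:{timestamp}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _permutation(length: int, seed: int) -> list:
    # The same seed always reproduces the same ordering.
    order = list(range(length))
    random.Random(seed).shuffle(order)
    return order


def randomize(tree: ast.Module, seed: int) -> ast.Module:
    """Assumed randomization: permute the top-level statements."""
    order = _permutation(len(tree.body), seed)
    return ast.Module(body=[tree.body[i] for i in order], type_ignores=[])


def derandomize(tree: ast.Module, seed: int) -> ast.Module:
    """Invert randomize() by regenerating the permutation from the seed."""
    order = _permutation(len(tree.body), seed)
    body = [None] * len(order)
    for new_pos, old_pos in enumerate(order):
        body[old_pos] = tree.body[new_pos]
    return ast.Module(body=body, type_ignores=[])
```

Because the permutation is fully determined by the seed, a party that holds the randomized tree but not the seed cannot directly restore the original statement order, which is why the server keeps the mapping between seed and randomized data.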
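The serialization and compression encoding steps can be sketched with the json and base64 libraries named above. The use of `zlib` for the compression part is an assumption of this sketch, since base64 on its own only re-encodes the data; the function names and the sample payload are likewise hypothetical.

```python
import base64
import json
import zlib


def encode(randomized_data) -> bytes:
    """Serialize with json, compress with zlib (assumed), encode with base64."""
    serialized = json.dumps(randomized_data, sort_keys=True).encode("utf-8")
    return base64.b64encode(zlib.compress(serialized))


def decode(encoded: bytes):
    """Reverse of encode(): base64-decode, decompress, then deserialize."""
    serialized = zlib.decompress(base64.b64decode(encoded))
    return json.loads(serialized.decode("utf-8"))
```

The decode function mirrors the client-side path described later, in which the decrypted data is decoded into serialized data and then deserialized into randomized data.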
As described above, in embodiment B, data to be encrypted, specifically, the AST syntax tree, the randomized data, or the encoded data can be obtained.
While the exemplary embodiments a and B for determining data to be encrypted are described above, it should be understood that other embodiments may be actually adopted, for example, the package code and the compiled file are determined as data to be encrypted together, or the serialized data is determined as data to be encrypted, or the original code is determined as data to be encrypted.
In addition, this step may be performed by using any of the above embodiments, or a corresponding embodiment may be selected for the completeness of the original code. In one embodiment, the server may perform classification marking on multiple original codes developed by the server, specifically, mark codes with complete functionality and without additional development by a user as complete codes, and mark other codes as non-complete codes. Based on this, in a specific embodiment, as shown in fig. 4, when the server determines that the original code obtained according to the user request does not belong to a complete code, or belongs to an incomplete code, the server processes the original code by using the above embodiment a to obtain corresponding data to be encrypted, and at this time, the server may also deliver the encapsulated code generated in the embodiment a to the first user, so that the first user may perform additional algorithm development based on the encapsulated code. It is to be understood that the encapsulated code is similar to a functional interface, which does not itself expose the algorithmic logic of the original code. In another specific embodiment, the server, under the condition that it determines that the obtained original code belongs to a complete code, processes the original code by using the above embodiment B to obtain corresponding data to be encrypted.
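The selection between the two embodiments based on the completeness mark can be sketched as a simple lookup. The enum, the function name, and the representation of the marks as a set of code identifiers are assumptions made for illustration.

```python
from enum import Enum


class Protection(Enum):
    ENCAPSULATE_AND_COMPILE = "A"  # embodiment A
    SYNTAX_TREE = "B"              # embodiment B


def choose_embodiment(code_id: str, complete_code_ids: set) -> Protection:
    """Pick embodiment B for pre-marked complete code, otherwise embodiment A."""
    if code_id in complete_code_ids:
        return Protection.SYNTAX_TREE
    return Protection.ENCAPSULATE_AND_COMPILE
```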
Further, for the determined data to be encrypted, the data to be encrypted may be encrypted by using the identification information of the first user, so as to obtain the encrypted data, which may be referred to fig. 2, fig. 3, or fig. 4.
In one embodiment, the server obtains the identification information of the first user, generates a first key according to the identification information, and then performs encryption processing by using the first key. In a specific embodiment, the generation and use of the first key is based on a symmetric encryption algorithm, for example, the Advanced Encryption Standard (AES) algorithm or the Data Encryption Standard (DES) algorithm. In another specific embodiment, the first key may be generated and used with an asymmetric encryption algorithm, such as the RSA algorithm or the ElGamal algorithm. It is to be understood that an asymmetric encryption algorithm involves a public key and a private key, and the first key may be either of the two.
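The symmetric case can be sketched as follows. The key derivation uses PBKDF2 from the standard `hashlib` module, which is an assumption, since the specification does not say how the key is generated from the identification information. Because AES is not available in the python standard library, a SHA-256 based XOR keystream stands in for it here; it is a toy substitute for illustration only and not the algorithm of the embodiments. The salt value and function names are hypothetical.

```python
import hashlib


def derive_key(identification: str, salt: bytes = b"demo-salt") -> bytes:
    """Derive a 32-byte key from the identification information (assumed KDF)."""
    return hashlib.pbkdf2_hmac(
        "sha256", identification.encode("utf-8"), salt, 100_000, dklen=32
    )


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key plus counter. A stand-in for AES.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Symmetric stand-in cipher: applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
```

With a symmetric scheme, the interpreter on the client derives the same key from the same identification information and applies the inverse operation; a key derived from a different user's identification information does not recover the data.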
In another embodiment, the server performs hash processing on the identification information to obtain a corresponding hash value; and then the hash value is used as a data coordinate to encrypt the data to be encrypted. In a specific embodiment, a preset character, for example, 0 or 1, is inserted after the character of the corresponding position in the data to be encrypted. In another specific embodiment, the preset mask is used to perform a masking process on the characters at the corresponding positions in the data to be encrypted, for example, adding 1 or adding 2.
Therefore, the server side encrypts the data to be encrypted by using the identification information of the first user, so that the encrypted data can be obtained. Thereafter, the server may send the encrypted data to the client in step S130.
In one embodiment, the server may further send the compiled file to the client.
In one embodiment, the server may further send a hardware-permission-detector (license-operator) to the client. It is understood that the hardware permission detection operator is essentially a piece of code, and this operator can perform hardware verification at the runtime (runtime) level, ensuring that the code runs on a licensed machine.
Therefore, the client can receive the code product, such as the encrypted data, from the server and perform data processing such as decryption to obtain an executable code file. Specifically, in step S140, the client calls the interpreter to decrypt the encrypted data using the identification information of the first user, so as to obtain decrypted data. It should be noted that the interpreter disclosed in the embodiments of the present specification is deployed in the client; unlike a conventional interpreter, which can only run plaintext code, it is customized to additionally provide a decryption function. Thus, even if a user attempts to migrate the encrypted data to another system, the migration fails because a conventional interpreter cannot interpret the encrypted data directly; meanwhile, the customized interpreter does not affect version compatibility. Furthermore, the interpreter requires the identification information of the first user to decrypt the code product, so that use of the code product can be restricted to a specific legitimate user.
In one embodiment, the interpreter may perform the decryption process described above directly. In another embodiment, the interpreter may trigger authentication of the first user, and in case the first user passes the authentication, the decryption process is performed. It should be understood that, if the first user has performed identity authentication when performing account login, the identity authentication triggered in this step may be secondary identity authentication. Thus, the reliability of the identity authentication result can be enhanced.
In one embodiment, the interpreter may retrieve locally stored identification information of the first user. In another embodiment, the interpreter may trigger the client to obtain the identification information of the first user from the server based on the encrypted data.
It should be understood that the decryption processing in this step is the reverse of the encryption processing, and accordingly, the data successfully decrypted in this step is consistent with the data to be encrypted. Thus, in one embodiment, the interpreter generates a second key based on the identification information and performs decryption processing using the second key. Further, in a specific embodiment, the generation and use of the second key is based on the above-mentioned symmetric encryption algorithm, in which case the second key is the same key as the first key. In another specific embodiment, the generation and use of the second key is based on the above-mentioned asymmetric encryption algorithm, in which case the first key and the second key are respectively a public key and a private key, or respectively a private key and a public key. In this manner, decryption processing using the second key can be realized.
In another embodiment, the interpreter performs hash processing on the identification information to obtain a corresponding hash value; and then the hash value is used as a data coordinate to decrypt the encrypted data. In a specific embodiment, the positions of preset characters inserted into the encrypted data during encryption processing are derived according to the data coordinates, and then the preset characters at the positions are removed, so as to obtain the decrypted data. In another specific embodiment, the preset mask is used to perform a de-masking process on the characters at the corresponding positions in the encrypted data, for example, subtracting 1 or subtracting 2.
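The use of a hash value as data coordinates, with preset characters inserted during encryption and removed during decryption, can be sketched as follows. How the hash maps to positions is not specified; this sketch assumes the first digest byte selects a stride and that a preset character is inserted after every stride-th character. The function names are hypothetical.

```python
import hashlib

PRESET = "0"  # preset character, as in the example in the text


def _stride(identification: str) -> int:
    """Assumed mapping from hash value to positions: a stride between 2 and 9."""
    digest = hashlib.sha256(identification.encode("utf-8")).digest()
    return 2 + digest[0] % 8


def insert_preset(data: str, identification: str) -> str:
    """Insert the preset character after every stride-th character."""
    stride = _stride(identification)
    out = []
    for index, char in enumerate(data, start=1):
        out.append(char)
        if index % stride == 0:
            out.append(PRESET)
    return "".join(out)


def remove_preset(data: str, identification: str) -> str:
    """Drop the characters at the positions where presets were inserted."""
    stride = _stride(identification)
    return "".join(
        char for index, char in enumerate(data, start=1)
        if index % (stride + 1) != 0
    )
```

Removal is positional rather than value-based, so the original data may itself contain the preset character without affecting the round trip.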
Therefore, the client can call the customized interpreter to decrypt the encrypted data and obtain decrypted data, where the decrypted data corresponds to the data to be encrypted; see fig. 2 or fig. 3. Then, in step S150, the client determines a code file corresponding to the original code based on the decrypted data.
In one embodiment, prior to this step, the method may further comprise: and the client acquires the hardware information of the hardware platform where the client is located by executing the hardware permission detection operator received from the server, and judges whether the hardware information is legal or not. Accordingly, the step may include: and determining the code file under the condition that the hardware information is judged to be legal.
Further, in a specific embodiment, the hardware platform may be local hardware of the terminal where the client is located, or may be remote machine hardware, such as a machine cluster in a cloud resource pool.
In a specific embodiment, after the client collects the hardware information of the hardware platform, the hardware information is compared with registered hardware information; if the two match, the hardware information is judged to be legal, and otherwise it is judged to be illegal. Further, in one example, the client may obtain the registered hardware information from the server; in another example, the hardware permission detection operator includes registered hardware information written by the server. On the other hand, in an example, the registered hardware information may include hardware information of hardware that was declared in advance by the first user and approved by the server, or may further include hardware information of a hardware instance that the first user has purchased in the cloud resource pool. In one example, the hardware information may include hardware identification information such as a hardware number, a Media Access Control (MAC) address, a root of trust, and the like.
Therefore, by executing the hardware permission detection, the code product can be ensured to be used only on legal hardware, so that a malicious user is prevented from illegally migrating the code product to other machines for use.
Returning to this step, since the decrypted data obtained by the decryption processing corresponds to the data to be encrypted determined based on the original code, the manner of determining the code file based on the decrypted data in this step depends on how the data to be encrypted was determined. For ease of understanding, the following description of this step takes as examples the cases in which the data to be encrypted is determined based on the above-described embodiment A and embodiment B.
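The comparison of collected hardware information against registered hardware information can be sketched as follows. Only the MAC address is collected here, through the standard `uuid.getnode()` call; the dictionary layout and function names are assumptions, and a real operator would presumably collect further fields such as a hardware number.

```python
import uuid


def collect_hardware_info() -> dict:
    """Collect hardware information; this sketch gathers only the MAC address."""
    # uuid.getnode() returns the hardware address as a 48-bit integer
    # (it may fall back to a random value if no address can be read).
    return {"mac": f"{uuid.getnode():012x}"}


def is_hardware_legal(collected: dict, registered: list) -> bool:
    """Legal only if the collected information matches a registered entry."""
    return any(collected == entry for entry in registered)
```

The list of registered entries corresponds to the registered hardware information, whether it is fetched from the server or written into the operator by the server.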
Assuming that the decrypted data corresponds to the data to be encrypted determined based on embodiment A above, so that the decrypted data includes the encapsulation code, this step may be implemented in the manner shown in fig. 2: based on the encapsulation code, the function indicated in the encapsulation code, that is, the function obtained by encapsulating the original code, is called, so that the compiled file corresponding to the function is determined as the code file.
Assuming that the decrypted data corresponds to the data to be encrypted determined based on embodiment B above, this step may accordingly be implemented in the manner shown in fig. 3: the AST syntax tree is determined based on the decrypted data, and the AST syntax tree is rendered into executable code as the code file. For the determination of the AST syntax tree, in one embodiment, the decrypted data obtained by the decryption process includes the AST syntax tree. In another embodiment, the decrypted data includes the randomized data; accordingly, based on the randomized data, a random seed having a mapping relationship with the randomized data may be obtained from the server, so that the randomized data is de-randomized with the obtained random seed to obtain the AST syntax tree. In another embodiment, the decrypted data includes the encoded data; accordingly, decoding processing may be performed on the encoded data to obtain the serialized data, and deserialization processing may be performed on the serialized data to obtain the randomized data; a random seed is then acquired from the server based on the randomized data, and de-randomization is performed to obtain the AST syntax tree.
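Illustratively, the chain described for embodiment B (randomize, serialize, compress and encode on the server; decode, deserialize and de-randomize on the client) can be sketched as follows. This is a sketch under stated assumptions: the description does not specify the randomization scheme, so a seeded shuffle of the top-level statements is assumed here; the function names are hypothetical; the identity-based encryption step is omitted; and zlib is an assumed choice for the compression.

```python
import ast
import base64
import pickle
import random
import zlib

def randomize(tree, seed):
    """Shuffle the top-level statements with a seeded permutation (assumed scheme)."""
    order = list(range(len(tree.body)))
    random.Random(seed).shuffle(order)
    return [tree.body[i] for i in order]

def derandomize(shuffled, seed):
    """Invert randomize() by regenerating the same permutation from the seed."""
    order = list(range(len(shuffled)))
    random.Random(seed).shuffle(order)
    body = [None] * len(shuffled)
    for position, original_index in enumerate(order):
        body[original_index] = shuffled[position]
    return ast.Module(body=body, type_ignores=[])

def server_encode(source, seed):
    """AST -> randomize -> serialize -> compress -> encode (encryption omitted)."""
    randomized = randomize(ast.parse(source), seed)
    serialized = pickle.dumps(randomized)
    return base64.b64encode(zlib.compress(serialized))

def client_decode(encoded, seed):
    """Decode -> decompress -> deserialize -> de-randomize."""
    # pickle.loads must only be applied to data from a trusted source.
    randomized = pickle.loads(zlib.decompress(base64.b64decode(encoded)))
    return derandomize(randomized, seed)

source = "a = 2\nb = a * 21\nresult = b\n"
seed = 1234  # in the described scheme the client obtains this from the server
tree = client_decode(server_encode(source, seed), seed)

# Render the recovered syntax tree into executable code and run it.
namespace = {}
exec(compile(tree, "<decrypted>", "exec"), namespace)
print(namespace["result"])  # 42
```

Without the correct seed the statements are restored in the wrong order, so the recovered tree no longer corresponds to the original code.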
On the other hand, when the server determines the data to be encrypted, the server selects a corresponding implementation mode according to the completeness of the original code, and accordingly, in this step, the client may first determine whether the decrypted data corresponds to the complete code, and if the decrypted data does not correspond to the complete code, the code file is determined in the manner shown in fig. 2, otherwise, the code file is determined in the manner shown in fig. 3.
In this way, the client may determine a code file based on the decrypted data, the code file being the compiled file or the executable code rendered from the AST syntax tree as described above. Thereafter, in step S160, the client calls the interpreter to execute the determined code file. Specifically, the client may invoke the interpreter to interpret and execute the compiled file or the executable code.
In one embodiment, the executable code is Python code; accordingly, in this step, the interpreter generates an intermediate file in pyc format corresponding to the Python code, and removes the intermediate file after interpretation and execution of the intermediate file are completed. It should be noted that the removal of the pyc file can be achieved in various ways, such as setting an environment variable or using the -B parameter at execution time. Thus, the user cannot acquire the pyc file, and the executable code cannot be reverse-derived from the pyc file.
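Illustratively, with a standard CPython interpreter the pyc file can be kept off disk as sketched below. Note that this sketch suppresses the writing of the pyc file rather than deleting it after execution, which is a variation on the embodiment; `sys.dont_write_bytecode` has the same effect as the `-B` parameter or the `PYTHONDONTWRITEBYTECODE` environment variable. The module name `demo_module` is hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Same effect as running "python -B" or setting PYTHONDONTWRITEBYTECODE=1.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as workdir:
    module_path = os.path.join(workdir, "demo_module.py")  # hypothetical module
    with open(module_path, "w", encoding="utf-8") as handle:
        handle.write("VALUE = 7\n")

    sys.path.insert(0, workdir)
    try:
        module = importlib.import_module("demo_module")
    finally:
        sys.path.remove(workdir)

    # Check whether the import left a cached bytecode file behind.
    cache_dir = os.path.join(workdir, "__pycache__")
    pyc_written = os.path.isdir(cache_dir) and bool(os.listdir(cache_dir))

print(module.VALUE, pyc_written)  # 7 False
```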
Therefore, the client can realize the corresponding code function by calling the interpreter to execute the code file.
According to another embodiment, after the client calls the interpreter to execute the code file, the server updates the first user's remaining allowable use duration for the code product according to feedback information from the client.
In summary, with the method for protecting code security disclosed in the embodiments of the present specification, a code product delivered to a user can be decrypted only when a legal user uses the customized interpreter, so that an illegal user cannot decrypt the code product, and the difficulty of reverse engineering the code product is raised to that of reverse engineering the customized interpreter, thereby effectively protecting code security and preventing the code from being illegally migrated and used.
Further, protection covers the full lifecycle of code execution, including: the code layer (such as encapsulating, obfuscating, randomizing, serializing, compressing and encrypting the source code), the compiler layer (such as compilation processing), the interpreter layer (including the customized interpreter), the runtime (such as hardware permission detection), the scheduling layer (such as requiring the user to transmit account verification information when using authorized code), and the physical layer (such as collecting hardware information of the current hardware). A malicious user must therefore break every layer for reverse engineering to succeed, and the difficulty increases exponentially, so that code security is fully and effectively guaranteed and the core logic of the algorithm is not leaked when the code product is delivered offline.
According to another aspect of the embodiments, the scheme disclosed in the embodiments of the present specification is described below from the perspective of performing full-life multi-layer encryption protection on code. Fig. 5 shows a block diagram of a multi-layered encrypted code protection scheme according to an embodiment, which is a code layer, a compiler layer, an interpreter layer, a scheduling layer, and a physical layer in sequence from top to bottom. It should be noted that when the original code to be protected is used to implement the model algorithm, an AI environment layer may be further included, which is located between the interpreter layer and the scheduling layer.
At the code layer, according to one embodiment, a code obfuscation operation may be performed based on the original code, for example by directly obfuscating the original code or by obfuscating the function implementation code obtained through encapsulation. Illustratively, code obfuscation may be implemented using a library such as obfuscator. According to another embodiment, the original code may be encapsulated with a partial function; for example, a decorator (wrapper) may be used to insert the encapsulation code into the original code, so as to obtain the function implementation code. According to another embodiment, an AST syntax tree corresponding to the original code can be constructed, and a layer of encryption and decryption is applied to the AST syntax tree, thereby preventing reverse engineering at the lexical and semantic analysis level. In a particular embodiment, the encryption and decryption of the AST syntax tree include serialization and deserialization, and encoding and decoding operations. Illustratively, serialization and deserialization can be implemented using a library such as pickle, and encoding and decoding can be implemented using base64 or the like.
At the compiler layer, the original plaintext code can be compiled and encrypted to obtain a compiled file. According to one embodiment, the original Python plaintext code may be compiled so that a file in pyc format is converted into a binary file in so/pyd format. According to a specific embodiment, a Python compiler can be used to perform C-language-level secondary encryption on the Python code, raising the security to that of a C/C++ binary file. In this way, py and pyc files are prevented from being easily decompiled to obtain the plaintext source code.
At the interpreter layer, which includes the customized interpreter of the embodiments of the present specification, the interpreter is essentially a binary file, so cracking it requires assembly-level reverse engineering capability, which is extremely difficult. The decryption operation is implemented inside the interpreter: the user is required to input personal information such as a client account number, from which a license key (license-secret) is obtained and input into a symmetric cryptographic algorithm (such as the AES algorithm) for decryption, and only then is execution of the file unlocked. In addition, the interpreter removes the intermediate pyc file that is generated, ensuring that the user cannot obtain the code that is actually run.
At the AI environment layer, for an actual deep learning model algorithm, a specific hardware permission detection operator (license-operator) is inserted into the corresponding framework (including mainstream frameworks such as TensorFlow, PyTorch and MXNet) when the algorithm is implemented. This operator is verified at runtime and is mainly used to check the current hardware, so as to ensure that the code runs on a permitted machine.
At the scheduling layer, the user is required to transmit account information for verification in two major scenarios, virtual machine (VM) and container (Docker), so as to ensure that the code is authorized to a legitimate user.
At the physical layer, a unique identifier is extracted from the specific machine hardware, including hardware identification such as the instance ID, MAC address and root of trust, so as to ensure that the original code delivered offline runs on legal hardware.
As described above, the execution of the code is protected over its full lifecycle through multi-layer encryption, so that a malicious user must break every layer for reverse engineering to succeed, and the difficulty increases exponentially. Code security is thereby fully and effectively guaranteed, and the core logic of the algorithm is not leaked when the code product is delivered offline.
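Illustratively, the first compilation stage, from Python source to a pyc bytecode file, can be reproduced with the standard library as sketched below. The further conversion to a so/pyd binary requires an external compiler toolchain and is not shown; the file name `algo.py` is hypothetical.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "algo.py")  # hypothetical source file
    with open(source_path, "w", encoding="utf-8") as handle:
        handle.write("def run():\n    return 'ok'\n")

    # Byte-compile the source; doraise=True turns compile errors into exceptions.
    compiled_path = py_compile.compile(
        source_path, cfile=os.path.join(workdir, "algo.pyc"), doraise=True
    )
    compiled_exists = os.path.exists(compiled_path)
    with open(compiled_path, "rb") as handle:
        magic = handle.read(4)  # a pyc file starts with a 4-byte magic number

print(compiled_exists, len(magic))  # True 4
```

Because a pyc file keeps names and structure largely intact, it is comparatively easy to decompile, which is the motivation given above for going one step further to a native binary.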
Corresponding to the protection method, the embodiment of the specification also discloses a system for protecting the code security. Fig. 6 is a schematic structural diagram of a system for protecting code security disclosed in an embodiment of the present specification, and as shown in fig. 6, the system 600 includes:
the server 610 is configured to, in response to a request sent by a first user based on the client 620 of the first user, obtain an original code corresponding to the request, perform encryption processing based on the original code by using identification information of the first user, obtain encrypted data, and send the encrypted data to the client 620. And the client 620 is configured to invoke the interpreter 621 to decrypt the encrypted data by using the identification information of the first user to obtain decrypted data, determine a code file corresponding to the original code based on the decrypted data, and invoke the interpreter 621 to execute the code file.
In an embodiment, the server 610 is configured to perform the above encryption processing, and specifically includes: acquiring identification information of the first user, and generating a first key according to the identification information; and performing the encryption processing by using the first key. The client 620 calls the interpreter 621 to perform the decryption process, which specifically includes: the interpreter 621 acquires the identification information and generates a second key according to the identification information; and performing the decryption processing by using the second key.
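Illustratively, generating the first key and the second key from the same identification information can be sketched as follows. The use of PBKDF2-HMAC-SHA256, the salt value, and the iteration count are assumptions made for this sketch; the embodiment does not specify the derivation function. The resulting 32-byte key could then be supplied to a symmetric cipher such as AES, which is not shown here.

```python
import hashlib
import hmac

def derive_key(identification, salt=b"license-secret"):
    """Derive a 32-byte key from user identification info (assumed KDF)."""
    return hashlib.pbkdf2_hmac(
        "sha256", identification.encode("utf-8"), salt, 100_000
    )

first_key = derive_key("user-0001")   # server side, before encryption
second_key = derive_key("user-0001")  # interpreter side, before decryption
other_key = derive_key("user-0002")   # a different user

# The legitimate user reproduces the server's key; another user does not.
print(len(first_key), hmac.compare_digest(first_key, second_key), first_key == other_key)
# 32 True False
```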
In a specific embodiment, the interpreter 621 obtains the identification information, including: the interpreter 621 obtains the identification information when the first user passes the authentication.
In an embodiment, the server 610 is configured to obtain the encrypted data, which specifically includes: inserting an encapsulation code into the original code, and encapsulating the original code into a function to obtain a function implementation code; compiling the function implementation code to obtain a compiled file; and encrypting the encapsulation code by using the identification information to obtain the encrypted data. The client 620 is further configured to: receive the compiled file from the server 610. The client 620 is configured to determine the code file, which specifically includes: based on the encapsulation code, calling the function, thereby determining the compiled file corresponding to the function as the code file.
In a specific embodiment, the client 620 is further configured to: display the encapsulation code to the first user, so that the first user carries out algorithm development based on the encapsulation code.
In a specific embodiment, the server 610 is configured to perform the above encapsulation, and specifically includes: and under the condition that the original code is judged not to belong to the pre-marked complete code, packaging.
In a specific embodiment, the server 610 is configured to perform the compiling process, which specifically includes: performing code obfuscation processing on the function implementation code to obtain an obfuscated file; and compiling the obfuscated file to obtain the compiled file.
In an embodiment, the server 610 is configured to obtain the encrypted data, which specifically includes: constructing a syntax tree corresponding to the original code; and carrying out the encryption processing based on the syntax tree to obtain the encrypted data. The client 620 is configured to determine the code file, which specifically includes: determining the syntax tree based on the decrypted data, and rendering the syntax tree into executable code as the code file.
In a specific embodiment, the server 610 is configured to perform the encryption processing, and specifically includes: randomizing the syntax tree by using a random seed to obtain randomized data, and storing a mapping relation between the random seed and the randomized data; the encryption processing is performed based on the randomized data. The client 620 is configured to determine the syntax tree, and specifically includes: determining the randomized data based on the decrypted data; acquiring the corresponding random seed from the server 610 based on the randomized data; and performing anti-randomization treatment on the randomized data by using the random seeds to obtain the syntax tree.
In one example, the server 610 is further configured to: and generating the random seed according to the identification information and the current timestamp.
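Illustratively, generating the random seed from the identification information and a timestamp, and storing its mapping to the randomized data, can be sketched as follows. The hash-based construction, the 64-bit seed width, and the names `make_random_seed`, `remember_seed` and `lookup_seed` are assumptions of this sketch; the mapping is keyed by a digest of the randomized data so that the data itself need not be stored.

```python
import hashlib
import time

def make_random_seed(identification, timestamp=None):
    """Derive a 64-bit seed from user identification info and a timestamp."""
    if timestamp is None:
        timestamp = time.time()  # current time on the server
    material = f"{identification}|{timestamp!r}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(material).digest()[:8], "big")

# Hypothetical server-side store for the seed <-> randomized data mapping.
seed_by_digest = {}

def remember_seed(randomized_data, seed):
    """Record which seed produced the given randomized data."""
    digest = hashlib.sha256(randomized_data).hexdigest()
    seed_by_digest[digest] = seed
    return digest

def lookup_seed(randomized_data):
    """Return the seed for randomized data presented by a client."""
    return seed_by_digest[hashlib.sha256(randomized_data).hexdigest()]

seed = make_random_seed("user-0001", timestamp=1640304000)
remember_seed(b"randomized-bytes", seed)
print(lookup_seed(b"randomized-bytes") == seed)  # True
```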
In another example, the server 610 is configured to perform the encryption processing based on the randomized data, which specifically includes: carrying out serialization processing on the randomized data to obtain serialized data; carrying out compression coding processing on the serialized data to obtain encoded data; and performing the encryption processing on the encoded data. The client 620 is configured to determine the randomized data based on the decrypted data, which specifically includes: decoding the decrypted data to obtain the serialized data; and performing deserialization processing on the serialized data to obtain the randomized data.
On the other hand, in a specific embodiment, the server 610 is configured to construct a syntax tree corresponding to the original code, and specifically includes: and under the condition that the original code is judged to belong to the pre-marked complete code, constructing the syntax tree.
In one embodiment, the client 620 is further configured to: by executing the hardware permission detection operator received from the server 610, the hardware information of the hardware platform where the client 620 is located is collected, and whether the hardware information is legal or not is judged. The client 620 is configured to determine a code file, and specifically includes: and determining the code file under the condition that the hardware information is judged to be legal.
In one embodiment, the original code is written based on the python language; the client 620 is configured to invoke the interpreter 621 to execute the code file, and specifically includes: and calling the interpreter 621 to generate an intermediate file in a pyc format corresponding to the code file, and removing the intermediate file after the interpretation execution of the intermediate file is completed.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 1 or fig. 2 or fig. 3 or fig. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in conjunction with fig. 1 or fig. 2 or fig. 3 or fig. 4.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments further describe in detail the objects, technical solutions and advantages of the present invention. It should be understood that the above embodiments are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (14)

1. A method of securing code, comprising:
the server side responds to a request sent by a first user based on a client side of the first user, and obtains an original code corresponding to the request;
the server side carries out encryption processing based on the original code by using the identification information of the first user to obtain encrypted data, and sends the encrypted data to the client side;
the client calls an interpreter to decrypt the encrypted data by using the identification information of the first user to obtain decrypted data;
and the client determines a code file corresponding to the original code based on the decrypted data, and calls the interpreter to execute the code file.
2. The method of claim 1, wherein,
the server side utilizes the identification information of the first user to perform encryption processing based on the original code, and the encryption processing comprises the following steps: the server side obtains identification information of the first user and generates a first key according to the identification information; performing the encryption processing using the first key;
the client calls an interpreter to decrypt the encrypted data by using the identification information of the first user, and the decryption process comprises the following steps: the interpreter acquires the identification information and generates a second key according to the identification information; and performing the decryption processing by using the second key.
3. The method of claim 2, wherein the interpreter obtains the identification information, comprising:
and the interpreter acquires the identification information under the condition that the first user passes identity authentication.
4. The method of claim 1, wherein,
the server side performs encryption processing based on the original code by using the identification information of the first user to obtain encrypted data, and the encryption processing comprises the following steps: inserting an encapsulation code into the original code, and encapsulating the original code into a function to obtain a function implementation code; compiling the function implementation code to obtain a compiled file; encrypting the encapsulation code by using the identification information to obtain the encrypted data;
before the client determines a code file corresponding to the original code based on the decrypted data, the method further comprises: the client receives the compiled file from the server;
the client determines a code file corresponding to the original code based on the decrypted data, including: the client calls the function based on the encapsulation code, thereby determining the compiled file corresponding to the function as the code file.
5. The method of claim 4, wherein the method further comprises:
the client displays the encapsulation code to the first user so that the first user can develop an algorithm based on the encapsulation code.
6. The method of claim 4, wherein encapsulating the original code into a function by inserting an encapsulation code in the original code, resulting in a function implementation code, comprises:
and under the condition that the original code is judged not to belong to the pre-marked complete code, packaging.
7. The method of claim 4, wherein performing compilation processing based on the function implementation code to obtain a compiled file comprises:
performing code obfuscation processing on the function implementation code to obtain an obfuscated file;
and compiling the obfuscated file to obtain the compiled file.
8. The method of claim 1, wherein,
the server side performs encryption processing based on the original code by using the identification information of the first user to obtain encrypted data, and the encryption processing comprises the following steps: constructing a syntax tree corresponding to the original code; performing the encryption processing based on the syntax tree to obtain encrypted data;
the client determines a code file corresponding to the original code based on the decrypted data, including: the syntax tree is determined based on the decrypted data and rendered into executable code as the code file.
9. The method of claim 8, wherein,
performing the encryption processing based on the syntax tree, including: randomizing the syntax tree by using a random seed to obtain randomized data, and storing a mapping relation between the random seed and the randomized data; performing the encryption processing based on the randomized data;
determining the syntax tree based on the decrypted data, comprising: determining the randomized data based on the decrypted data; acquiring the corresponding random seed from the server based on the randomized data; and performing anti-randomization treatment on the randomized data by using the random seeds to obtain the syntax tree.
10. The method of claim 9, wherein the determining of the random seed comprises:
and the server generates the random seed according to the identification information and the current timestamp.
11. The method of claim 9, wherein,
performing the encryption processing based on the randomized data, comprising: carrying out serialization processing on the randomized data to obtain serialized data; carrying out compression coding processing on the serialized data to obtain encoded data; performing the encryption processing on the encoded data;
determining the randomized data based on the decrypted data, comprising: decoding the decrypted data to obtain the serialized data; and performing deserialization processing on the serialized data to obtain the randomized data.
12. The method of claim 8, wherein constructing a syntax tree corresponding to the original code comprises:
and under the condition that the original code is judged to belong to the pre-marked complete code, constructing the syntax tree.
13. The method of claim 1, wherein,
before the client determines a code file corresponding to the original code based on the decrypted data, the method further comprises: the client acquires hardware information of a hardware platform where the client is located by executing a hardware permission detection operator received from the server, and judges whether the hardware information is legal or not;
the client determines a code file corresponding to the original code based on the decrypted data, including: and determining the code file under the condition that the hardware information is judged to be legal.
14. The method of claim 1, wherein the original code is written based on a python language; calling the interpreter to execute the code file, wherein the calling the interpreter to execute the code file comprises the following steps:
the interpreter generates an intermediate file in a pyc format corresponding to the code file;
and the interpreter removes the intermediate file after the execution of the interpretation of the intermediate file is finished.
CN202111603927.0A 2021-12-24 2021-12-24 Method and device for protecting code security Pending CN114329357A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111603927.0A CN114329357A (en) 2021-12-24 2021-12-24 Method and device for protecting code security

Publications (1)

Publication Number Publication Date
CN114329357A true CN114329357A (en) 2022-04-12

Family

ID=81013503

Country Status (1)

Country Link
CN (1) CN114329357A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination