WO2021244054A1

WO2021244054A1 - Contract code obfuscation platform and obfuscation method based on smart contract bytecode features

Info

Publication number: WO2021244054A1
Application number: PCT/CN2021/074634
Authority: WO
Inventors: Zhou Yajin (周亚金); Cheng Zhen (程镇); Wu Lei (吴磊); Ren Kui (任奎)
Original assignee: Zhejiang University (浙江大学)
Priority date: 2020-06-02
Filing date: 2021-02-01
Publication date: 2021-12-09
Also published as: CN111680271A

Abstract

Disclosed is a contract code obfuscation platform based on smart contract bytecode features. The platform comprises a bytecode/instruction converter, an information extractor, a bytecode injector, a jump target re-parser, and an instruction/bytecode converter. The platform converts the original bytecode into an instruction sequence and, according to the selected obfuscation method, extracts the instruction positions that need to be rewritten and the original jump target addresses. It then generates the instructions to be inserted and inserts them at the corresponding positions of the instruction sequence, corrects the jump addresses in the sequence so that they again point to the correct jump targets, and finally converts the corrected instruction sequence back into bytecode, which is output as the obfuscated bytecode. By obfuscating the contract bytecode, the present invention prevents the contract creator's contract information from being easily parsed by analysis tools, thereby reducing the risk of others freely copying code from on-chain contracts.

Description

Contract code obfuscation platform and obfuscation method based on smart contract bytecode features

Technical field

The invention relates to the field of smart contracts, in particular to a contract code obfuscation platform and method based on smart contract bytecode features.

Background art

Smart contracts are a concept proposed by Nick Szabo in the 1990s, which makes the idea nearly as old as the public Internet. For lack of a trustworthy execution environment, smart contracts were not applied in practice. After the birth of Bitcoin, people realized that its underlying technology, the blockchain, can inherently provide a trustworthy execution environment for smart contracts. A smart contract is a program that runs on the blockchain in the form of low-level bytecode, comparable to assembly language. Developers rarely write bytecode by hand; they usually compile it from a higher-level language.

Because the blockchain is an open, distributed ledger, the information on it is publicly visible to everyone, and contract code is frequently reused. Since contracts are stored on the chain as bytecode, which is hard for people to read, anyone who wants to understand a contract whose source code has not been published usually resorts to various analysis methods to work out how the contract behaves, for a variety of purposes. For a contract developer who does not want others to easily copy the contract, or to find vulnerabilities in it and attack it, this environment is very hostile.

Summary of the invention

The purpose of the present invention is to provide a smart contract code obfuscation platform that addresses the problem that the code of existing on-chain contracts can be easily parsed by various analysis tools. Before deploying a contract, the developer can use this platform to rewrite the contract code and thereby prevent such parsing.

The purpose of the present invention is achieved through the following technical solutions:

A contract code obfuscation platform based on smart contract bytecode features, characterized in that the platform comprises:

The bytecode/instruction converter is used to receive the original bytecode and, according to the target obfuscation method, convert it into an instruction sequence that represents the executable section;

The information extractor is used to extract, according to the obfuscation method, the information needed for instruction injection and for re-parsing the jump targets, including the instruction positions that need to be rewritten and the original jump target addresses; it saves this information, sends the instruction positions to be rewritten to the bytecode injector, and sends the original jump target addresses to the jump target re-parser;

The bytecode injector generates the instructions to be inserted according to the obfuscation method, inserts them at the corresponding positions in the instruction sequence to form a new instruction sequence, and sends that sequence to the jump target re-parser;

The jump target re-parser is used to correct the jump addresses in the new instruction sequence so that they point to the correct jump targets;

The instruction/bytecode converter is used to convert the corrected instruction sequence into bytecode, namely the obfuscated bytecode, and output it.
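The five components above form a linear pipeline. The sketch below is a minimal Python illustration, assuming EVM bytecode in which only PUSH1..PUSH32 (opcodes 0x60..0x7f) carry immediate data; the names `Instr`, `disassemble`, `assemble`, and `obfuscate` are hypothetical and are not taken from the patent. The extractor, injector, and re-parser are passed in as functions, because their behavior depends on the chosen obfuscation method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instr:
    offset: int       # byte offset in the original bytecode
    opcode: int       # one-byte EVM opcode
    imm: bytes = b""  # immediate data; non-empty only for PUSH1..PUSH32


def disassemble(code: bytes) -> List[Instr]:
    """Bytecode/instruction converter: split raw bytes into instructions."""
    instrs, pc = [], 0
    while pc < len(code):
        op = code[pc]
        n = op - 0x5F if 0x60 <= op <= 0x7F else 0  # PUSHn carries n bytes
        instrs.append(Instr(pc, op, code[pc + 1:pc + 1 + n]))
        pc += 1 + n
    return instrs


def assemble(instrs: List[Instr]) -> bytes:
    """Instruction/bytecode converter: concatenate opcodes and immediates."""
    return b"".join(bytes([i.opcode]) + i.imm for i in instrs)


def obfuscate(code: bytes, extract: Callable, inject: Callable,
              reparse: Callable) -> bytes:
    """Run the five stages in order; the middle three are method-specific."""
    instrs = disassemble(code)                 # bytecode/instruction converter
    positions, targets = extract(instrs)       # information extractor
    new_instrs = inject(instrs, positions)     # bytecode injector
    fixed = reparse(new_instrs, targets)       # jump target re-parser
    return assemble(fixed)                     # instruction/bytecode converter
```

With identity stages, the pipeline reproduces the input bytes exactly, which is a useful sanity check before plugging in a real obfuscation method.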

A contract code obfuscation method based on smart contract bytecode features, the method specifically includes the following steps:

S1: The contract developer generates the original bytecode using a smart contract compiler;

S2: Input the original bytecode into the contract code obfuscation platform and select the desired obfuscation method;

S3: The contract code obfuscation platform converts the original bytecode into an instruction sequence and, according to the obfuscation method, extracts the instruction positions that need to be rewritten and the original jump target addresses. It then generates the instructions to be inserted and inserts them at the corresponding positions of the instruction sequence, corrects the jump addresses in the instruction sequence so that they point to the correct jump targets, and finally converts the corrected instruction sequence into bytecode, namely the obfuscated bytecode, and outputs it.

Further, S3 specifically comprises the following sub-steps:

S3.1: Linearly scan the original bytecode; during the scan, identify the contract initialization code segment and the Swarm hash segment from the default initialization-code signature and the Swarm hash start and end signatures emitted by the contract compiler;

S3.2: Disassemble the original bytecode into Ethereum Virtual Machine (EVM) instructions and their immediate data, and use this information to create a copy of the contract;

S3.3: Maintaining a simulated stack, the contract code obfuscation platform executes the contract code step by step and traverses all reachable branches. During this process it identifies the contract's function selector section, function body section, and data section, and marks the jump instructions in the contract copy together with the values they use as jump targets;

S3.4: According to the marked instructions, generate and insert the instruction sequence corresponding to the obfuscation method;

S3.5: Correct the jump addresses that were displaced by the insertion, completing the obfuscation.
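Step S3.1 relies on fixed byte signatures. The following is a minimal sketch, assuming the legacy Solidity metadata layout `a1 65 'bzzr0' 58 20 <32-byte Swarm hash> 00 29` and the common initialization prologue `PUSH1 0x80 PUSH1 0x40 MSTORE` (`6080604052`). Newer compiler versions emit different metadata (for example an IPFS hash), so these constants are assumptions to adjust per compiler version, and the helper names are hypothetical. Because the scan is byte-level, it also matches signatures hidden inside PUSH data, which is the weakness that obfuscation method (1) exploits.

```python
from typing import List, Tuple

INIT_SIGNATURE = bytes.fromhex("6080604052")        # PUSH1 0x80 PUSH1 0x40 MSTORE
SWARM_PREFIX = bytes.fromhex("a165627a7a72305820")  # CBOR: a1 65 "bzzr0" 58 20
SWARM_SUFFIX = bytes.fromhex("0029")                # metadata length 0x29 = 41 bytes
SWARM_HASH_LEN = 32


def find_all(code: bytes, pattern: bytes) -> List[int]:
    """Return every byte offset at which `pattern` occurs (linear scan)."""
    hits, start = [], 0
    while True:
        i = code.find(pattern, start)
        if i < 0:
            return hits
        hits.append(i)
        start = i + 1


def find_swarm_segments(code: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) ranges matching the full prefix/hash/suffix layout."""
    segments = []
    for i in find_all(code, SWARM_PREFIX):
        end = i + len(SWARM_PREFIX) + SWARM_HASH_LEN + len(SWARM_SUFFIX)
        # Accept the candidate only if the length suffix sits where expected.
        if code[end - len(SWARM_SUFFIX):end] == SWARM_SUFFIX:
            segments.append((i, end))
    return segments
```

In a typical creation bytecode, the first prologue hit marks the initialization code and the second marks the start of the runtime body, while the Swarm segment closes the runtime body.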

Further, the obfuscation method is any one of the following:

(1) Adding a PUSH instruction so that a tool scanning the bytecode linearly finds the start signature of the contract initialization code twice, causing the tool to identify the wrong code as the contract body;

(2) Rewriting the bytecode in the Swarm hash segment, and the instructions near each jump instruction in the contract, so that every jump instruction in the original contract reaches its target address by way of the Swarm hash segment, thereby flattening the contract's control flow graph;

(3) Breaking up the characteristic instruction sequence of the function selector, which prevents contract analysis tools from obtaining the function signatures stored in the contract;

(4) Inserting a large number of JUMPDEST instructions, which forces analysis tools based on symbolic execution or emulated execution to maintain a large number of basic-block entry states, so that the tools run slowly or even crash;

(5) Replacing the immediate value used as a jump target with the result of a series of operations on immediate values, so that tools that assume a jump target is always a literal immediate value cannot resolve the jump addresses in the contract;

(6) Storing the jump target address in memory and retrieving it by calling a precompiled contract, so that static analysis tools mistakenly assume that an on-chain contract is being called and thus lose the jump target address information.
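Method (5) can be illustrated with a single XOR. This is a minimal sketch, assuming that `PUSH2 target; JUMP` is replaced with `PUSH2 (target ^ key); PUSH2 key; XOR; JUMP`; the patent does not specify which operations are used, so the choice of XOR and the helper names `obfuscated_jump` and `jump_operand` are hypothetical. A small stack evaluator confirms that the value consumed by JUMP is still the original target, even though neither pushed immediate equals it.

```python
import random

PUSH2, XOR, JUMP = 0x61, 0x18, 0x56  # EVM opcodes


def obfuscated_jump(target: int, rng: random.Random) -> bytes:
    """Encode `PUSH2 target; JUMP` as `PUSH2 masked; PUSH2 key; XOR; JUMP`."""
    if not 0 <= target < 1 << 16:
        raise ValueError("sketch only handles 2-byte jump targets")
    key = rng.randrange(1, 1 << 16)  # key != 0, so masked != target
    while key == target:             # also avoid pushing the real target as the key
        key = rng.randrange(1, 1 << 16)
    masked = target ^ key            # masked ^ key == target at run time
    return (bytes([PUSH2]) + masked.to_bytes(2, "big")
            + bytes([PUSH2]) + key.to_bytes(2, "big")
            + bytes([XOR, JUMP]))


def jump_operand(seq: bytes) -> int:
    """Evaluate PUSH2/XOR on a tiny stack and return the value JUMP would pop."""
    stack, pc = [], 0
    while pc < len(seq):
        op = seq[pc]
        if op == PUSH2:
            stack.append(int.from_bytes(seq[pc + 1:pc + 3], "big"))
            pc += 3
        elif op == XOR:
            b, a = stack.pop(), stack.pop()
            stack.append(a ^ b)
            pc += 1
        elif op == JUMP:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")
    raise ValueError("no JUMP found")
```

The rewritten sequence is 8 bytes instead of 4, so every later offset shifts by 4 bytes; this is why the jump target re-parser must run after injection.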

The beneficial effects of the present invention are as follows:

Since some people may use existing analysis tools to parse the bytecode of smart contracts, contract developers can use the obfuscation platform and method of the present invention to add a layer of obfuscation to their contracts, making their bytecode harder to read and thereby strengthening the protection of the contract code.

Description of the drawings

Figure 1 is a schematic diagram of a contract code obfuscation platform based on smart contract bytecode features;

Figure 2 is a flowchart of a contract code obfuscation method based on smart contract bytecode features.

Detailed description

The present invention is described in detail below with reference to the accompanying drawings and preferred embodiments, which will make its purpose and effects more apparent. It should be understood that the specific embodiments described here only serve to explain the present invention and are not intended to limit it.

As shown in Figure 1, the contract code obfuscation platform based on smart contract bytecode features of the present invention includes:

The information extractor is used to extract, according to the obfuscation method, the information required for instruction injection and for re-parsing the jump targets, including the instruction positions that need to be rewritten and the original jump target addresses; it saves this information, sends the instruction positions to be rewritten to the bytecode injector, and sends the original jump target addresses to the jump target re-parser;

The bytecode injector generates the instructions to be inserted according to the obfuscation method, inserts them at the corresponding positions in the instruction sequence to form a new instruction sequence, and sends it to the jump target re-parser. Because the insertion changes the size of the original bytecode, the original jump addresses no longer point to the correct jump targets; the new instruction sequence is therefore sent to the jump target re-parser, which corrects these displaced jump addresses.
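The correction performed by the jump target re-parser can be sketched as follows. This minimal Python version assumes that every jump target appears as a `PUSHn` immediately preceding `JUMP`/`JUMPI` (the pattern marked in step S3.3) and that the corrected address still fits in the original PUSH width; computed jumps, such as those produced by obfuscation methods (5) and (6), are out of scope. `Instr`, `inject`, and `fix_jumps` are hypothetical names. Each instruction remembers its original offset, so the re-parser can map every original JUMPDEST to its new position and rewrite the pushed targets accordingly.

```python
from dataclasses import dataclass
from typing import List, Optional

JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B


@dataclass
class Instr:
    opcode: int
    imm: bytes = b""            # PUSH immediate data, if any
    orig: Optional[int] = None  # offset in the original bytecode; None if injected

    @property
    def size(self) -> int:
        return 1 + len(self.imm)


def disassemble(code: bytes) -> List[Instr]:
    instrs, pc = [], 0
    while pc < len(code):
        op = code[pc]
        n = op - 0x5F if 0x60 <= op <= 0x7F else 0  # PUSHn carries n bytes
        instrs.append(Instr(op, code[pc + 1:pc + 1 + n], pc))
        pc += 1 + n
    return instrs


def assemble(instrs: List[Instr]) -> bytes:
    return b"".join(bytes([i.opcode]) + i.imm for i in instrs)


def inject(instrs: List[Instr], at: int, new: List[Instr]) -> List[Instr]:
    """Bytecode injector: insert `new` before the instruction originally at `at`."""
    idx = next(k for k, i in enumerate(instrs) if i.orig == at)
    return instrs[:idx] + new + instrs[idx:]


def fix_jumps(instrs: List[Instr]) -> List[Instr]:
    """Jump target re-parser: remap pushed targets to the new JUMPDEST offsets."""
    remap, pc = {}, 0
    for i in instrs:  # new offset of every original JUMPDEST
        if i.opcode == JUMPDEST and i.orig is not None:
            remap[i.orig] = pc
        pc += i.size
    for prev, cur in zip(instrs, instrs[1:]):
        # Only original `PUSHn target; JUMP/JUMPI` pairs are rewritten.
        if cur.opcode in (JUMP, JUMPI) and prev.imm and prev.orig is not None:
            old = int.from_bytes(prev.imm, "big")
            if old in remap:
                # to_bytes raises OverflowError if the new offset is too wide.
                prev.imm = remap[old].to_bytes(len(prev.imm), "big")
    return instrs
```

For the bytecode `600456005b00` (`PUSH1 0x04; JUMP; STOP; JUMPDEST; STOP`), injecting two one-byte instructions before offset 3 moves the JUMPDEST from offset 4 to offset 6, and the re-parser rewrites `PUSH1 0x04` to `PUSH1 0x06`.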

As shown in Figure 2, the contract code obfuscation method based on smart contract bytecode features of the present invention specifically includes the following steps:

S3: The contract code obfuscation platform converts the original bytecode into an instruction sequence and, according to the obfuscation method, extracts the instruction positions that need to be rewritten and the original jump target addresses. It then generates the instructions to be inserted and inserts them at the corresponding positions of the instruction sequence, corrects the jump addresses in the instruction sequence so that they point to the correct jump targets, and finally converts the corrected instruction sequence into bytecode, namely the obfuscated bytecode, and outputs it. The specific process is as follows:

In particular, six obfuscation methods are used:

(1) A PUSH instruction is added so that a tool scanning the bytecode linearly finds the start signature of the contract initialization code twice and therefore identifies the wrong code as the contract body.

(2) The bytecode in the Swarm hash segment, and the instructions near each jump instruction in the contract, are rewritten so that every jump instruction in the original contract reaches its target address by way of the Swarm hash segment, which flattens the contract's control flow graph.

(3) The characteristic instruction sequence of the function selector is broken up, which prevents contract analysis tools from obtaining the function signatures stored in the contract.

(4) A large number of JUMPDEST instructions are inserted, forcing analysis tools based on symbolic execution or emulated execution to maintain a large number of basic-block entry states, so that the tools run slowly or even crash.

(5) The immediate value used as a jump target is replaced with the result of a series of operations on immediate values, so that tools that assume a jump target is always a literal immediate value cannot resolve the jump addresses in the contract.

(6) The jump target address is stored in memory and retrieved by calling a precompiled contract, so that static analysis tools mistakenly assume that an on-chain contract is being called (that is, information that is dynamic in the smart contract setting and that a static analysis tool treats as unknowable), and the jump target address information is lost.
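Method (1) can be illustrated against a simple linear-scan heuristic. The sketch assumes, hypothetically, that an analysis tool takes the second occurrence of the initialization prologue `6080604052` (`PUSH1 0x80 PUSH1 0x40 MSTORE`) as the start of the contract body; real tools differ, and the decoy form `PUSH5 <prologue>; POP` is an assumption rather than the exact sequence used in the patent. Pushing the prologue bytes as data and immediately popping them leaves the stack unchanged, but moves the heuristic's split point into the decoy. The helper names are hypothetical.

```python
INIT_SIGNATURE = bytes.fromhex("6080604052")  # PUSH1 0x80 PUSH1 0x40 MSTORE
PUSH5, POP = 0x64, 0x50                       # PUSH5 carries 5 immediate bytes


def naive_runtime_offset(code: bytes) -> int:
    """Hypothetical tool heuristic: body starts at the 2nd prologue hit (-1 if none)."""
    first = code.find(INIT_SIGNATURE)
    return code.find(INIT_SIGNATURE, first + 1)


def add_decoy_push(code: bytes, at: int) -> bytes:
    """Insert `PUSH5 <prologue>; POP` at byte offset `at` (net stack effect: zero)."""
    decoy = bytes([PUSH5]) + INIT_SIGNATURE + bytes([POP])
    return code[:at] + decoy + code[at:]
```

Inserting the decoy shifts the rest of the initialization code, so in a real contract the offset used by `CODECOPY` and the constructor's jump targets would also have to be corrected by the jump target re-parser.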

Those of ordinary skill in the art will understand that the above descriptions are only preferred examples of the invention and are not intended to limit it. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art can still modify the technical solutions recorded in those examples or substitute equivalents for some of their technical features. All modifications and equivalent substitutions made within the spirit and principle of the invention shall fall within its scope of protection.

Claims

A contract code obfuscation platform based on smart contract bytecode features, characterized in that the platform comprises:

The bytecode/instruction converter is used to receive the original bytecode and, according to the target obfuscation method, convert it into an instruction sequence that represents the executable section;

The information extractor is used to extract, according to the obfuscation method, the information needed for instruction injection and for re-parsing the jump targets, including the instruction positions that need to be rewritten and the original jump target addresses; it saves this information, sends the instruction positions to be rewritten to the bytecode injector, and sends the original jump target addresses to the jump target re-parser;

The bytecode injector generates the instructions to be inserted according to the obfuscation method, inserts them at the corresponding positions in the instruction sequence to form a new instruction sequence, and sends that sequence to the jump target re-parser;

The jump target re-parser is used to correct the jump addresses in the new instruction sequence so that they point to the correct jump targets;

The instruction/bytecode converter is used to convert the corrected instruction sequence into bytecode, namely the obfuscated bytecode, and output it.
A contract code obfuscation method based on smart contract bytecode features, characterized in that the method comprises the following steps:

S1: The contract developer generates the original bytecode using a smart contract compiler;

S2: Input the original bytecode into the contract code obfuscation platform of claim 1 and select the desired obfuscation method;

S3: The contract code obfuscation platform converts the original bytecode into an instruction sequence and, according to the obfuscation method, extracts the instruction positions that need to be rewritten and the original jump target addresses. It then generates the instructions to be inserted and inserts them at the corresponding positions of the instruction sequence, corrects the jump addresses in the instruction sequence so that they point to the correct jump targets, and finally converts the corrected instruction sequence into bytecode, namely the obfuscated bytecode, and outputs it.
The contract code obfuscation method based on smart contract bytecode features according to claim 2, wherein S3 specifically comprises the following sub-steps:

S3.1: Linearly scan the original bytecode; during the scan, identify the contract initialization code segment and the Swarm hash segment from the default initialization-code signature and the Swarm hash start and end signatures emitted by the contract compiler;

S3.2: Disassemble the original bytecode into Ethereum Virtual Machine (EVM) instructions and their immediate data, and use this information to create a copy of the contract;

S3.3: Maintaining a simulated stack, the contract code obfuscation platform executes the contract code step by step and traverses all reachable branches. During this process it identifies the contract's function selector section, function body section, and data section, and marks the jump instructions in the contract copy together with the values they use as jump targets;

S3.4: According to the marked instructions, generate and insert the instruction sequence corresponding to the obfuscation method;

S3.5: Correct the jump addresses that were displaced by the insertion, completing the obfuscation.
The contract code obfuscation method based on smart contract bytecode features according to claim 2, wherein the obfuscation method is any one of the following:

(1) Adding a PUSH instruction so that a tool scanning the bytecode linearly finds the start signature of the contract initialization code twice, causing the tool to identify the wrong code as the contract body;

(2) Rewriting the bytecode in the Swarm hash segment, and the instructions near each jump instruction in the contract, so that every jump instruction in the original contract reaches its target address by way of the Swarm hash segment, thereby flattening the contract's control flow graph;

(3) Breaking up the characteristic instruction sequence of the function selector, which prevents contract analysis tools from obtaining the function signatures stored in the contract;

(4) Inserting a large number of JUMPDEST instructions, which forces analysis tools based on symbolic execution or emulated execution to maintain a large number of basic-block entry states, so that the tools run slowly or even crash;

(5) Replacing the immediate value used as a jump target with the result of a series of operations on immediate values, so that tools that assume a jump target is always a literal immediate value cannot resolve the jump addresses in the contract;

(6) Storing the jump target address in memory and retrieving it by calling a precompiled contract, so that static analysis tools mistakenly assume that an on-chain contract is being called and thus lose the jump target address information.