WO2023116256A1 - Method and device for generating a control flow graph of a smart contract - Google Patents

Info

Publication number
WO2023116256A1
Authority
WO
WIPO (PCT)
Prior art keywords
jump
instruction
address
function
record
Application number
PCT/CN2022/131537
Other languages
English (en)
French (fr)
Inventor
何嘉浩
张俊麒
苏小康
张开翔
范瑞彬
Original Assignee
深圳前海微众银行股份有限公司
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023116256A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/53 Decompilation; Disassembly

Definitions

  • The present application relates to the field of network technology, and in particular to a method and device for generating a control flow graph of a smart contract.
  • Smart contracts are generally checked for vulnerabilities or attacks by drawing their control flow graphs, and the detection methods include static analysis and dynamic analysis.
  • Static analysis is based on the control flow graph: it traverses the data flow along all possible paths in the graph and judges whether any execution path in the control flow triggers a vulnerability.
  • Dynamic analysis determines the safe execution paths in the control flow graph before the program runs; after the smart contract is deployed, it checks whether the contract follows those established paths during execution, to determine whether an unknown vulnerability has been triggered.
  • Embodiments of the present invention provide a method and device for generating a control flow graph of a smart contract, which can accurately and completely generate the control flow graph of a smart contract in EVM bytecode form.
  • An embodiment of the present invention provides a method for generating a control flow graph of a smart contract, the method comprising:
  • generating a jump record for the jump instructions in the smart contract, and generating a function record for the function instructions in the smart contract;
  • the jump record is used to represent the jump addresses of the jump instructions, and the function record is used to represent the boundary addresses of the function instructions;
  • The compiler translates the smart contract into Ethereum Virtual Machine (EVM) bytecode and generates the jump record and function record corresponding to the smart contract.
  • During compilation, the compiler records the jump address of each jump instruction and the boundary addresses of each function instruction in the EVM-bytecode form of the smart contract, so that the later generation of the control flow graph can obtain the jump and function-boundary information of the smart contract without having to combine context semantics.
  • The compiler takes instruction basic blocks as nodes and generates directed edges between the nodes according to the jump or function instruction in each instruction basic block, the sequence information of the instruction basic blocks, the jump record, and the function record, thereby obtaining the control flow graph of the smart contract.
  • Because the edges are derived from the jump and function instructions of each instruction basic block, the jump record, the function record, and the sequence information between blocks, the completeness and accuracy of the control flow graph are ensured.
  • Generating a jump record for the jump instructions in the smart contract includes:
  • for the control instructions in the smart contract, generating the jump addresses corresponding to the control instructions, and for each JUMP/JUMPI jump instruction in the control instructions, generating the jump information of that jump instruction; the jump information represents the jump performed by each instruction in the smart contract.
  • The address of each jump instruction and its corresponding jump address are recorded in the jump record, so the position in the smart contract reached by each instruction's jump semantics can be determined without obtaining the context information of the smart contract, and the jump information of the smart contract can be recorded accurately in the control flow graph.
  • Generating, for the control instructions in the smart contract, the jump addresses corresponding to the control instructions, and, for each JUMP/JUMPI jump instruction in the control instructions, the jump information corresponding to that jump instruction, includes:
  • for a conditional statement/ternary expression, generating a FALSE jump address and an END jump address; when the JUMPI jump instruction corresponding to the conditional judgment statement is translated, taking the address of that JUMPI jump instruction and the FALSE jump address as its jump information; and when the JUMP jump instruction placed before the FALSE content is translated, taking the address of that JUMP jump instruction and the END jump address as its jump information.
  • For a loop statement, generating a jump start address and a jump end address; the address of the JUMPI jump instruction and the jump end address are taken as the jump information of the JUMPI jump instruction;
  • the address of the JUMP jump instruction and the jump start address are taken as the jump information of the JUMP jump instruction.
  • Obtaining the address of the JUMPI jump instruction at the beginning of the loop together with the jump end address of the loop statement records the position at which the loop is entered or exited, and obtaining the address of the JUMP jump instruction at the end of the loop statement together with the jump start address records the position at which the next iteration restarts from the current one.
  • The execution semantics of the loop statement corresponding to the loop instruction are determined from these two address pairs, so that the control flow graph generated later records the complete execution path of the loop instruction.
  • the address where the JUMPI jump instruction is located and the FALSE jump address are used as the jump information of the JUMPI jump instruction;
  • the address where the JUMP jump instruction is located and the END jump address are used as the jump information of the JUMP jump instruction.
  • Obtaining the address of the JUMPI jump instruction of the conditional statement/ternary expression together with the FALSE jump address records where execution branches to the FALSE content, and obtaining the address of the JUMP jump instruction placed before the FALSE content together with the END jump address records the position at which execution leaves the conditional statement/ternary expression. The execution semantics of the conditional statement/ternary expression are then determined from these two address pairs, so that the control flow graph generated later records its complete execution path. This captures the execution semantics of each instruction in the control statements and improves the completeness and accuracy of the control flow graph.
  • Generating function records for the function instructions in the smart contract includes:
  • the function entry address of a private function is the jump target of its call, that is, the address of the JUMPDEST instruction reached by the JUMP instruction at the call address;
  • the call address is the address of the JUMP instruction corresponding to the function call statement;
  • the return address is the address of the JUMPDEST instruction corresponding to the function call statement.
  • During translation, the compiler generates function entry information from the public and private functions; it generates a function record from the address of each call's jump instruction (the call address, i.e. the address of the JUMP instruction) and its return address (the address of the instruction to which control returns when the call completes, i.e. the address of the JUMPDEST instruction); and it generates a jump record from the address of each control-statement jump instruction and its jump address.
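A minimal sketch of the two records as plain Python data structures, assuming the compiler exposes them as lists of address pairs. The class and method names (`JumpRecord`, `FunctionRecord`, `jump_address`, `return_address`) are hypothetical; the fields mirror the jumpTarget, entryList and cList arrays described in this document, filled with the func1/func2 example values the document gives.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Pair = Tuple[int, int]


@dataclass
class JumpRecord:  # hypothetical name
    # (address of JUMP/JUMPI, jump address) two-tuples, i.e. the jumpTarget array
    targets: List[Pair] = field(default_factory=list)

    def jump_address(self, jump_addr: int) -> Optional[int]:
        """Return the recorded jump address of the jump instruction at jump_addr, if any."""
        return dict(self.targets).get(jump_addr)


@dataclass
class FunctionRecord:  # hypothetical name
    # Function entry addresses (public and private), i.e. the entryList array
    entry_list: List[int] = field(default_factory=list)
    # (call address, return address) two-tuples, i.e. the cList array
    c_list: List[Pair] = field(default_factory=list)

    def return_address(self, call_addr: int) -> Optional[int]:
        """Return the address where execution resumes after the call at call_addr, if any."""
        return dict(self.c_list).get(call_addr)


# Example values taken from the func1/func2 contract discussed in this document
jumps = JumpRecord([(0x66, 0x92), (0x6E, 0x7A), (0x79, 0x86), (0x94, 0x5E)])
funcs = FunctionRecord(entry_list=[0x58, 0x95], c_list=[(0x74, 0x75), (0x83, 0x84)])
print(hex(jumps.jump_address(0x66)), hex(funcs.return_address(0x74)))
```

Because both records are keyed by instruction address, looking up where the instruction at a given address goes needs no knowledge of the surrounding source context.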
  • The complete execution process of each function, namely the function entry, the jumps, and the semantic conversion of calls, is thus recorded, so the generated control flow graph records the entry and jumps of every function in the smart contract completely and accurately.
  • Recording the semantic conversion of calls also makes it easier to identify, from the function semantic-conversion information in the control flow graph, the vulnerabilities of the smart contract and the attacked function objects, improving the accuracy of detection and localization.
  • Dividing the instruction sequence includes: if the i-th instruction is a JUMP/JUMPI, REVERT, or SELFDESTRUCT instruction, determining the address s of the i-th instruction as the block exit address of the instruction basic block, taking the instructions from the block entry address e to the block exit address s as one instruction basic block, and updating the block entry address e to the address of the (i+1)-th instruction.
  • Generating directed edges between nodes according to the jump record and the function record includes:
  • starting from a first node whose block entry address is a function entry address, determining the second node pointed to by the first node according to whether the block exit address of the first node corresponds to a jump address in the jump record or to a return address in the call information of the function record; then continuing to determine the third node pointed to by the second node, and so on, until the nodes corresponding to all function entry addresses in the function entry information have been traversed.
  • Determining the second node pointed to by the first node according to whether the block exit address of the first node corresponds to a jump address in the jump record or to a return address in the call information of the function record includes:
  • if the block exit address of the first node corresponds to a jump address in the jump record, generating a directed edge from the first node to the second node corresponding to that jump address; if the block exit address in the instruction basic block of the first node corresponds to a return address in the call information of the function record, generating a directed edge from the first node to the second node corresponding to that return address;
  • Specifically, the compiler determines the first node corresponding to a function entry address in the function record. If the exit instruction in the first node's instruction basic block is a jump instruction, and the address of that jump instruction and its corresponding jump address are found in the jump record, the compiler finds the second node containing that jump address and obtains the directed edge from the first node to the second node.
  • In this case, the instruction basic block of the second node contains the instruction at the target address.
  • If instead the exit instruction is a call whose call information is found in the function record, the instruction basic block of the second node contains the instruction at the return address. If the exit instruction in the first node's instruction basic block is a non-jump instruction, the directed edge from the first node to the second node is obtained from the sequence information; in this case, the second node is the node of the instruction basic block that immediately follows the first node's instruction basic block.
  • An embodiment of the present invention provides an apparatus for generating a control flow graph of a smart contract, the apparatus comprising:
  • a translation and recording module, configured to generate, in the process of translating the smart contract into EVM bytecode, a jump record for the jump instructions in the smart contract and a function record for the function instructions in the smart contract;
  • the jump record is used to represent the jump address of the jump instruction;
  • the function record is used to represent the boundary address of the function instruction;
  • a construction module, configured to take each instruction basic block as a node in the control flow graph and generate directed edges between the nodes according to the jump record, the function record, and the sequence information, to obtain the control flow graph of the smart contract.
  • An embodiment of the present application also provides a computing device, including: a memory for storing a program; and a processor for invoking the program stored in the memory and executing, according to the obtained program, the method described in any possible design of the first aspect.
  • An embodiment of the present application also provides a computer-readable non-volatile storage medium including a computer-readable program; when a computer reads and executes the computer-readable program, the computer performs the method according to the first aspect.
  • FIG. 1 is a schematic structural diagram of a control flow graph generation system for a smart contract provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a compiler generating a control flow graph provided by an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a method for generating a control flow graph of a smart contract provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a loop statement structure and its jump instructions provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of another loop statement structure and its jump instructions provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a conditional statement/ternary expression structure and its jump instructions provided by an embodiment of the present invention.
  • FIG. 7 is a control flow graph of a smart contract provided by an embodiment of the present invention.
  • FIG. 8 is a schematic flowchart of a method for generating a control flow graph of a smart contract provided by an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of an apparatus for generating a control flow graph of a smart contract provided by an embodiment of the present invention.
  • FIG. 1 shows the architecture of a control flow graph generation system for a smart contract provided by an embodiment of the present invention, in which the translation module translates the smart contract into Ethereum Virtual Machine bytecode (EVM bytecode);
  • the jump address capture module captures the address of each jump instruction in the control statements and its jump address to obtain the jump record;
  • the function boundary capture module captures the function entry addresses of the public functions in the function selector of the smart contract, the function entry addresses of the private functions in the function body, and the call address and return address of each function call statement in the function body, to obtain the function record;
  • the basic block division module converts the EVM bytecode into an instruction sequence using a disassembly tool and divides the instruction sequence into multiple instruction basic blocks;
  • the control flow graph construction module takes the instruction basic blocks as nodes and constructs the directed edges between them according to the jump record, the function record, and the order of the instruction basic blocks, to obtain the control flow graph.
  • An embodiment of the present invention also provides a schematic diagram of a compiler generating a control flow graph, as shown in FIG. 2, in which the Solidity compiler implements the translation function, the jump address capture function, and the function boundary capture function.
  • The translation function translates the smart contract into Ethereum Virtual Machine bytecode (EVM bytecode).
  • The jump address capture function captures the address of each jump instruction in the control statements and its jump address to obtain the jump record.
  • The function boundary capture function captures the function entry addresses of the public functions in the function selector of the smart contract, the function entry addresses of the private functions in the function body, and the call address and return address of each function call statement in the function body, to obtain the function record.
  • the basic block division module converts the EVM bytecode into an instruction sequence through a disassembly tool, and divides the converted instruction sequence into multiple instruction basic blocks.
  • The Solidity compiler takes the instruction basic blocks as nodes and constructs the directed edges between them according to the jump record, the function record, and the order of the instruction basic blocks, to obtain the control flow graph.
  • the embodiment of the present application provides a flow of a smart contract control flow graph generation method, as shown in Figure 3, including:
  • Step 301: in the process of translating the smart contract into EVM bytecode, generate a jump record for the jump instructions in the smart contract and a function record for the function instructions in the smart contract;
  • the jump record is used to represent the jump address of the jump instruction;
  • the function record is used to represent the boundary address of the function instruction;
  • Step 302: convert the EVM bytecode into an EVM instruction sequence using a disassembly tool, and divide the EVM instruction sequence into multiple instruction basic blocks with sequence information;
  • Step 303: take each instruction basic block as a node in the control flow graph and generate directed edges between the nodes according to the jump record, the function record, and the sequence information, to obtain the control flow graph of the smart contract.
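Step 302 relies on a disassembly tool. As a rough illustration of what such a tool does, the sketch below decodes raw EVM bytecode into (address, mnemonic, immediate) triples. It names only a handful of opcodes (those used in this document plus a few common ones) and prints any other byte as raw hex; the function name `disassemble` is hypothetical. A real disassembler would also have to skip the CBOR metadata that the Solidity compiler appends to its output.

```python
from typing import List, Optional, Tuple

# Partial opcode table (illustrative only); PUSH1..PUSH32 are decoded separately below.
OPCODES = {
    0x00: "STOP", 0x15: "ISZERO", 0x34: "CALLVALUE", 0x52: "MSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0x5F: "PUSH0",
    0x80: "DUP1", 0xF3: "RETURN", 0xFD: "REVERT", 0xFF: "SELFDESTRUCT",
}

Instr = Tuple[int, str, Optional[int]]


def disassemble(code: bytes) -> List[Instr]:
    """Decode bytecode into (address, mnemonic, immediate) triples in address order."""
    instrs: List[Instr] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            n = op - 0x5F
            imm = code[pc + 1 : pc + 1 + n]
            instrs.append((pc, f"PUSH{n}", int.from_bytes(imm, "big")))
            pc += 1 + n
        else:
            instrs.append((pc, OPCODES.get(op, f"0x{op:02x}"), None))
            pc += 1
    return instrs


# A short fragment: free-memory-pointer setup followed by a callvalue check
for addr, name, imm in disassemble(bytes.fromhex("608060405234801561001057600080fd5b")):
    print(hex(addr), name, "" if imm is None else hex(imm))
```

The addresses are byte offsets, which is why a PUSH2 occupies three addresses and the instruction after it starts three bytes later.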
  • An embodiment of the present application provides a jump record generation method, in which generating a jump record for the jump instructions in the smart contract includes:
  • The statements that generate JUMP/JUMPI instructions are mainly control statements, including loop statements, conditional statements, and ternary expressions.
  • The address of each JUMP/JUMPI instruction corresponding to a control statement, together with its jump address, can be captured during translation to obtain the jump information of the control statement.
  • The translation may be performed by the Solidity compiler traversing the syntax tree of the smart contract source code to generate EVM bytecode.
  • A global array variable jumpTarget is set up to store two-tuples (jumpAddr, target), where jumpAddr is the address of a JUMP/JUMPI instruction and target is its jump address. That is, the jump information is recorded as two-tuples, and the jump record contains at least one two-tuple.
  • An embodiment of the present application provides a jump information generation method in which, for the control instructions in the smart contract, the jump addresses corresponding to the control instructions are generated, and for each JUMP/JUMPI jump instruction in the control instructions, the jump information corresponding to that jump instruction is generated, including:
  • for a conditional statement/ternary expression, a FALSE jump address and an END jump address are generated; when the JUMPI jump instruction corresponding to the conditional judgment statement is translated, the address of that JUMPI jump instruction and the FALSE jump address are taken as its jump information; when the JUMP jump instruction placed before the FALSE content is translated, the address of that JUMP jump instruction and the END jump address are taken as its jump information.
  • The loop statements among the control instructions include While and For statements.
  • An embodiment of the present application provides a schematic diagram of a loop statement structure and its jump instructions, as shown in FIG. 4, including:
  • The Solidity compiler initializes two jump addresses, loopStart (the jump start address) and loopEnd (the jump end address). After placing a JUMPDEST (address 0x12) at loopStart, it translates the conditional judgment statement of the While statement and its JUMPI (address 0x25), then the loop body of the While statement (addresses 0x34 to 0x4f), and finally inserts the JUMPDEST of loopEnd (address 0x50).
  • The loop statements among the control instructions also include the Do-While statement.
  • An embodiment of the present application provides another schematic diagram of a loop statement structure and its jump instructions, as shown in FIG. 5, including:
  • The Solidity compiler initializes the two jump addresses loopStart (the jump start address) and loopEnd (the jump end address). After placing a JUMPDEST (address 0xa) at loopStart, it translates the loop body, then translates the conditional judgment statement and places the JUMPI instruction (address 0x51), and finally places the JUMPDEST instruction at the loopEnd address (address 0x52).
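To make the two loop layouts concrete, here is a toy sketch (not the Solidity compiler's code generator) that lays out a While loop and a Do-While loop as flat item lists, assigns byte addresses, and records the (jumpAddr, target) two-tuples that the jumpTarget array would hold. The helper `layout`, the item encoding, and the sizes chosen for the condition and body are illustrative assumptions; JUMPDEST, JUMP and JUMPI count as 1 byte and PUSH2 as 3 bytes, as in the EVM. The Do-While layout also assumes the JUMPI jumps back to loopStart while the condition holds, which this document does not spell out.

```python
from typing import Dict, List, Tuple

# An item is ("label", name) for a 1-byte JUMPDEST, or (mnemonic, size_in_bytes, jump_label_or_None).
Item = Tuple


def layout(items: List[Item]) -> Tuple[List[Tuple[int, int]], Dict[str, int]]:
    """Assign addresses to items; return (jumpTarget two-tuples, label addresses). Hypothetical helper."""
    labels: Dict[str, int] = {}
    jumps: List[Tuple[int, str]] = []
    addr = 0
    for item in items:
        if item[0] == "label":
            labels[item[1]] = addr  # the JUMPDEST for this label sits here
            addr += 1
        else:
            name, size, target = item
            if name in ("JUMP", "JUMPI") and target is not None:
                jumps.append((addr, target))
            addr += size
    # Labels are resolved after layout because forward jumps (e.g. to loopEnd) precede them
    return [(a, labels[t]) for a, t in jumps], labels


while_loop = [
    ("label", "loopStart"),
    ("COND", 3, None), ("PUSH2", 3, None), ("JUMPI", 1, "loopEnd"),   # leave when false
    ("BODY", 5, None), ("PUSH2", 3, None), ("JUMP", 1, "loopStart"),  # next iteration
    ("label", "loopEnd"),
]
do_while_loop = [
    ("label", "loopStart"),
    ("BODY", 5, None),
    ("COND", 3, None), ("PUSH2", 3, None), ("JUMPI", 1, "loopStart"),  # assumed: repeat when true
    ("label", "loopEnd"),
]

print(layout(while_loop)[0])     # [(7, 17), (16, 0)]
print(layout(do_while_loop)[0])  # [(12, 0)]
```

In the While layout the JUMPI pair points forward to loopEnd and the JUMP pair points back to loopStart, which are the two address pairs described for loop statements; in the Do-While layout the single JUMPI pair points back to loopStart and loopEnd is reached by falling through.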
  • The conditional statements/ternary expressions among the control instructions include If-Else statements and ternary statements.
  • An embodiment of the present application provides a schematic diagram of a conditional statement/ternary expression structure and its jump instructions, as shown in FIG. 6, including:
  • As shown in FIG. 6, the translation steps for an If-Else statement and for a ternary statement are essentially the same.
  • The Solidity compiler initializes two jump addresses, If-False (the FALSE jump address) and If-End (the END jump address), and places a JUMPI (address 0xa9) after translating the conditional judgment statement.
  • When translating the JUMP instruction (address 0xbc) placed before the FALSE content, the compiler takes the If-End JUMPDEST (address 0xc2) as its jump address and obtains the two-tuple (0xbc, 0xc2); this two-tuple and the two-tuple of the JUMPI are both recorded in the jumpTarget array.
  • The If statement is a shortened form of the If-Else statement.
  • After initializing the jump address If-False, the Solidity compiler translates the conditional statement.
  • It then translates the body of the If statement and places, at the end, the JUMPDEST instruction at the If-False address (address 0xbd).
  • When the compiler translates the JUMPI instruction, it takes If-False as the jump address, obtains the two-tuple (0xa9, 0xbd), and records it in the jumpTarget array.
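The same toy layout idea applies to conditionals. The sketch below (an illustration, not the compiler's actual code generator) lays out an If-Else statement and a plain If statement and returns the recorded (jumpAddr, target) two-tuples. The helper name `record_jumps`, the item encoding, and the byte sizes are assumptions; COND is assumed to end with an ISZERO so that the JUMPI is taken when the condition is false. As in the FIG. 6 addresses (JUMP at 0xbc, If-False JUMPDEST at 0xbd), the JUMP that skips the FALSE content sits immediately before the If-False label.

```python
from typing import Dict, List, Tuple


def record_jumps(items) -> List[Tuple[int, int]]:
    """Lay out items at byte addresses and return the (jumpAddr, target) two-tuples. Hypothetical helper."""
    labels: Dict[str, int] = {}
    pending: List[Tuple[int, str]] = []
    addr = 0
    for item in items:
        if item[0] == "label":  # ("label", name): a 1-byte JUMPDEST
            labels[item[1]] = addr
            addr += 1
        else:                   # (mnemonic, size, jump_label_or_None)
            name, size, target = item
            if target is not None:
                pending.append((addr, target))
            addr += size
    return [(a, labels[t]) for a, t in pending]


if_else = [
    ("COND", 3, None), ("PUSH2", 3, None), ("JUMPI", 1, "If-False"),  # to FALSE content
    ("TRUE", 4, None), ("PUSH2", 3, None), ("JUMP", 1, "If-End"),     # skip FALSE content
    ("label", "If-False"), ("FALSE", 4, None),
    ("label", "If-End"),
]
if_only = [
    ("COND", 3, None), ("PUSH2", 3, None), ("JUMPI", 1, "If-False"),
    ("TRUE", 4, None),
    ("label", "If-False"),
]

print(record_jumps(if_else))  # [(6, 15), (14, 20)]
print(record_jumps(if_only))  # [(6, 11)]
```

For the If statement there is no END jump address: a single JUMPI pair to If-False is enough, because the end of the body falls through to the If-False JUMPDEST.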
  • An embodiment of the present application provides a method for generating a function record, in which generating a function record for the function instructions in the smart contract includes:
  • the function entry address of a private function is the jump target of the JUMP instruction at its call address;
  • The smart contract source code is:
  • The Solidity compiler translates the source code of the above smart contract into EVM bytecode.
  • The EVM bytecode is as follows (a hexadecimal string):
  • The Solidity compiler converts this EVM bytecode into an instruction sequence using a disassembly tool and divides the instruction sequence into 17 instruction basic blocks.
  • The instruction basic blocks are as follows:
  • The instructions of the function selector in the smart contract occupy addresses 0x0 to 0x57.
  • The function entry address of the public function in the function selector is 0x58, the value pushed by the PUSH1 instruction at address 0x53; that is, the function entry address of a public function is the address of the first JUMPDEST instruction after its parameter loading has been translated (address 0x58 in the basic blocks above).
  • The Solidity compiler records the function entry address of a public function when it translates the function selector: after translating the parameter loading at the public function's entry, it takes the address 0x58 of the first JUMPDEST instruction as the function entry of the public function and adds 0x58 to the entryList array.
  • The instructions of the function bodies in the smart contract occupy addresses 0x58 to 0xa2, and the function entry address of the private function in the function body is 0x95 (addresses 0x95 to 0xa2 correspond to the private function func2 in the source code); that is, the function entry address of a private function is the jump target of its call (address 0x95 in the basic blocks above).
  • The Solidity compiler translates a private function only when that function is called.
  • In the function body of the smart contract, func1 calls func2, and control then returns from func2 to func1. For one call, the call address is the JUMP instruction at 0x74 and the return address is the corresponding JUMPDEST instruction at 0x75; for the other call, the call address is the JUMP instruction at 0x83 and the return address is the corresponding JUMPDEST instruction at 0x84.
  • When translating a function call, the compiler first initializes the return address RET and pushes it onto the operand stack, then uses a JUMP instruction (address 0x74 or 0x83) to jump to the function entry of the called function func2, and finally places the JUMPDEST instruction of the RET address (address 0x75 or 0x84), where execution resumes after the call completes.
  • The address of the JUMP instruction is recorded as the call address call, the RET address is recorded as the return address ret, and the (call, ret) two-tuple is recorded in the cList array.
  • This yields cList [(0x74, 0x75), (0x83, 0x84)]
  • and jumpTargets [(0x66, 0x92), (0x6e, 0x7a), (0x79, 0x86), (0x94, 0x5e)].
  • An embodiment of the present application provides a method for obtaining jumpTargets, cList, and entryList while compiling the above example smart contract, including:
  • When func1's call to func2 is translated, the address 0x74 of the corresponding JUMP instruction and the address 0x75 of the JUMPDEST instruction to which the call returns form the two-tuple (0x74, 0x75), which is added to the cList array.
  • The address 0x95 of the JUMPDEST targeted by that JUMP instruction is taken as the entry of the function func2, and 0x95 is added to entryList.
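The call-recording rule can be sketched in the same toy style: each internal call expands to "push RET, push callee entry, JUMP, JUMPDEST at RET"; the (call, ret) pair goes into cList, and the callee's entry address (the JUMP target) goes into entryList alongside the public entries. The helper name `record_calls`, the item encoding, and the use of PUSH2 for both pushes are illustrative assumptions rather than the Solidity compiler's internals, and argument passing is ignored.

```python
from typing import Dict, List, Sequence, Tuple


def record_calls(items, public: Sequence[str]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Return (cList, entryList) for a toy program of labels, code runs and calls. Hypothetical helper."""
    labels: Dict[str, int] = {}
    c_list: List[Tuple[int, int]] = []
    callees: List[str] = []
    addr = 0
    for kind, arg in items:
        if kind == "label":   # function entry: a 1-byte JUMPDEST
            labels[arg] = addr
            addr += 1
        elif kind == "code":  # straight-line code of `arg` bytes
            addr += arg
        elif kind == "call":  # PUSH2 RET, PUSH2 <callee entry>, JUMP, JUMPDEST at RET
            addr += 3 + 3     # the return address is pushed before the callee entry
            call, ret = addr, addr + 1  # the JUMPDEST at RET directly follows the JUMP
            c_list.append((call, ret))
            callees.append(arg)
            addr += 2         # the JUMP and the JUMPDEST
    entries = {labels[f] for f in public} | {labels[f] for f in callees}
    return c_list, sorted(entries)


program = [
    ("label", "func1"), ("code", 4), ("call", "func2"),
    ("code", 2), ("call", "func2"), ("code", 1),
    ("label", "func2"), ("code", 3),
]
c_list, entry_list = record_calls(program, public=["func1"])
print(c_list, entry_list)  # [(11, 12), (21, 22)] [0, 24]
```

As in the example pairs (0x74, 0x75) and (0x83, 0x84), each return address is the call address plus one, because the return JUMPDEST is placed immediately after the call's JUMP.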
  • An embodiment of the present application provides a method for dividing instruction basic blocks, in which the EVM instruction sequence is divided into multiple ordered instruction basic blocks, including:
  • i denotes the index of an instruction in the instruction sequence: i is initialized to 0, the entry address e of the instruction basic block is set to 0, and the i-th instruction instr is then taken from the EVM instruction sequence.
  • When the i-th instruction begins a new instruction basic block, the exit address s of the current instruction basic block is set to the address of the (i-1)-th instruction, the instructions from the entry address e to the exit address s form one basic block, and (e, s) is added to blockList (the instruction basic block record); the block entry e is then set to the address of the i-th instruction.
  • If instr is a JUMP/JUMPI, REVERT, or SELFDESTRUCT instruction, s is set to the address of the i-th instruction, (e, s) is added to blockList, and the block entry e is set to the address of the (i+1)-th instruction.
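A minimal sketch of this division rule, operating on (address, mnemonic) pairs such as a disassembler produces. It uses the terminator list given here (JUMP/JUMPI, REVERT, SELFDESTRUCT) and assumes that the instruction which begins a new block in the other case is a JUMPDEST, i.e. a jump target. It also skips empty blocks and closes any trailing block at the end of the sequence. The function name `split_blocks` is hypothetical.

```python
from typing import List, Tuple

TERMINATORS = {"JUMP", "JUMPI", "REVERT", "SELFDESTRUCT"}


def split_blocks(instrs: List[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Divide an ordered instruction list into (entry address e, exit address s) blocks."""
    block_list: List[Tuple[int, int]] = []
    if not instrs:
        return block_list
    e = instrs[0][0]
    open_block = False  # whether the current block already holds an instruction
    for i, (addr, name) in enumerate(instrs):
        if name == "JUMPDEST" and open_block:
            # Assumed case: a jump target starts a new block, so the previous one ends at i-1
            block_list.append((e, instrs[i - 1][0]))
            e = addr
        open_block = True
        if name in TERMINATORS:
            # The block ends at the i-th instruction; the next block starts at the (i+1)-th
            block_list.append((e, addr))
            open_block = False
            if i + 1 < len(instrs):
                e = instrs[i + 1][0]
    if open_block:  # close a trailing block that ends without a terminator
        block_list.append((e, instrs[-1][0]))
    return block_list


prologue = [(0x0, "PUSH1"), (0x2, "PUSH1"), (0x4, "MSTORE"), (0x5, "CALLVALUE"),
            (0x6, "DUP1"), (0x7, "ISZERO"), (0x8, "PUSH2"), (0xB, "JUMPI"),
            (0xC, "PUSH1"), (0xE, "DUP1"), (0xF, "REVERT"), (0x10, "JUMPDEST"), (0x11, "STOP")]
print([(hex(e), hex(s)) for e, s in split_blocks(prologue)])
```

Here the JUMPI at 0xb closes the first block as (0x0, 0xb) and the next block starts at 0xc, the same shape as the first steps of the worked example; the fragment itself is only illustrative.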
  • the embodiment of the present application provides a method for obtaining the instruction basic block sequence, including:
  • For example, the seventh instruction (address 0xb) is a JUMPI instruction,
  • so (0x0, 0xb) is added to blockList, and the instruction basic block entry e is set to 0xc, the address of the (i+1)-th instruction.
  • The output is blockList [(0x0, 0xb), (0xc, 0x3e), (0x3f, 0x43), (0x44, 0x4a), (0x4b, 0x4e), (0x4f, 0x55), (0x56, 0x57), (0x58, 0x5d), (0x5e, 0x66), (0x67, 0x6e), (0x6f, 0x74), (0x75, 0x79), (0x7a, 0x83), (0x84, 0x85), (0x86, 0x91), (0x92, 0x94), (0x95, 0xa2)], which gives the entry and exit addresses of each of the 17 instruction basic blocks in the above example.
  • An embodiment of the present application provides a method for generating a control flow graph of a smart contract, in which generating directed edges between nodes according to the jump record and the function record includes:
  • The flow of constructing the control flow graph is as follows:
  • Step a: Initialize the control flow graph G and take every element (e, s) of the blockList array as a node of G. At this point G is a graph with nodes only and no edges.
  • Step b: Take a function entry e from entryList and search blockList for the instruction basic block whose entry address is e. If the search succeeds, add that block to the queue Q and go to step c; otherwise, take the next function entry from entryList and repeat step b. Once every function entry in entryList has been processed, output G.
  • Step c: If Q is not empty, take an instruction basic block block' from Q and go to step d; if Q is empty, go to step b. Step d: If the instruction at the exit address s of block' is JUMP/JUMPI, go to step e; if it is a REVERT or SELFDESTRUCT instruction, go to step c; otherwise the exit is a non-jump instruction, so add a directed edge from block' to the next instruction basic block in sequence order, add that block to Q, and go to step c.
  • Step e: If the instruction at the exit address s of block' is JUMP/JUMPI, look up its jump address in jumpTargets. If the lookup succeeds, add to G a directed edge from block' to the instruction basic block b_next whose entry is the jump address, and add b_next to Q; for a JUMPI instruction, also add to Q the instruction basic block b_next' whose entry address is s+1, and add to G a directed edge from block' to b_next'. Then go to step c. If the lookup fails, go to step f.
  • Step f: Use the exit address s of block' to query cList whether this JUMP/JUMPI is a function call. If the query succeeds, add to G a directed edge from block' to the basic block b_next whose entry is the corresponding function return address ret, and add b_next to Q. Finally, go to step c.
  • the embodiment of the present application provides a method for generating a control flow graph of a smart contract in which the second node pointed to by the first node is determined according to whether the block exit address of the first node has a corresponding jump address in the jump record or a corresponding return address in the call information of the function record, including:
  • if the block exit address of the first node has a corresponding jump address in the jump record, generate a directed edge from the first node to the second node corresponding to that jump address; if the block exit address in the instruction basic block of the first node has a corresponding return address in the call information of the function record, generate a directed edge from the first node to the second node corresponding to that return address;
  • the method of constructing the control flow graph from the entryList, cList, jumpTargets and blockList of the 17 instruction basic blocks includes:
  • queue Q is not empty: take the basic block (0x58, 0x5d) from Q; the instruction at 0x5d is POP, so go to step d.
  • queue Q is not empty: take the basic block (0x5e, 0x66) from Q; the instruction at 0x66 is JUMPI, so go to step e.
  • queue Q is not empty: take the basic block (0x92, 0x94) from Q; the instruction at 0x94 is JUMP, so go to step e.
  • step e: jumpTargets contains no jump at address 0x94, so go to step f.
  • queue Q is not empty: take the instruction basic block (0x67, 0x6e) from Q; the instruction at 0x6e is JUMPI, so go to step e.
  • queue Q is not empty: take the instruction basic block (0x7a, 0x83) from Q; the instruction at 0x83 is JUMP, so go to step e.
  • queue Q is not empty: take the instruction basic block (0x6f, 0x74) from Q; the instruction at 0x74 is JUMP, so go to step e.
  • step e: jumpTargets contains no jump at address 0x74 (a JUMP instruction), so go to step f.
  • queue Q is not empty: take the instruction basic block (0x84, 0x85) from Q; the instruction at 0x85 is POP, so go to step d.
  • queue Q is not empty: take the instruction basic block (0x75, 0x79) from Q; the instruction at 0x79 is JUMP, so go to step e.
  • queue Q is not empty: take the instruction basic block (0x86, 0x91) from Q; the instruction at 0x91 is JUMP, so go to step e.
  • queue Q is not empty: take the instruction basic block (0x95, 0xa2) from Q; the instruction at 0xa2 is JUMP, so go to step e.
  • step e: jumpTargets contains no jump at address 0xa2, so go to step f.
  • step b: entryList is empty, so the algorithm ends.
  • the control flow graph of the smart contract obtained through the above method steps is shown in FIG. 7.
  • the embodiment of the present application provides a flow of a method for generating a control flow graph of a smart contract, as shown in FIG. 8, including:
  • Step 801: obtain the smart contract.
  • Step 802: translate the smart contract into Ethereum Virtual Machine bytecode, and obtain the jump record and function record of the smart contract.
  • Step 803: convert the EVM bytecode into an instruction sequence.
  • Step 804: partition the instruction sequence to obtain instruction basic blocks and the sequence information of each block.
  • Step 805: taking the instruction basic blocks as nodes, generate directed edges between nodes according to the jump records and function records, to obtain the control flow graph of the smart contract.
  • FIG. 9 is a schematic diagram of a device for generating a control flow graph of a smart contract provided in an embodiment of the present application. As shown in FIG. 9 , it includes:
  • the translation and recording module 901 is configured to generate, while the smart contract is translated into Ethereum Virtual Machine bytecode, a jump record for the jump instructions in the smart contract and a function record for the function instructions in the smart contract; the jump record represents the jump addresses of the jump instructions, and the function record represents the boundary addresses of the function instructions;
  • the construction module 902 is configured to convert the Ethereum Virtual Machine bytecode into an Ethereum Virtual Machine instruction sequence with a disassembly tool, and to partition that instruction sequence into a plurality of instruction basic blocks carrying sequence information;
  • the construction module 902 is further configured to use each instruction basic block as a node in the control flow graph, and generate directed edges between the nodes according to the jump record, the function record and the sequence information, Obtain the control flow graph of the smart contract.
  • the translating and recording module 901 is specifically configured to: generate, for the control instructions in the smart contract, the jump addresses corresponding to the control instructions; and generate, for each JUMP/JUMPI jump instruction among the control instructions, jump information for that jump instruction, the jump information including the address of the jump instruction and its jump address.
  • the translating and recording module 901 is specifically configured such that generating the jump addresses for the control instructions and the jump information for each JUMP/JUMPI jump instruction among them includes: for a loop instruction among the control instructions, generating the jump start address and jump end address of the loop; when translation reaches the JUMPI jump instruction, taking the address of the JUMPI instruction and the jump end address as the jump information of the JUMPI instruction; when translation reaches the final JUMP jump instruction, taking the address of the JUMP instruction and the jump start address as the jump information of the JUMP instruction; for a conditional statement/ternary expression among the control instructions, generating a FALSE jump address and an END jump address; when translation reaches the JUMPI jump instruction of the condition, taking the address of the JUMPI instruction and the FALSE jump address as the jump information of the JUMPI instruction; and when translation reaches the JUMP jump instruction after the FALSE content, taking the address of the JUMP instruction and the END jump address as the jump information of the JUMP instruction.
  • the translating and recording module 901 is specifically configured to: when translation reaches the function selector in the smart contract, record the function entry address of each public function in the function selector into the function entry information of the function record, the function entry address of a public function being the address of the first JUMPDEST instruction after the parameter-loading code has been translated; when translation reaches a function body in the smart contract, record the function entry address of each private function in the function body into the function entry information, the function entry address of a private function being the address given by the first parameter of the call address; and when translation reaches a function call statement in the function body, record the call address and return address of the statement into the call information of the function record, the call address being the address of the JUMP instruction corresponding to the function call statement and the return address being the address of the corresponding JUMPDEST instruction.
  • the construction module 902 is specifically configured to set the block entry address of an instruction basic block to e and to fetch instructions from the Ethereum virtual machine instruction sequence in order: if the i-th instruction is a JUMPDEST instruction, the address s of the (i-1)-th instruction is determined as the block exit address, the instructions from block entry address e to block exit address s form one instruction basic block, and the block entry address e is updated to the address of the i-th instruction; if the i-th instruction is a JUMP/JUMPI jump instruction, a REVERT instruction or a SELFDESTRUCT instruction, the address s of the i-th instruction is determined as the block exit address, the instructions from e to s form one instruction basic block, and e is updated to the address of the (i+1)-th instruction.
  • the construction module 902 is specifically configured to obtain, according to the function entry information in the function record, the first node corresponding to any function entry address; to determine the second node pointed to by the first node according to whether the block exit address of the first node has a corresponding jump address in the jump record or a corresponding return address in the call information of the function record; and to continue determining the third node pointed to by the second node until the nodes corresponding to all function entry addresses in the function entry information have been traversed.
  • the construction module 902 is specifically configured to: if the block exit address of the first node has a corresponding jump address in the jump record, generate a directed edge from the first node to the second node corresponding to that jump address; if the block exit address in the instruction basic block of the first node has a corresponding return address in the call information of the function record, generate a directed edge from the first node to the second node corresponding to that return address; otherwise, generate a directed edge from the first node to the second node that follows the first node according to the sequence information.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that realize the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

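Embodiments of the present invention provide a method and device for generating a control flow graph of a smart contract. The method includes: while translating a smart contract into Ethereum Virtual Machine (EVM) bytecode, generating a jump record for the jump instructions in the smart contract and a function record for the function instructions in the smart contract, where the jump record represents the jump addresses of the jump instructions and the function record represents the boundary addresses of the function instructions; converting the EVM bytecode into an EVM instruction sequence with a disassembly tool, and partitioning that instruction sequence into a plurality of instruction basic blocks carrying sequence information; and taking each instruction basic block as a node of the control flow graph and generating directed edges between the nodes according to the jump record, the function record and the sequence information, to obtain the control flow graph of the smart contract. The method serves to generate, accurately and completely, the control flow graph of a smart contract in EVM bytecode form.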

Description

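Method and Device for Generating a Control Flow Graph of a Smart Contract
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202111598801.9, filed with the Chinese Patent Office on December 24, 2021 and entitled "Method and Device for Generating a Control Flow Graph of a Smart Contract", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of network technology, and in particular to a method and device for generating a control flow graph of a smart contract.
Background
In recent years, with the development of computer technology, more and more technologies have been applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). The security and real-time requirements of the financial industry, however, also place higher demands on that technology. For example, to improve the security of Ethereum, smart contracts are generally checked for vulnerabilities or attacks: vulnerabilities may be introduced while an Ethereum smart contract is being written, and attackers can exploit vulnerabilities hidden in the contract to launch attacks and steal users' digital assets.
In the prior art, a smart contract is generally checked for vulnerabilities or attacks by drawing its control flow graph, using static analysis or dynamic analysis. Static analysis starts from the control flow graph, traverses the data flow along every possible path in it, and judges whether the control flow contains an execution path that triggers a vulnerability. Dynamic analysis derives the safe execution paths from the control flow graph before the program runs and, after the smart contract is deployed, checks whether its execution follows those predetermined paths, in order to judge whether an unknown vulnerability has been triggered. Although such methods can detect smart contract vulnerabilities and attacks accurately, the development of blockchain technology has moved smart contracts from traditional x86 bytecode to EVM bytecode (Ethereum Virtual Machine bytecode). When a compiler targeting x86 compiles a jump statement, it can capture the jump address directly by analyzing the label operand; when a compiler targeting EVM bytecode compiles a jump statement, the jump address is held on the runtime stack, so it can only be captured from the context of the current compilation state. Furthermore, unlike x86 compilers, which handle function call statements with a dedicated call instruction, compilers targeting EVM bytecode implement function calls with the same JUMP instruction used for ordinary jumps, so the role of a given JUMP instruction can only be distinguished with the help of contextual semantics. Traditional control flow graph construction schemes are therefore not suitable for the EVM bytecode of Ethereum smart contracts.
There is therefore an urgent need for a method and device that can accurately and completely generate the control flow graph of a smart contract in EVM bytecode form.
Summary
Embodiments of the present invention provide a method and device for generating a control flow graph of a smart contract, suitable for accurately and completely generating the control flow graph of a smart contract in EVM bytecode form.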
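The following toy Python sketch is not taken from the patent; it illustrates the point above. It uses hypothetical `(address, opcode, immediate)` tuples and resolves a JUMP/JUMPI target only when the immediately preceding instruction pushes a constant. The call JUMP at 0x4 resolves, but nothing distinguishes it from an ordinary jump. The return JUMP at 0x7 takes its target from a value the caller left on the stack, so local inspection cannot resolve it.

```python
def naive_jump_targets(instrs):
    """Map each JUMP/JUMPI address to its target, or None if not locally resolvable.

    instrs: list of (address, opcode, immediate) tuples in program order.
    Only the pattern "PUSHn <target>; JUMP/JUMPI" is recognised.
    """
    targets = {}
    for prev, cur in zip(instrs, instrs[1:]):
        addr, op, _ = cur
        if op not in ("JUMP", "JUMPI"):
            continue
        prev_op, prev_imm = prev[1], prev[2]
        # A constant pushed right before the jump is taken as its target;
        # anything else (e.g. a return address left on the stack) stays unknown.
        targets[addr] = prev_imm if prev_op.startswith("PUSH") else None
    return targets


# The caller pushes its return address (0x08) and the callee entry (0x06), then jumps.
program = [
    (0x0, "PUSH1", 0x08),
    (0x2, "PUSH1", 0x06),
    (0x4, "JUMP", None),      # function call: looks like any other jump
    (0x5, "STOP", None),
    (0x6, "JUMPDEST", None),  # callee entry
    (0x7, "JUMP", None),      # return: target 0x08 comes from the caller's stack
    (0x8, "JUMPDEST", None),
    (0x9, "STOP", None),
]
print(naive_jump_targets(program))  # {4: 6, 7: None}
```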
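In a first aspect, an embodiment of the present invention provides a method for generating a control flow graph of a smart contract, the method including:
while translating the smart contract into Ethereum Virtual Machine bytecode, generating a jump record for the jump instructions in the smart contract and a function record for the function instructions in the smart contract, where the jump record represents the jump addresses of the jump instructions and the function record represents the boundary addresses of the function instructions;
converting the Ethereum Virtual Machine bytecode into an Ethereum Virtual Machine instruction sequence with a disassembly tool, and partitioning the instruction sequence to obtain a plurality of instruction basic blocks carrying sequence information;
taking each instruction basic block as a node of the control flow graph, and generating directed edges between the nodes according to the jump record, the function record and the sequence information, to obtain the control flow graph of the smart contract.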
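In the above method, the compiler translates the smart contract into EVM bytecode and generates the jump record and function record for the contract. During compilation the compiler records the jump addresses of the jump instructions and the boundary addresses of the function instructions in the bytecode-form contract. As a result, the later generation of the control flow graph can obtain the jump and boundary information of the contract's functions without reconstructing contextual semantics. With the instruction basic blocks as nodes, the compiler generates the directed edges between nodes from the jump or function instruction in each block, the sequence information of the blocks, and the jump and function records, which yields the control flow graph of the smart contract. Building the graph from these correspondences and from the ordering of the blocks ensures that the control flow graph is complete and accurate.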
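Optionally, generating a jump record for the jump instructions in the smart contract includes:
for the control instructions in the smart contract, generating the jump addresses corresponding to the control instructions; and for each JUMP/JUMPI jump instruction among the control instructions, generating jump information for that jump instruction, the jump information including the address of the jump instruction and its jump address.
In the above method, each JUMP/JUMPI jump instruction among the control instructions expresses where execution jumps in the smart contract. The jump record stores the address of each jump instruction together with its jump address. The position targeted by each jump can therefore be determined without obtaining the contract's context, and the resulting control flow graph records the contract's jump information accurately.
Optionally, generating the jump addresses corresponding to the control instructions in the smart contract, and generating, for each JUMP/JUMPI jump instruction among the control instructions, the jump information for that jump instruction, includes:
for a loop instruction among the control instructions, generating a jump start address and a jump end address of the loop instruction; when translation reaches the JUMPI jump instruction, taking the address of the JUMPI instruction and the jump end address as the jump information of the JUMPI instruction; and when translation reaches the final JUMP jump instruction, taking the address of the JUMP instruction and the jump start address as the jump information of the JUMP instruction;
for a conditional statement/ternary expression among the control instructions, generating a FALSE jump address and an END jump address; when translation reaches the JUMPI jump instruction of the condition check, taking the address of the JUMPI instruction and the FALSE jump address as the jump information of the JUMPI instruction; and when translation reaches the JUMP jump instruction after the FALSE content, taking the address of the JUMP instruction and the END jump address as the jump information of the JUMP instruction.
In the above method, for a loop instruction, when translation reaches the JUMPI instruction, its address and the jump end address become its jump information; when translation reaches the final JUMP instruction, its address and the jump start address become its jump information. Recording the address of the JUMPI that opens each iteration together with the jump end address captures where the loop is entered and left. Recording the address of the final JUMP together with the jump start address captures where the next iteration begins. These two address pairs fix the execution semantics of the loop statement, so the control flow graph generated later records the complete execution path of the loop.
For a conditional statement/ternary expression, when translation reaches the JUMPI instruction of the condition check, its address and the FALSE jump address become its jump information; when translation reaches the JUMP instruction after the FALSE content, its address and the END jump address become its jump information. The first pair records where execution enters the conditional statement/ternary expression, and the second pair records where execution leaves it. These two address pairs fix the execution semantics of the conditional statement/ternary expression, so the control flow graph generated later records its complete execution path. In this way the execution semantics of every instruction in the control statements are captured, which improves the completeness and accuracy of the control flow graph.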
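Optionally, generating a function record for the function instructions in the smart contract includes:
when translation reaches the function selector of the smart contract, recording the function entry address of each public function in the function selector into the function entry information of the function record, the function entry address of a public function being the address of the first JUMPDEST instruction after the parameter-loading code has been translated;
when translation reaches a function body of the smart contract, recording the function entry address of each private function in the function body into the function entry information, the function entry address of a private function being the address given by the first parameter of the call address;
when translation reaches a function call statement in the function body, recording the call address and return address of the function call statement into the call information of the function record, the call address being the address of the JUMP instruction corresponding to the function call statement and the return address being the address of the JUMPDEST instruction corresponding to the function call statement.
In the above method, the compiler generates the function entry information from the public and private functions during translation. It generates the function record from the address of the jump instruction at which each function call statement enters the callee (the call address, i.e. the address of the JUMP instruction) and from the return address (the instruction address to which the completed call returns, i.e. the address of the JUMPDEST instruction). It generates the jump record from the address and jump address of each jump instruction of the control statements. The complete execution of a function, namely its entry, jumps and calls, is thus recorded. The generated control flow graph therefore reflects the entry, jump and call semantics of every function in the smart contract completely and accurately. This makes it easier to identify vulnerabilities in the contract and the function under attack from that information later, improving the accuracy of detection and localization.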
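Optionally, partitioning the Ethereum Virtual Machine instruction sequence to obtain a plurality of ordered instruction basic blocks includes:
setting the block entry address of an instruction basic block to e;
fetching instructions from the EVM instruction sequence in order; if the i-th instruction is a JUMPDEST instruction, determining the address s of the (i-1)-th instruction as the block exit address of the instruction basic block, taking the instructions from block entry address e to block exit address s as one instruction basic block, and updating the block entry address e to the address of the i-th instruction;
if the i-th instruction is a JUMP/JUMPI jump instruction, a REVERT instruction or a SELFDESTRUCT instruction, determining the address s of the i-th instruction as the block exit address of the instruction basic block, taking the instructions from block entry address e to block exit address s as one instruction basic block, and updating the block entry address e to the address of the (i+1)-th instruction.
In the above method, the jump-initiating instructions (JUMP/JUMPI, REVERT and SELFDESTRUCT) and the jump-landing instruction (JUMPDEST) delimit each complete run of instructions exactly. Each jump-free execution sequence in the EVM instruction sequence is thereby obtained as an instruction basic block and serves as a node of the control flow graph.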
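Optionally, generating the directed edges between nodes according to the jump record and the function record includes:
obtaining, from the function entry information in the function record, the first node corresponding to any function entry address;
determining the second node pointed to by the first node according to whether the block exit address of the first node has a corresponding jump address in the jump record or a corresponding return address in the call information of the function record; and continuing to determine the third node pointed to by the second node, until the nodes corresponding to all function entry addresses in the function entry information have been traversed.
In the above method, the function entry information in the function record, the jump information in the jump record, the call information in the function record, and the sequence information of the instruction basic blocks together give the blocks at which functions start and the jump and call relationships between blocks. Together these fix the execution order between nodes, from which the control flow graph is obtained.
Optionally, determining the second node pointed to by the first node according to whether the block exit address of the first node has a corresponding jump address in the jump record or a corresponding return address in the call information of the function record includes:
if the block exit address of the first node has a corresponding jump address in the jump record, generating a directed edge from the first node to the second node corresponding to that jump address; if the block exit address in the instruction basic block of the first node has a corresponding return address in the call information of the function record, generating a directed edge from the first node to the second node corresponding to that return address;
otherwise, generating a directed edge from the first node to the second node that follows the first node according to the sequence information.
In the above method, the compiler first determines the first node, whose block contains a function entry address from the function record. If the exit instruction of the first node's block is a jump instruction and the jump record contains that instruction's address and jump address, the compiler finds the second node whose block contains the jump address and creates a directed edge from the first node to it. The second node's block then contains the instruction at the target address. If the jump record has no entry for the jump instruction, the compiler looks up the instruction's address (the call address) and its return address in the function record, finds the second node for the return address, and creates a directed edge from the first node to it. The second node's block then contains the instruction at the return address. If the exit instruction of the first node's block is not a jump instruction, the directed edge follows the sequence information: the second node is the node of the instruction basic block immediately after the first node's block.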
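In a second aspect, an embodiment of the present invention provides a device for generating a control flow graph of a smart contract, the device including:
a translation and recording module, configured to generate, while translating the smart contract into Ethereum Virtual Machine bytecode, a jump record for the jump instructions in the smart contract and a function record for the function instructions in the smart contract, where the jump record represents the jump addresses of the jump instructions and the function record represents the boundary addresses of the function instructions;
a construction module, configured to convert the Ethereum Virtual Machine bytecode into an Ethereum Virtual Machine instruction sequence with a disassembly tool, and to partition the instruction sequence to obtain a plurality of instruction basic blocks carrying sequence information;
the construction module being further configured to take each instruction basic block as a node of the control flow graph and to generate directed edges between the nodes according to the jump record, the function record and the sequence information, to obtain the control flow graph of the smart contract.
In a third aspect, an embodiment of the present application further provides a computing device, including: a memory configured to store a program; and a processor configured to call the program stored in the memory and to execute, according to the obtained program, the method described in any of the possible designs of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable non-volatile storage medium including a computer-readable program which, when read and executed by a computer, causes the computer to perform the method described in any of the possible designs of the first aspect.
These and other implementations of the present application will become clearer from the description of the following embodiments.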
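Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic architecture diagram of a system for generating a control flow graph of a smart contract according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a compiler generating a control flow graph according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a method for generating a control flow graph of a smart contract according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a loop statement structure and its jump instructions according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another loop statement structure and its jump instructions according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a conditional statement/ternary expression structure and its jump instructions according to an embodiment of the present invention;
FIG. 7 is a control flow graph of a smart contract according to an embodiment of the present invention;
FIG. 8 is a schematic flowchart of a method for generating a control flow graph of a smart contract according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a device for generating a control flow graph of a smart contract according to an embodiment of the present invention.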
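Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
FIG. 1 shows the architecture of a system for generating a control flow graph of a smart contract according to an embodiment of the present invention. A translation module translates the smart contract to obtain Ethereum Virtual Machine bytecode (EVM bytecode). During this process, a jump address capture module captures the address and jump address of each jump instruction in the control statements to obtain the jump record. A function boundary capture module captures three kinds of address to obtain the function record: the entry addresses of the public functions in the contract's function selector, the entry addresses of the private functions in the function bodies, and the call and return addresses of the function call statements in the function bodies. After the translation module finishes translating the contract, a basic block partitioning module converts the EVM bytecode into an instruction sequence with a disassembly tool and partitions the resulting sequence into a plurality of instruction basic blocks. A control flow graph construction module takes the instruction basic blocks as nodes and builds directed edges between them from the jump record, the function record and the instruction basic blocks, obtaining the control flow graph.
Based on this system architecture, an embodiment of the present invention further provides a schematic diagram of a compiler generating a control flow graph, as shown in FIG. 2, in which the Solidity compiler implements the translation function, the jump address capture function and the function boundary capture function. The translation function translates the smart contract into EVM bytecode. During this process, the jump address capture function captures the address and jump address of each jump instruction in the control statements to obtain the jump record. The function boundary capture function captures the entry addresses of the public functions in the function selector, the entry addresses of the private functions in the function bodies, and the call and return addresses of the function call statements in the function bodies, obtaining the function record. After the translation function finishes translating the contract, the basic block partitioning module converts the EVM bytecode into an instruction sequence with a disassembly tool and partitions it into a plurality of instruction basic blocks. The Solidity compiler takes the instruction basic blocks as nodes and builds directed edges between them from the jump record, the function record and the instruction basic blocks, obtaining the control flow graph.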
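On this basis, an embodiment of the present application provides a flow of a method for generating a control flow graph of a smart contract, as shown in FIG. 3, including:
Step 301: while translating the smart contract into Ethereum Virtual Machine bytecode, generate a jump record for the jump instructions in the smart contract and a function record for the function instructions in the smart contract, where the jump record represents the jump addresses of the jump instructions and the function record represents the boundary addresses of the function instructions;
Step 302: convert the Ethereum Virtual Machine bytecode into an Ethereum Virtual Machine instruction sequence with a disassembly tool, and partition the instruction sequence to obtain a plurality of instruction basic blocks carrying sequence information;
Step 303: take each instruction basic block as a node of the control flow graph, and generate directed edges between the nodes according to the jump record, the function record and the sequence information, to obtain the control flow graph of the smart contract.
In the above method, the compiler translates the smart contract into EVM bytecode and generates the jump record and function record for the contract. During compilation the compiler records the jump addresses of the jump instructions and the boundary addresses of the function instructions in the bytecode-form contract. The later generation of the control flow graph can therefore obtain the jump and boundary information of the contract's functions without reconstructing contextual semantics. With the instruction basic blocks as nodes, the compiler generates the directed edges between nodes from the jump or function instruction in each block, the sequence information of the blocks, and the jump and function records, which yields the control flow graph of the smart contract. Building the graph from these correspondences and from the ordering of the blocks ensures that the control flow graph is complete and accurate.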
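An embodiment of the present application provides a method for generating the jump record. Generating a jump record for the jump instructions in the smart contract includes:
for the control instructions in the smart contract, generating the jump addresses corresponding to the control instructions; and for each JUMP/JUMPI jump instruction among the control instructions, generating jump information for that jump instruction, the jump information including the address of the jump instruction and its jump address. In one example, under the Solidity grammar, the statements that produce JUMP/JUMPI instructions are mainly control statements, of three kinds: loop statements, conditional statements and ternary expressions. For each control statement, the address and jump address of its JUMP/JUMPI instructions can be captured during translation to obtain the statement's jump information. In this way, the translation routine of each control statement records the jump information (address and jump address) of the corresponding JUMP/JUMPI instructions according to the statement's high-level semantics. Here, translation may consist of the Solidity compiler traversing the syntax tree of the smart contract source code to generate EVM bytecode. In one example, a global array jumpTarget is set up during translation to store pairs (jumpAddr, target), where jumpAddr is the address of the JUMP/JUMPI instruction and target is its jump address. In other words, the jump information can be recorded as pairs, and the jump record contains at least one such pair.
An embodiment of the present application provides a method for generating jump information. Generating the jump addresses corresponding to the control instructions in the smart contract, and generating, for each JUMP/JUMPI jump instruction among the control instructions, the jump information for that instruction, includes:
for a loop instruction among the control instructions, generating a jump start address and a jump end address of the loop instruction; when translation reaches the JUMPI jump instruction, taking the address of the JUMPI instruction and the jump end address as the jump information of the JUMPI instruction; and when translation reaches the final JUMP jump instruction, taking the address of the JUMP instruction and the jump start address as the jump information of the JUMP instruction;
for a conditional statement/ternary expression among the control instructions, generating a FALSE jump address and an END jump address; when translation reaches the JUMPI jump instruction of the condition check, taking the address of the JUMPI instruction and the FALSE jump address as the jump information of the JUMPI instruction; and when translation reaches the JUMP jump instruction after the FALSE content, taking the address of the JUMP instruction and the END jump address as the jump information of the JUMP instruction.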
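In one example, the loop statements of the loop instructions include While and For statements. An embodiment of the present application provides a schematic diagram of a loop statement structure and its jump instructions, as shown in FIG. 4, including:
In the translation routine for While/For statements, the Solidity compiler initializes two jump addresses, loopStart (the jump start address) and loopEnd (the jump end address). After placing a JUMPDEST at loopStart (address 0x12), it translates the condition of the While statement followed by a JUMPI (address 0x25), then translates the loop body (addresses 0x34 to 0x4f), and finally inserts the JUMPDEST for loopEnd (address 0x50). Correspondingly, when generating the jump record during translation, on reaching the JUMPI instruction (address 0x25) the compiler takes loopEnd's JUMPDEST (address 0x50) as its jump address and records (0x25, 0x50) in the jumpTarget array. Finally, on reaching the JUMP instruction at the very end of the While/For statement (address 0x4f), it takes loopStart's JUMPDEST (address 0x12) as that JUMP's jump address and records (0x4f, 0x12) in the jumpTarget array.
When translation reaches the JUMP instruction of a break statement in the While/For loop body (address 0x34), the compiler records in the jumpTarget array the address of this JUMP and its jump address, the loopEnd address initialized for the current loop, i.e. (0x34, 0x50).
When translation reaches the JUMP instruction of a continue statement in the While/For loop body (address 0x3a), the compiler records in the jumpTarget array the address of this JUMP and its jump address, the loopStart address initialized for the current loop, i.e. (0x3a, 0x12).
In one example, the loop statements of the loop instructions include the Do-While statement. An embodiment of the present application provides another schematic diagram of a loop statement structure and its jump instructions, as shown in FIG. 5, including:
In the translation routine for Do-While statements, the Solidity compiler likewise initializes two jump addresses, loopStart (the jump start address) and loopEnd (the jump end address). After placing a JUMPDEST at loopStart (address 0xa), it translates the loop body, then translates the condition and places a JUMPI instruction (address 0x51), and finally places a JUMPDEST instruction at loopEnd (address 0x52). Correspondingly, on reaching the JUMPI instruction (address 0x51), the compiler takes loopStart's JUMPDEST (address 0xa) as the JUMPI's jump address and records the pair (0x51, 0xa) in the jumpTarget array.
When translation reaches the JUMP instruction of a break statement in the Do-While loop body (address 0x32), the compiler records in the jumpTarget array the address of this JUMP and its jump address, the loopEnd JUMPDEST initialized for the current loop, i.e. (0x32, 0x52).
When translation reaches the JUMP instruction of a continue statement in the Do-While loop body (address 0x3c), the compiler records in the jumpTarget array the address of this JUMP and its jump address, the loopStart address initialized for the current loop, i.e. (0x3c, 0xa).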
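The loop rules for Figures 4 and 5 can be sketched as a small recorder that a translator would call while emitting code. This is a minimal Python sketch under stated assumptions. The class and method names are hypothetical, and the caller supplies the instruction addresses (taken here from Figures 4 and 5). A stack of active loops is also assumed, so that break/continue bind to the innermost loop; the text does not spell this out.

```python
class LoopJumpRecorder:
    """Hypothetical helper that appends (jumpAddr, target) pairs to jumpTarget."""

    def __init__(self):
        self.jump_target = []  # the jumpTarget array
        self._loops = []       # stack of (kind, loop_start, loop_end); assumed design

    def enter_loop(self, kind, loop_start, loop_end):
        # kind: "while" (also used for For) or "do-while"
        self._loops.append((kind, loop_start, loop_end))

    def exit_loop(self):
        self._loops.pop()

    def condition_jumpi(self, addr):
        kind, start, end = self._loops[-1]
        # While/For: the JUMPI leaves the loop, so it targets loopEnd.
        # Do-While: the JUMPI at the bottom repeats the body, so it targets loopStart.
        self.jump_target.append((addr, end if kind == "while" else start))

    def back_jump(self, addr):
        # Final JUMP of a While/For loop goes back to loopStart.
        self.jump_target.append((addr, self._loops[-1][1]))

    def break_jump(self, addr):
        # break targets loopEnd of the innermost loop.
        self.jump_target.append((addr, self._loops[-1][2]))

    def continue_jump(self, addr):
        # continue targets loopStart of the innermost loop.
        self.jump_target.append((addr, self._loops[-1][1]))


# Figure 4 (While/For): loopStart = 0x12, loopEnd = 0x50.
fig4 = LoopJumpRecorder()
fig4.enter_loop("while", 0x12, 0x50)
fig4.condition_jumpi(0x25)
fig4.break_jump(0x34)
fig4.continue_jump(0x3a)
fig4.back_jump(0x4f)
fig4.exit_loop()

# Figure 5 (Do-While): loopStart = 0xa, loopEnd = 0x52.
fig5 = LoopJumpRecorder()
fig5.enter_loop("do-while", 0xa, 0x52)
fig5.break_jump(0x32)
fig5.continue_jump(0x3c)
fig5.condition_jumpi(0x51)
fig5.exit_loop()

print([(hex(a), hex(t)) for a, t in fig4.jump_target])
print([(hex(a), hex(t)) for a, t in fig5.jump_target])
```

The recorded pairs match the ones listed for Figures 4 and 5. Keeping a stack rather than a single loopStart/loopEnd pair is a design choice aimed at nested loops.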
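In one example, the conditional statements/ternary expressions among the control instructions include If-Else statements and Triple (ternary) statements. An embodiment of the present application provides a schematic diagram of a conditional statement/ternary expression structure and its jump instructions, as shown in FIG. 6. The translation routines for If-Else statements and for Triple statements follow essentially the same steps, so the If-Else statement is used as the example here. At the start of translating an If-Else statement, the Solidity compiler initializes two jump addresses, If-False (the FALSE jump address) and If-End (the END jump address), and places a JUMPI after the condition (address 0xa9).
It then translates the true content of the if statement, ending it with a JUMP instruction (address 0xbc). Next it places the If-False JUMPDEST instruction (address 0xbd), translates the false content of the if statement, and ends with the If-End JUMPDEST instruction (address 0xc2). On reaching the JUMPI instruction of the condition (address 0xa9), the compiler takes the If-False JUMPDEST (address 0xbd) as that JUMPI's jump address, giving the pair (0xa9, 0xbd).
On reaching the JUMP instruction at address 0xbc, which skips over the false content, the compiler takes the If-End JUMPDEST (address 0xc2) as its jump address, giving the pair (0xbc, 0xc2). Both pairs are recorded in the jumpTarget array.
The If statement is a reduced form of the If-Else statement. In the translation routine for If statements, the Solidity compiler first initializes the If-False jump address and then translates the condition. After translating the condition's JUMPI instruction (address 0xa9), it translates the body of the if statement and places the If-False JUMPDEST instruction (address 0xbd) at the end. On reaching the JUMPI instruction, the compiler takes If-False as its jump address, giving the pair (0xa9, 0xbd), which is recorded in the jumpTarget array.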
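The target of a JUMPI/JUMP is a label whose address is fixed only once the code is laid out. The hedged Python sketch below shows one way to obtain the (jumpAddr, target) pairs for an If-Else skeleton: jumps are recorded against symbolic labels and resolved after byte offsets have been computed. Several parts are illustrative assumptions rather than the Solidity compiler's internals: the `Assembler` class, the tiny opcode subset, and the use of a 3-byte PUSH2 for every label push.

```python
# Sizes of the one-byte opcodes used below; every label push is assumed to be PUSH2 (3 bytes).
OPCODE_SIZE = {"LT": 1, "ISZERO": 1, "POP": 1, "JUMP": 1, "JUMPI": 1}


class Assembler:
    """Hypothetical emitter: records jumps against labels, resolves addresses on layout."""

    def __init__(self):
        self.items = []    # ("op", name) | ("push_label", label) | ("label", label)
        self.pending = []  # (index of the JUMP/JUMPI item, target label)

    def op(self, name):
        self.items.append(("op", name))

    def label(self, name):
        self.items.append(("label", name))  # emitted as a JUMPDEST

    def jump(self, opcode, target_label):
        self.items.append(("push_label", target_label))  # PUSH2 <target>
        self.items.append(("op", opcode))
        self.pending.append((len(self.items) - 1, target_label))

    def jump_targets(self):
        """Lay out the code and return the jumpTarget pairs (jumpAddr, target)."""
        addr_of, label_addr, pc = [], {}, 0
        for kind, name in self.items:
            addr_of.append(pc)
            if kind == "label":
                label_addr[name] = pc
                pc += 1                   # JUMPDEST
            elif kind == "push_label":
                pc += 3                   # PUSH2 opcode + 2-byte immediate
            else:
                pc += OPCODE_SIZE[name]
        return [(addr_of[i], label_addr[lbl]) for i, lbl in self.pending]


def emit_if_else(asm):
    asm.op("LT")                    # condition
    asm.op("ISZERO")                # take the false branch when the condition fails
    asm.jump("JUMPI", "if_false")
    asm.op("POP")                   # true content (placeholder)
    asm.jump("JUMP", "if_end")      # skip over the false content
    asm.label("if_false")
    asm.op("POP")                   # false content (placeholder)
    asm.label("if_end")


asm = Assembler()
emit_if_else(asm)
print(asm.jump_targets())  # [(5, 11), (10, 13)]
```

Under these assumptions, the JUMPI at offset 5 resolves to the If-False JUMPDEST at 11 and the JUMP at 10 resolves to the If-End JUMPDEST at 13. This mirrors the shape of the (0xa9, 0xbd) and (0xbc, 0xc2) pairs in the text.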
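An embodiment of the present application provides a method for generating the function record. Generating a function record for the function instructions in the smart contract includes:
when translation reaches the function selector of the smart contract, recording the function entry address of each public function in the function selector into the function entry information of the function record, the function entry address of a public function being the address of the first JUMPDEST instruction after the parameter-loading code has been translated;
when translation reaches a function body of the smart contract, recording the function entry address of each private function in the function body into the function entry information, the function entry address of a private function being the address given by the first parameter of the call address;
when translation reaches a function call statement in the function body, recording the call address and return address of the function call statement into the call information of the function record, the call address being the address of the JUMP instruction corresponding to the function call statement and the return address being the address of the JUMPDEST instruction corresponding to the function call statement. In one example, the smart contract source code is:
Figure PCTCN2022131537-appb-000001
The Solidity compiler translates the above smart contract source code into Ethereum Virtual Machine bytecode, shown below as a hexadecimal string of EVM bytecode:
Figure PCTCN2022131537-appb-000002
The Solidity compiler converts the above EVM bytecode into an instruction sequence with a disassembly tool and further partitions the sequence into 17 instruction basic blocks, as follows:
Figure PCTCN2022131537-appb-000003
In this example, the instructions of the smart contract's function selector occupy addresses 0x0 to 0x57. The function entry address of the public function in the selector is 0x58, the value pushed by the PUSH1 instruction at address 0x53. The function entry address of a public function is thus the address of the first JUMPDEST instruction after the parameter-loading code has been translated (address 0x58 in the basic blocks above).
That is, for a public function, the Solidity compiler records the function entry address while translating the function selector. After the parameter-loading code has been translated, the address of the first JUMPDEST instruction emitted, 0x58, is taken as the entry of the public function, and 0x58 is added to the entryList array.
The instructions of the function bodies occupy addresses 0x58 to 0xa2, and the function entry address of the private function is 0x95 (addresses 0x95 to 0xa2 correspond to the private function func2 in the source code). The function entry address of a private function is thus the address given by the first parameter of the call address (address 0x95 in the basic blocks above).
That is, the Solidity compiler translates a private function only when it is called. While the call address/return address pairs are being analyzed, the address given by the first parameter of the call JUMP instruction (0x74), namely 0x95, is taken as the function entry address of the private function. If 0x95 is not yet in the entryList array, the private function entry address 0x95 is added to it. This gives entryList = [0x58, 0x95].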
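The function call statements in the function body are the two calls from func1 to func2. For the first call, func2(i), the call address is the JUMP instruction at 0x74 and the return address is the corresponding JUMPDEST instruction at 0x75. For the second call, func2(i*3), the call address is the JUMP instruction at 0x83 and the return address is the corresponding JUMPDEST instruction at 0x84.
That is, when compiling a function call statement, the compiler first initializes the return address RET and saves it on the operand stack. It then uses a JUMP instruction (address 0x74 or 0x83) to jump to the entry of the called function (func2), and after the call it adds the JUMPDEST instruction for RET (address 0x75 or 0x84). During translation, the address of the JUMP instruction is recorded as the call address call and the RET address as the return address ret, and the pair (call, ret) is recorded in the cList array. This gives cList = [(0x74, 0x75), (0x83, 0x84)].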
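Using the method described above for obtaining the jump addresses of control statements, the jump address record of the control statements in this example is jumpTargets = [(0x66, 0x92), (0x6e, 0x7a), (0x79, 0x86), (0x91, 0x5e)].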
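Here, an embodiment of the present application describes how jumpTargets, cList and entryList are obtained while the example smart contract is compiled:
1. When the compiler translates the function selector of the smart contract, it records the entry of the public function func1() as 0x58 and adds 0x58 to the entryList array.
2. After the compiler translates the JUMPI jump instruction for the condition i<5 of the for loop in func1, it forms the pair (0x66, 0x92) and adds it to the jumpTarget array. The pair consists of the JUMPI address 0x66 and the address 0x92 of its target JUMPDEST.
3. After the compiler translates the JUMPI jump instruction for the condition i<3 of the if statement in func1, it forms the pair (0x6e, 0x7a) and adds it to the jumpTarget array. The pair consists of the JUMPI address 0x6e and the address 0x7a of its target JUMPDEST.
4. After the compiler translates the function call statement "func2(i)" in func1, it forms the pair (0x74, 0x75) and adds it to the cList array. The pair consists of the address 0x74 of the corresponding JUMP instruction and the address 0x75 of the JUMPDEST to which the call returns. It also takes the address 0x95 of the JUMPDEST targeted by that JUMP instruction as the entry of function func2 and adds 0x95 to entryList.
5. After the compiler completes the true block of the if statement in func1, it forms the pair (0x79, 0x86) and adds it to the jumpTarget array. The pair consists of the address 0x79 of the corresponding JUMP instruction and the address 0x86 of its target JUMPDEST.
6. After the compiler translates the function call statement "func2(i*3)" in func1, it forms the pair (0x83, 0x84) and adds it to the cList array. The pair consists of the address 0x83 of the corresponding JUMP instruction and the address 0x84 of the JUMPDEST to which the call returns.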
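7. After the compiler translates the increment statement "i++" of the for loop in func1, it forms the pair (0x91, 0x5e) and adds it to the jumpTarget array. The pair consists of the address 0x91 of the corresponding JUMP instruction and the address 0x5e of its target JUMPDEST.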
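The entryList and cList bookkeeping in steps 1, 4 and 6 can be sketched as below. `FunctionRecorder` and its methods are hypothetical names. The sketch only mirrors the rules stated in the text. A public entry is added when the selector is translated. Each call statement appends a (call, ret) pair to cList. The callee's entry, the jump target of the call JUMP, is added to entryList only if it is not already there.

```python
class FunctionRecorder:
    """Hypothetical holder for the entryList and cList arrays."""

    def __init__(self):
        self.entry_list = []  # function entry addresses
        self.c_list = []      # (call, ret) pairs

    def _add_entry(self, addr):
        if addr not in self.entry_list:  # each entry is recorded once
            self.entry_list.append(addr)

    def public_function(self, first_jumpdest_after_params):
        # Entry of a public function: first JUMPDEST after parameter loading.
        self._add_entry(first_jumpdest_after_params)

    def call_statement(self, call_jump_addr, ret_jumpdest_addr, callee_entry):
        # Call address = the JUMP into the callee; return address = the RET JUMPDEST.
        self.c_list.append((call_jump_addr, ret_jumpdest_addr))
        # A private function is translated when it is called; its entry is the
        # jump target of the call JUMP.
        self._add_entry(callee_entry)


rec = FunctionRecorder()
rec.public_function(0x58)             # step 1: func1
rec.call_statement(0x74, 0x75, 0x95)  # step 4: func2(i)
rec.call_statement(0x83, 0x84, 0x95)  # step 6: func2(i*3)
print([hex(a) for a in rec.entry_list])           # ['0x58', '0x95']
print([(hex(c), hex(r)) for c, r in rec.c_list])  # [('0x74', '0x75'), ('0x83', '0x84')]
```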
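An embodiment of the present application provides a method for partitioning instruction basic blocks. Partitioning the Ethereum Virtual Machine instruction sequence to obtain a plurality of ordered instruction basic blocks includes:
setting the block entry address of an instruction basic block to e;
fetching instructions from the EVM instruction sequence in order; if the i-th instruction is a JUMPDEST instruction, determining the address s of the (i-1)-th instruction as the block exit address of the instruction basic block, taking the instructions from block entry address e to block exit address s as one instruction basic block, and updating the block entry address e to the address of the i-th instruction;
if the i-th instruction is a JUMP/JUMPI jump instruction, a REVERT instruction or a SELFDESTRUCT instruction, determining the address s of the i-th instruction as the block exit address of the instruction basic block, taking the instructions from block entry address e to block exit address s as one instruction basic block, and updating the block entry address e to the address of the (i+1)-th instruction. That is, i denotes the sequence number of an instruction in the instruction sequence. Initialize i to 0 and set the entry address e of the instruction basic block to 0, then fetch the i-th instruction instr from the EVM instruction sequence.
If instr is a JUMPDEST instruction, set the exit address s of the instruction basic block to the address of the (i-1)-th instruction, treat the instructions from entry address e to exit address s as one basic block, and add (e, s) to blockList (the instruction basic block record). At the same time, set the basic block entry e to the address of the i-th instruction.
If instr is a JUMP/JUMPI, REVERT or SELFDESTRUCT instruction, set the basic block exit s to the address of the i-th instruction, treat the instructions from address e to address s as one instruction basic block, and add (e, s) to blockList. At the same time, set the basic block entry e to the address of the (i+1)-th instruction.
If i+1 is less than the length of the EVM instruction sequence, increment i (i = i+1) and return to the step "fetch the i-th instruction instr from the EVM instruction sequence". Once i+1 is no longer less than the length of the sequence, output the instruction basic block sequence blockList.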
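The partitioning rules can be sketched in Python as follows, assuming the disassembled sequence is a list of `(address, opcode)` tuples (a hypothetical input format). The block terminators are only the four named in the text; STOP and RETURN, for instance, are not treated as boundaries. Two small deviations are deliberate and labeled in the comments. First, a JUMPDEST that already starts the current block produces no empty or duplicate block; the text instead re-adds an existing entry, which leaves blockList unchanged. Second, any instructions after the last boundary are flushed as a final block.

```python
# Instructions that end a block, as listed in the text (STOP/RETURN are not included).
TERMINATORS = {"JUMP", "JUMPI", "REVERT", "SELFDESTRUCT"}


def split_basic_blocks(instrs):
    """Split [(address, opcode), ...] into blockList entries (entry e, exit s)."""
    blocks = []
    start = 0  # index of the first instruction of the current block (entry e)
    for i, (addr, op) in enumerate(instrs):
        if op == "JUMPDEST" and i > start:
            # A jump destination opens a new block; the previous one ends at i-1.
            blocks.append((instrs[start][0], instrs[i - 1][0]))
            start = i
        if op in TERMINATORS:
            # A control-transfer instruction closes the block that contains it.
            blocks.append((instrs[start][0], addr))
            start = i + 1
    if start < len(instrs):
        # Deviation from the text: flush instructions after the last boundary.
        blocks.append((instrs[start][0], instrs[-1][0]))
    return blocks


program = [
    (0x0, "PUSH1"), (0x2, "PUSH1"), (0x4, "JUMPI"),
    (0x5, "PUSH1"),
    (0x7, "JUMPDEST"), (0x8, "POP"), (0x9, "JUMP"),
    (0xa, "JUMPDEST"), (0xb, "STOP"),
]
print([(hex(e), hex(s)) for e, s in split_basic_blocks(program)])
# [('0x0', '0x4'), ('0x5', '0x5'), ('0x7', '0x9'), ('0xa', '0xb')]
```

The JUMPDEST at 0xa directly follows a JUMP, so it already starts a fresh block and does not produce a second (duplicate) block entry.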
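Based on the instruction basic block sequence, jumpTargets, cList and entryList of the above example, an embodiment of the present application describes how the instruction basic block sequence is obtained, including:
a. Initialize i to 0 and set the entry address e of the instruction basic block to 0.
b. Fetch instruction 0, PUSH1 (address 0x0), from the EVM instruction sequence.
e. Instruction 0 is neither a JUMPDEST nor a JUMP/JUMPI-type instruction, so nothing is done. Increment i to 1 and go to step b.
b. Fetch instruction 1, PUSH1 (address 0x2).
e. Instruction 1, PUSH1, is neither a JUMPDEST nor a JUMP/JUMPI-type instruction, so nothing is done. Increment i to 2 and go to step b.
... (steps b and e repeat until i = 7).
d. Instruction 7 is a JUMPI instruction at address 0xb, so (0x0, 0xb) is added to blockList, and the instruction basic block entry e is set to 0xc, the address of instruction i+1.
e. Increment i and go to step b.
... (steps b and e repeat until i = 19).
d. Instruction 19 is a JUMPI instruction at address 0x3e, so (0xc, 0x3e) is added to blockList, and the instruction basic block entry e is set to 0x3f, the address of instruction i+1.
e. Increment i and go to step b.
b. Fetch instruction 20, JUMPDEST.
c. Set the instruction basic block exit to 0x3e and add (0xc, 0x3e) to blockList. Because (0xc, 0x3e) is already in blockList, blockList is unchanged. At the same time, set the instruction basic block entry address e to 0x3f.
e. Increment i and go to step b.
... (steps b, c, d and e repeat according to the type of the i-th instruction until i+1 is no longer less than the length of the instruction sequence).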
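e. When i+1 is no longer less than the length of the instruction sequence, the algorithm stops and outputs blockList = [(0x0, 0xb), (0xc, 0x3e), (0x3f, 0x43), (0x44, 0x4a), (0x4b, 0x4e), (0x4f, 0x55), (0x56, 0x57), (0x58, 0x5d), (0x5e, 0x66), (0x67, 0x6e), (0x6f, 0x74), (0x75, 0x79), (0x7a, 0x83), (0x84, 0x85), (0x86, 0x91), (0x92, 0x94), (0x95, 0xa2)], which contains the entry and exit addresses of each of the 17 instruction basic blocks in the above example.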
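An embodiment of the present application provides a method for generating a control flow graph of a smart contract. Generating the directed edges between nodes according to the jump record and the function record includes:
obtaining, from the function entry information in the function record, the first node corresponding to any function entry address;
determining the second node pointed to by the first node according to whether the block exit address of the first node has a corresponding jump address in the jump record or a corresponding return address in the call information of the function record; and continuing to determine the third node pointed to by the second node, until the nodes corresponding to all function entry addresses in the function entry information have been traversed. In one example, the flow for constructing the control flow graph from entryList, cList, jumpTargets and blockList is as follows:
a. Initialize the control flow graph G, treating every element (e, s) of the blockList array as a node of the graph. At this point G is an empty graph with nodes but no edges.
b. Take a function entry e from entryList and search blockList for the instruction basic block block whose entry address is e. If found, add block to queue Q and go to step c; otherwise take another function entry from entryList and repeat step b. Once all function entries in entryList have been processed, output the control flow graph G.
c. If queue Q is empty, go to step b. Otherwise, take an instruction basic block block' from Q: if the instruction at its exit address s is JUMP/JUMPI, go to step e; if it is REVERT or SELFDESTRUCT, go back to step c; otherwise go to step d.
d. Add the instruction basic block b_next' whose entry address is s+1 to Q, add a directed edge from block' to b_next' in G, and go to step c.
e. If the instruction at the exit address s of block' is JUMP/JUMPI, look up its jump address in jumpTargets. If found, add to G a directed edge from block' to the instruction basic block b_next whose entry is the jump address, and add b_next to Q (for a JUMPI instruction, also add the instruction basic block b_next' whose entry address is s+1 to Q and add a directed edge from block' to b_next' in G); then go to step c. If not found, go to step f.
f. Use the exit address s of block' to check in cList whether the JUMP/JUMPI is a function call. If found, add to G a directed edge from block' to the basic block b_next whose entry is the corresponding function return address ret, and add b_next to Q. Finally, go to step c.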
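An embodiment of the present application provides a method for generating a control flow graph of a smart contract. Determining the second node pointed to by the first node, according to whether the block exit address of the first node has a corresponding jump address in the jump record or a corresponding return address in the call information of the function record, includes:
if the block exit address of the first node has a corresponding jump address in the jump record, generating a directed edge from the first node to the second node corresponding to that jump address; if the block exit address in the instruction basic block of the first node has a corresponding return address in the call information of the function record, generating a directed edge from the first node to the second node corresponding to that return address;
otherwise, generating a directed edge from the first node to the second node that follows the first node according to the sequence information. In the above example, the control flow graph is constructed from the entryList, cList, jumpTargets and blockList of the 17 instruction basic blocks as follows:
a. Initialize the control flow graph with every element of the blockList array as a node; at this point the graph has nodes but no edges.
b. Take the function entry 0x58 from entryList, find the basic block (0x58, 0x5d) whose entry is 0x58 in blockList, add (0x58, 0x5d) to queue Q, and go to step c.
c. Q is not empty; take the basic block (0x58, 0x5d) from Q. The instruction at 0x5d is POP, so go to step d.
d. Add the next basic block (0x5e, 0x66) to Q, add a directed edge from (0x58, 0x5d) to (0x5e, 0x66), and go to step c.
c. Q is not empty; take the basic block (0x5e, 0x66) from Q. The instruction at 0x66 is JUMPI, so go to step e.
e. jumpTargets contains address 0x66 (the JUMPI instruction) with jump address 0x92, so add the instruction basic block (0x92, 0x94), whose entry is 0x92, to Q and add a directed edge from (0x5e, 0x66) to (0x92, 0x94). In addition, add the basic block (0x67, 0x6e), whose entry address is 0x66+1 = 0x67, to Q and add a directed edge from (0x5e, 0x66) to (0x67, 0x6e). Go to step c.
c. Q is not empty; take the basic block (0x92, 0x94) from Q. The instruction at 0x94 is JUMP, so go to step e.
e. jumpTargets contains no jump at address 0x94, so go to step f.
f. cList contains no function call at address 0x94, so go to step c.
c. Q is not empty; take the instruction basic block (0x67, 0x6e) from Q. The instruction at 0x6e is JUMPI, so go to step e.
e. jumpTargets contains address 0x6e (the JUMPI instruction) with jump address 0x7a, so add the basic block (0x7a, 0x83), whose entry is 0x7a, to Q and add a directed edge from (0x67, 0x6e) to (0x7a, 0x83). In addition, add the basic block (0x6f, 0x74), whose entry address is 0x6e+1 = 0x6f, to Q and add a directed edge from (0x67, 0x6e) to (0x6f, 0x74). Go to step c.
c. Q is not empty; take the instruction basic block (0x7a, 0x83) from Q. The instruction at 0x83 is JUMP, so go to step e.
e. jumpTargets contains no jump at address 0x83 (the JUMP instruction), so go to step f.
f. cList contains the function call (0x83, 0x84) at address 0x83, so find the instruction basic block (0x84, 0x85) whose entry address is 0x84, add a directed edge from (0x7a, 0x83) to (0x84, 0x85), add (0x84, 0x85) to Q, and go to step c.
c. Q is not empty; take the instruction basic block (0x6f, 0x74) from Q. The instruction at 0x74 is JUMP, so go to step e.
e. jumpTargets contains no jump at address 0x74 (the JUMP instruction), so go to step f.
f. cList contains the function call (0x74, 0x75) at address 0x74, so find the instruction basic block (0x75, 0x79) whose entry address is 0x75, add a directed edge from (0x6f, 0x74) to (0x75, 0x79), add (0x75, 0x79) to Q, and go to step c.
c. Q is not empty; take the instruction basic block (0x84, 0x85) from Q. The instruction at 0x85 is POP, so go to step d.
d. Add the next instruction basic block (0x86, 0x91) to Q, add a directed edge from (0x84, 0x85) to (0x86, 0x91), and go to step c.
c. Q is not empty; take the instruction basic block (0x75, 0x79) from Q. The instruction at 0x79 is JUMP, so go to step e.
e. jumpTargets contains address 0x79 (the JUMP instruction) with jump address 0x86. Because (0x86, 0x91) has already been added to Q, it is not added again; add a directed edge from (0x75, 0x79) to (0x86, 0x91) and go to step c.
c. Q is not empty; take the instruction basic block (0x86, 0x91) from Q. The instruction at 0x91 is JUMP, so go to step e.
e. jumpTargets contains address 0x91 (the JUMP instruction) with jump address 0x5e. Because (0x5e, 0x66) has already been added to Q, it is not added again; add a directed edge from (0x86, 0x91) to (0x5e, 0x66) and go to step c.
c. Q is empty; go to step b.
b. Take the function entry 0x95 from entryList, find the basic block (0x95, 0xa2) whose entry is 0x95 in blockList, add (0x95, 0xa2) to Q, and go to step c.
c. Q is not empty; take the instruction basic block (0x95, 0xa2) from Q. The instruction at 0xa2 is JUMP, so go to step e.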
e. jumpTargets contains no jump at address 0xa2, so go to step f.
f. cList contains no function call at address 0xa2, so go to step c.
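c. Q is empty; go to step b.
b. entryList is now empty, so the algorithm ends.
The control flow graph of the smart contract obtained through the above steps is shown in FIG. 7.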
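The worklist construction in steps a to f can be sketched in Python and checked against the walk-through. The sketch rests on three assumptions, each labeled in the code. First, the fall-through successor is taken as the next block in address order; the text uses the block whose entry is s+1, and the two coincide for every block visited here. Second, only the opcodes at block exit addresses are needed, and these are taken from the walk-through. Third, the back jump of the for loop is recorded as (0x91, 0x5e), because the walk-through resolves the JUMP at 0x91 to 0x5e and finds no record for 0x94.

```python
from collections import deque


def build_cfg(block_list, entry_list, jump_targets, c_list, exit_opcode):
    """Worklist construction of CFG edges following steps a-f.

    block_list:   blockList, (entry, exit) address pairs; these are the nodes (step a)
    entry_list:   entryList, function entry addresses
    jump_targets: jumpTargets as {JUMP/JUMPI address: target address}
    c_list:       cList as {call JUMP address: return address}
    exit_opcode:  {block exit address: opcode} (assumed input)
    Returns the edges as (source entry, destination entry) pairs.
    """
    by_entry = {e: (e, s) for e, s in block_list}
    ordered = sorted(set(block_list))
    # Assumption: the fall-through successor is the next block in address order.
    next_block = dict(zip(ordered, ordered[1:]))

    edges, queued = set(), set()
    for entry in entry_list:                          # step b
        if entry not in by_entry:
            continue
        queue = deque([by_entry[entry]])
        queued.add(by_entry[entry])
        while queue:                                  # step c
            block = queue.popleft()
            exit_addr = block[1]
            op = exit_opcode.get(exit_addr)
            successors = []
            if op in ("JUMP", "JUMPI"):
                target = jump_targets.get(exit_addr)  # step e
                ret = c_list.get(exit_addr)           # step f
                if target in by_entry:
                    successors.append(by_entry[target])
                    if op == "JUMPI" and block in next_block:
                        successors.append(next_block[block])
                elif ret in by_entry:
                    successors.append(by_entry[ret])
            elif op not in ("REVERT", "SELFDESTRUCT") and block in next_block:
                successors.append(next_block[block])  # step d
            for succ in successors:
                edges.add((block[0], succ[0]))
                if succ not in queued:                # each block is queued once
                    queued.add(succ)
                    queue.append(succ)
    return edges


blocks = [(0x0, 0xb), (0xc, 0x3e), (0x3f, 0x43), (0x44, 0x4a), (0x4b, 0x4e),
          (0x4f, 0x55), (0x56, 0x57), (0x58, 0x5d), (0x5e, 0x66), (0x67, 0x6e),
          (0x6f, 0x74), (0x75, 0x79), (0x7a, 0x83), (0x84, 0x85), (0x86, 0x91),
          (0x92, 0x94), (0x95, 0xa2)]
exit_ops = {0x5d: "POP", 0x66: "JUMPI", 0x6e: "JUMPI", 0x74: "JUMP", 0x79: "JUMP",
            0x83: "JUMP", 0x85: "POP", 0x91: "JUMP", 0x94: "JUMP", 0xa2: "JUMP"}
edges = build_cfg(blocks, [0x58, 0x95],
                  {0x66: 0x92, 0x6e: 0x7a, 0x79: 0x86, 0x91: 0x5e},
                  {0x74: 0x75, 0x83: 0x84}, exit_ops)
for src, dst in sorted(edges):
    print(hex(src), "->", hex(dst))
```

Running it yields ten edges, the same ones added in the walk-through. The blocks ending in the unresolved JUMPs at 0x94 and 0xa2 get no outgoing edges, exactly as in steps e and f above.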
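Based on the above method flow, an embodiment of the present application provides a flow of a method for generating a control flow graph of a smart contract, as shown in FIG. 8, including:
Step 801: obtain the smart contract.
Step 802: translate the smart contract into Ethereum Virtual Machine bytecode, and obtain the jump record and function record of the smart contract.
Step 803: convert the Ethereum Virtual Machine bytecode into an instruction sequence.
Step 804: partition the instruction sequence to obtain instruction basic blocks and the sequence information of each instruction basic block.
Step 805: taking the instruction basic blocks as nodes, generate directed edges between the nodes according to the jump record and the function record, to obtain the control flow graph of the smart contract.
Based on the same concept, an embodiment of the present invention provides a device for generating a control flow graph of a smart contract. FIG. 9 is a schematic diagram of this device according to an embodiment of the present application. As shown in FIG. 9, the device includes:
a translation and recording module 901, configured to generate, while translating the smart contract into Ethereum Virtual Machine bytecode, a jump record for the jump instructions in the smart contract and a function record for the function instructions in the smart contract, where the jump record represents the jump addresses of the jump instructions and the function record represents the boundary addresses of the function instructions;
a construction module 902, configured to convert the Ethereum Virtual Machine bytecode into an Ethereum Virtual Machine instruction sequence with a disassembly tool, and to partition the instruction sequence to obtain a plurality of instruction basic blocks carrying sequence information;
the construction module 902 being further configured to take each instruction basic block as a node of the control flow graph and to generate directed edges between the nodes according to the jump record, the function record and the sequence information, to obtain the control flow graph of the smart contract.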
可选的,所述转译和记录模块901具体用于:针对所述智能合约中的控制指令,生成所述控制指令对应的跳转地址;针对所述控制指令中的每个JUMP/JUMPI跳转指令,生成跳转指令对应的跳转信息,跳转信息包括跳转指令所在的地址及跳转指令对应的跳转地址。
可选的,所述转译和记录模块901具体用于:针对所述智能合约中的控制指令,生成所述控制指令对应的跳转地址;针对所述控制指令中的每个JUMP/JUMPI跳转指令,生成跳转指令对应的跳转信息,包括:针对所述控制指令中的循环指令,生成所述循环指令的跳转开始地址和跳转结束地址;在转译至JUMPI跳转指令时,将JUMPI跳转指令所在的地址及跳转结束地址作为JUMPI跳转指令的跳转信息;在转移至最末尾的JUMP跳转指令时,将JUMP跳转指令所在的地址及跳转开始地址作为JUMPI跳转指令的跳转信息;针对所述控制指令中的条件语句/三元表达式,生成FALSE跳转地址和END跳转地址;在转译至条件判断语句对应的JUMPI跳转指令,将JUMPI跳转指令所在的地址及FALSE跳转地址作为JUMPI跳转指令的跳转信息;在转译至FALSE内容后的JUMP跳转指令,将JUMP跳转指令所在的地址及END跳转地址作为JUMP跳转指令的跳转信息。
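Optionally, the translation and recording module 901 is specifically configured to: when translating the function selector of the smart contract, record the function entry addresses of the public functions in the function selector into the function entry information of the function record, where the function entry address of a public function is the address of the first JUMPDEST instruction after the parameter loading has been translated; when translating a function body of the smart contract, record the function entry addresses of the private functions in the function body into the function entry information, where the function entry address of a private function is the address of the first parameter of the call address; and when translating a function call statement in the function body, record the call address and the return address of the function call statement into the call information of the function record, where the call address is the address of the JUMP instruction corresponding to the function call statement and the return address is the address of the JUMPDEST instruction corresponding to the function call statement.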
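The function record can be sketched the same way. In this hypothetical Python example, `FunctionRecord`, `Lowerer`, `public_entry`, and `call` are illustrative names.

- **Parameter loading** is modelled as one PUSH1/CALLDATALOAD pair per parameter. The public entry is therefore the first JUMPDEST after that.
- **A call** is lowered as: return label, arguments, callee label, JUMP, JUMPDEST. The call address is therefore the JUMP and the return address is the JUMPDEST right after it. This matches the adjacent (0x74,0x75) pair in the worked example.
- **Private-function entries** are deliberately not modelled, because they depend on compiler details not given here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FunctionRecord:
    entry_list: List[int] = field(default_factory=list)      # function entry information
    call_list: Dict[int, int] = field(default_factory=dict)  # call address -> return address


@dataclass
class Lowerer:
    """Hypothetical toy lowering that fills a FunctionRecord while emitting code."""
    code: List[Tuple[int, str]] = field(default_factory=list)
    record: FunctionRecord = field(default_factory=FunctionRecord)
    pc: int = 0

    def emit(self, op: str) -> int:
        addr = self.pc
        self.code.append((addr, op))
        self.pc += 2 if op == "PUSH1" else 1   # PUSH1 = opcode + 1 immediate byte
        return addr

    def public_entry(self, n_params: int) -> None:
        for _ in range(n_params):               # toy parameter loading
            self.emit("PUSH1")
            self.emit("CALLDATALOAD")
        # first JUMPDEST after parameter loading = public function entry
        self.record.entry_list.append(self.emit("JUMPDEST"))

    def call(self, n_args: int) -> None:
        self.emit("PUSH1")                      # return label
        for _ in range(n_args):
            self.emit("PUSH1")                  # arguments
        self.emit("PUSH1")                      # callee entry label
        call_addr = self.emit("JUMP")           # call address: the call's JUMP
        ret_addr = self.emit("JUMPDEST")        # return address: the call's JUMPDEST
        self.record.call_list[call_addr] = ret_addr
```

For example, `lw = Lowerer(); lw.public_entry(1); lw.call(1)` leaves `entry_list == [3]` and `call_list == {10: 11}`.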
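Optionally, the construction module 902 is specifically configured to: set the block entry address of the instruction basic block to e; fetch instructions from the EVM instruction sequence in order; if the i-th instruction is a JUMPDEST instruction, determine the address s of the (i-1)-th instruction as the block exit address, take the instructions from block entry address e to block exit address s as one instruction basic block, and update the block entry address e to the address of the i-th instruction; if the i-th instruction is a JUMP/JUMPI jump instruction, a REVERT instruction, or a SELFDESTRUCT instruction, determine the address s of the i-th instruction as the block exit address, take the instructions from block entry address e to block exit address s as one instruction basic block, and update the block entry address e to the address of the (i+1)-th instruction.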
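The partitioning rule translates almost line for line into code. The Python sketch below takes an instruction sequence of (address, mnemonic) pairs, such as a disassembler might produce. It returns (entry address e, exit address s) pairs in program order, which is the order information used later. The name `split_blocks` is illustrative.

Two details are assumptions not stated in the text:

- A JUMPDEST that already starts the current block does not produce an empty block.
- Trailing instructions after the last terminator are kept as a final block.

```python
from typing import List, Tuple

Instr = Tuple[int, str]   # (address, mnemonic)
Block = Tuple[int, int]   # (block entry address e, block exit address s)

# Instructions that close the current block at themselves, as listed in the text
TERMINATORS = {"JUMP", "JUMPI", "REVERT", "SELFDESTRUCT"}


def split_blocks(instrs: List[Instr]) -> List[Block]:
    blocks: List[Block] = []
    start = 0  # index of the current block's first instruction (entry address e)
    for i, (addr, op) in enumerate(instrs):
        if op == "JUMPDEST" and i > start:
            # JUMPDEST opens a new block: close the current one at instruction i-1
            blocks.append((instrs[start][0], instrs[i - 1][0]))
            start = i
        if op in TERMINATORS:
            # a jump, REVERT or SELFDESTRUCT closes the current block at instruction i
            blocks.append((instrs[start][0], addr))
            start = i + 1
    if start < len(instrs):
        # assumption: leftover instructions with no terminator form a final block
        blocks.append((instrs[start][0], instrs[-1][0]))
    return blocks
```

For example, splitting `[(0x00, "PUSH1"), (0x02, "PUSH1"), (0x04, "JUMPI"), (0x05, "PUSH1"), (0x07, "POP"), (0x08, "JUMPDEST"), (0x09, "STOP")]` yields `[(0x00, 0x04), (0x05, 0x07), (0x08, 0x09)]`.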
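Optionally, the construction module 902 is specifically configured to: obtain, according to the function entry information in the function record, the first node corresponding to any function entry address; determine the second node pointed to by the first node according to whether the block exit address of the first node has a corresponding jump address in the jump record or a corresponding return address in the call information of the function record; and continue to determine the third node pointed to by the second node, until the nodes corresponding to all function entry addresses in the function entry information have been traversed.
Optionally, the construction module 902 is specifically configured to: if the block exit address of the first node has a corresponding jump address in the jump record, generate a directed edge from the first node to the second node corresponding to that jump address; if the block exit address of the first node's instruction basic block has a corresponding return address in the call information of the function record, generate a directed edge from the first node to the second node corresponding to that return address; otherwise, generate a directed edge from the first node to the second node that follows the first node according to the order information.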
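The traversal from the worked example (queue Q, entryList, jumpTarget, cList) and the edge rule above can be combined into a single worklist sketch in Python. `build_cfg` and its parameter names are hypothetical. Blocks are (entry, exit, exit opcode) triples in program order, and the step letters in the comments refer to the worked example.

Following the worked example:

- A block that ends in JUMP with neither a jump record nor a call record gets no outgoing edge, as for (0x95,0xa2).
- A block already placed in the queue is not queued again.

Two behaviours are assumptions rather than statements from the text:

- A JUMPI block also gets a fall-through edge to the next block.
- Blocks ending in REVERT, SELFDESTRUCT, STOP, RETURN, or INVALID get no fall-through edge.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Block = Tuple[int, int, str]   # (entry address, exit address, opcode at the exit address)

# Assumption: control never falls through past these exit opcodes.
NO_FALLTHROUGH = {"JUMP", "REVERT", "SELFDESTRUCT", "STOP", "RETURN", "INVALID"}


def build_cfg(blocks: List[Block], entry_list: List[int],
              jump_record: Dict[int, int], call_list: Dict[int, int]) -> Set[Tuple[int, int]]:
    """Return directed edges as (source entry address, target entry address) pairs."""
    pos = {b[0]: i for i, b in enumerate(blocks)}   # entry address -> index in program order
    edges: Set[Tuple[int, int]] = set()
    queued: Set[int] = set()                        # blocks ever added to Q
    for entry in entry_list:                        # step b: next function entry
        if entry not in pos or entry in queued:
            continue
        queue = deque([entry])
        queued.add(entry)
        while queue:                                # step c: pop until Q is empty
            i = pos[queue.popleft()]
            start, exit_addr, op = blocks[i]
            nxt: Optional[int] = blocks[i + 1][0] if i + 1 < len(blocks) else None
            succs: List[int] = []
            if exit_addr in jump_record:            # step e: recorded jump address
                succs.append(jump_record[exit_addr])
                if op == "JUMPI" and nxt is not None:
                    succs.append(nxt)               # assumed fall-through of JUMPI
            elif exit_addr in call_list:            # step f: return address of a call
                succs.append(call_list[exit_addr])
            elif op not in NO_FALLTHROUGH and nxt is not None:
                succs.append(nxt)                   # step d: next block in order
            for s in succs:
                if s not in pos:
                    continue                        # target is not a known block entry
                edges.add((start, s))
                if s not in queued:
                    queued.add(s)
                    queue.append(s)
    return edges
```

As a small check, consider blocks `[(0x00, 0x03, "JUMP"), (0x04, 0x06, "JUMP"), (0x07, 0x08, "JUMP")]` with `entry_list=[0x00, 0x07]` and `call_list={0x03: 0x04}`. The only edge produced is `(0x00, 0x04)`, from the call site to its return block. The callee entry 0x07 is traversed separately as its own function, mirroring how the worked example handles cList.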
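Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to this application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of this application and their technical equivalents, this application is also intended to include them.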

Claims (10)

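  1. A method for generating a control flow graph of a smart contract, characterized in that the method comprises:
    in the process of translating the smart contract into Ethereum Virtual Machine bytecode, generating a jump record for the jump instructions in the smart contract and generating a function record for the function instructions in the smart contract, wherein the jump record represents the jump addresses of the jump instructions, and the function record represents the boundary addresses of the function instructions;
    converting the Ethereum Virtual Machine bytecode into an Ethereum Virtual Machine instruction sequence by means of a disassembly tool, and partitioning the Ethereum Virtual Machine instruction sequence into a plurality of instruction basic blocks having order information;
    taking each instruction basic block as a node in the control flow graph, and generating directed edges between the nodes according to the jump record, the function record, and the order information, to obtain the control flow graph of the smart contract.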
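  2. The method according to claim 1, characterized in that generating a jump record for the jump instructions in the smart contract comprises:
    for the control instructions in the smart contract, generating the jump addresses corresponding to the control instructions; and for each JUMP/JUMPI jump instruction among the control instructions, generating the jump information corresponding to the jump instruction, the jump information comprising the address of the jump instruction and the jump address corresponding to the jump instruction.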
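  3. The method according to claim 2, characterized in that generating the jump addresses corresponding to the control instructions in the smart contract, and generating the jump information corresponding to each JUMP/JUMPI jump instruction among the control instructions, comprises:
    for a loop instruction among the control instructions, generating a jump start address and a jump end address of the loop instruction; when translating the JUMPI jump instruction, taking the address of the JUMPI jump instruction and the jump end address as the jump information of the JUMPI jump instruction; and when translating the final JUMP jump instruction, taking the address of the JUMP jump instruction and the jump start address as the jump information of the JUMP jump instruction;
    for a conditional statement/ternary expression among the control instructions, generating a FALSE jump address and an END jump address; when translating the JUMPI jump instruction corresponding to the condition judgment statement, taking the address of the JUMPI jump instruction and the FALSE jump address as the jump information of the JUMPI jump instruction; and when translating the JUMP jump instruction after the FALSE content, taking the address of the JUMP jump instruction and the END jump address as the jump information of the JUMP jump instruction.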
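  4. The method according to claim 1, characterized in that generating a function record for the function instructions in the smart contract comprises:
    when translating the function selector of the smart contract, recording the function entry addresses of the public functions in the function selector into the function entry information of the function record, wherein the function entry address of a public function is the address of the first JUMPDEST instruction after the parameter loading has been translated;
    when translating a function body of the smart contract, recording the function entry addresses of the private functions in the function body into the function entry information, wherein the function entry address of a private function is the address of the first parameter of the call address;
    when translating a function call statement in the function body, recording the call address and the return address of the function call statement into the call information of the function record, wherein the call address is the address of the JUMP instruction corresponding to the function call statement, and the return address is the address of the JUMPDEST instruction corresponding to the function call statement.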
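  5. The method according to claim 1, characterized in that partitioning the Ethereum Virtual Machine instruction sequence into a plurality of ordered instruction basic blocks comprises:
    setting the block entry address of an instruction basic block to e;
    fetching instructions from the Ethereum Virtual Machine instruction sequence in order; if the i-th instruction is a JUMPDEST instruction, determining the address s of the (i-1)-th instruction as the block exit address of the instruction basic block, taking the instructions from block entry address e to block exit address s as one instruction basic block, and updating the block entry address e to the address of the i-th instruction;
    if the i-th instruction is a JUMP/JUMPI jump instruction, a REVERT instruction, or a SELFDESTRUCT instruction, determining the address s of the i-th instruction as the block exit address of the instruction basic block, taking the instructions from block entry address e to block exit address s as one instruction basic block, and updating the block entry address e to the address of the (i+1)-th instruction.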
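  6. The method according to claim 1, characterized in that generating directed edges between the nodes according to the jump record and the function record comprises:
    obtaining, according to the function entry information in the function record, a first node corresponding to any function entry address;
    determining a second node pointed to by the first node according to whether the block exit address of the first node has a corresponding jump address in the jump record or a corresponding return address in the call information of the function record; and continuing to determine a third node pointed to by the second node, until the nodes corresponding to all function entry addresses in the function entry information have been traversed.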
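  7. The method according to claim 6, characterized in that determining the second node pointed to by the first node according to whether the block exit address of the first node has a corresponding jump address in the jump record or a corresponding return address in the call information of the function record comprises:
    if the block exit address of the first node has a corresponding jump address in the jump record, generating a directed edge from the first node to the second node corresponding to the jump address; if the block exit address of the instruction basic block of the first node has a corresponding return address in the call information of the function record, generating a directed edge from the first node to the second node corresponding to the return address;
    otherwise, generating a directed edge from the first node to the second node that follows the first node according to the order information.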
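  8. An apparatus for generating a control flow graph of a smart contract, characterized in that the apparatus comprises:
    a translation and recording module, configured to, in the process of translating the smart contract into Ethereum Virtual Machine bytecode, generate a jump record for the jump instructions in the smart contract and generate a function record for the function instructions in the smart contract, wherein the jump record represents the jump addresses of the jump instructions, and the function record represents the boundary addresses of the function instructions;
    a construction module, configured to convert the Ethereum Virtual Machine bytecode into an Ethereum Virtual Machine instruction sequence by means of a disassembly tool, and to partition the Ethereum Virtual Machine instruction sequence into a plurality of instruction basic blocks having order information;
    the construction module being further configured to take each instruction basic block as a node in the control flow graph and generate directed edges between the nodes according to the jump record, the function record, and the order information, to obtain the control flow graph of the smart contract.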
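  9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 7.
  10. A computer device, characterized by comprising:
    a memory, configured to store a computer program;
    a processor, configured to call the computer program stored in the memory and to execute, according to the obtained program, the method according to any one of claims 1 to 7.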
PCT/CN2022/131537 2021-12-24 2022-11-11 Method and apparatus for generating control flow graph of smart contract WO2023116256A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111598801.9A CN114385185A (zh) 2021-12-24 2021-12-24 Method and apparatus for generating control flow graph of smart contract
CN202111598801.9 2021-12-24

Publications (1)

Publication Number Publication Date
WO2023116256A1 (zh) 2023-06-29

Family

ID=81197085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131537 WO2023116256A1 (zh) 2021-12-24 2022-11-11 Method and apparatus for generating control flow graph of smart contract

Country Status (2)

Country Link
CN (1) CN114385185A (zh)
WO (1) WO2023116256A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
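CN114385185A (zh) * 2021-12-24 2022-04-22 深圳前海微众银行股份有限公司 Method and apparatus for generating control flow graph of smart contract
CN115879868B (zh) * 2022-09-09 2023-07-21 南京审计大学 Smart contract security audit method integrating an expert system with deep learning
CN116820405B (zh) * 2023-08-31 2023-12-01 浙江大学 Method for constructing EVM bytecode control flow graphs based on reuse analysis
CN117270878B (zh) * 2023-11-22 2024-02-09 常熟理工学院 Method and device for extracting constraint conditions of program variables in program execution paths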

Citations (6)

Publication number Priority date Publication date Assignee Title
US20160117155A1 (en) * 2014-10-24 2016-04-28 Thomson Licensing Control flow graph flattening device and method
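CN110175454A (zh) * 2019-04-19 2019-08-27 肖银皓 Artificial-intelligence-based method and system for mining smart contract security vulnerabilities
CN111062038A (zh) * 2019-11-23 2020-04-24 同济大学 State-space-based formal verification system and method for smart contracts
CN111563237A (zh) * 2020-03-24 2020-08-21 博雅正链(北京)科技有限公司 Smart contract security enhancement method
CN113312088A (zh) * 2021-06-29 2021-08-27 北京熵核科技有限公司 Method and apparatus for executing program instructions
CN114385185A (zh) * 2021-12-24 2022-04-22 深圳前海微众银行股份有限公司 Method and apparatus for generating control flow graph of smart contract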

Also Published As

Publication number Publication date
CN114385185A (zh) 2022-04-22

Similar Documents

Publication Publication Date Title
WO2023116256A1 (zh) Method and apparatus for generating control flow graph of smart contract
US8769511B2 (en) Dynamic incremental compiler and method
KR101731752B1 (ko) Combined branch target and predicate prediction
Lee et al. On-the-fly pipeline parallelism
US6128775A (en) Method, system, and computer program product for performing register promotion via load and store placement optimization within an optimizing compiler
JP5530449B2 (ja) Modular forest automata
JP7394211B2 (ja) Method, apparatus, device, and medium for parallel execution of smart contracts
US11249758B2 (en) Conditional branch frame barrier
US20130283250A1 (en) Thread Specific Compiler Generated Customization of Runtime Support for Application Programming Interfaces
US7174546B2 (en) Compiler and register allocation method
CN109643260A (zh) Resource-efficient acceleration of data flow analysis processing using an analytics accelerator
WO2008112422A1 (en) System and method for scalable flow and context-sensitive pointer alias analysis
JP5178852B2 (ja) Information processing apparatus and program
US8458679B2 (en) May-constant propagation
Ahmad et al. Leveraging parallel data processing frameworks with verified lifting
Pham et al. Towards systematic and dynamic task allocation for collaborative parallel fuzzing
Ghica et al. High-level effect handlers in C++
Negrini et al. LiSA: A generic framework for multilanguage static analysis
Phipps-Costin et al. Continuing WebAssembly with Effect Handlers
Aguilar et al. Parallelism extraction in embedded software for Android devices
US20160371066A1 (en) Computer that performs compiling, compiling method and storage medium that stores compiler program
Kargén et al. Speeding up bug finding using focused fuzzing
EP4083785B1 (en) Profiling and optimization of compiler-generated code
Facchinetti et al. Higher-order demand-driven program analysis
Sanjel et al. Partitionable programs using tyro v2

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22909573

Country of ref document: EP

Kind code of ref document: A1