WO2021095188A1 - Obfuscation device, obfuscation method, and recording medium - Google Patents

Obfuscation device, obfuscation method, and recording medium

Info

Publication number
WO2021095188A1
Authority
WO
WIPO (PCT)
Prior art keywords
binary code
value
obfuscation
loss function
detectability
Prior art date
Application number
PCT/JP2019/044620
Other languages
French (fr)
Japanese (ja)
Inventor
拓磨 天田
センペイ リュウ
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Priority to PCT/JP2019/044620
Publication of WO2021095188A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/14 Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding

Definitions

  • the present invention relates to an obfuscation device, an obfuscation method, and a recording medium.
  • Patent Document 1 describes that a conditional branch having different truths depending on a parameter is created by using an encryption method in which the probability of successful decryption is variable according to the parameter.
  • a conditional branch that can always be regarded as true is created, and in this case, a dummy process is assigned to the NO side of the conditional branch.
  • a conditional branch that can always be regarded as false is created, and in this case, a dummy process is assigned to the YES side of the conditional branch.
  • a conditional branch whose truth is indefinite is created, and in this case, equivalent processing with different descriptions is assigned to both of the conditional branches.
  • An example of an object of the present invention is to provide an obfuscation device, an obfuscation method, and a recording medium capable of solving the above problems.
  • The obfuscation device includes a replacement unit that uses a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of the portion to be replaced in the binary code with a value for which the detectability becomes smaller than a predetermined condition.
  • The obfuscation method includes a step of using a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of the portion to be replaced in the binary code with a value for which the detectability becomes smaller than a predetermined condition.
  • The recording medium records a program that causes a computer to execute a step of using a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of the portion to be replaced in the binary code with a value for which the detectability becomes smaller than a predetermined condition.
  • the binary code can be obfuscated without the need to rewrite the address.
  • FIG. 1 is a schematic block diagram showing an example of the functional configuration of the obfuscation device according to the embodiment.
  • the obfuscation device 100 includes an acquisition unit 110, an output unit 120, a storage unit 180, and a control unit 190.
  • the control unit 190 includes a division unit 191, a replacement target detection unit 192, a loss function acquisition unit 193, and a replacement unit 194.
  • the obfuscation device 100 obfuscates the binary code.
  • The obfuscation device 100 performs processing that reduces the accuracy with which a neural network (NN) estimates a specific pattern in the binary code, for example the start address of a function.
  • The binary code here is an executable program. It is called binary code because the program is represented as a binary number (bit string).
  • Binary code can be obtained, for example, by compiling source code (a program written in a high-level language). Alternatively, binary code can be obtained by assembling assembly code (a program written in assembly language).
  • One of the methods for detecting the start address of a function included in the binary code is a method using a neural network such as deep learning.
  • a method using a recurrent neural network (RNN) has shown relatively high performance.
  • the obfuscation device 100 performs a process of reducing the accuracy with which the neural network detects a specific pattern (for example, the start address of the function) in the binary code. This can make the analysis of binary code relatively difficult.
  • FIG. 2 is a diagram showing an example of the input and output of the neural network whose pattern detection accuracy the obfuscation device 100 aims to reduce.
  • The neural network 900 shown in FIG. 2 accepts binary code input in byte units. "Byte 0", "byte 1", "byte 2", ... in FIG. 2 indicate the value of each byte in order from the beginning of the binary code.
  • the neural network 900 accepts, for example, input of data in which a binary code is converted into a one-hot vector for each byte.
  • the one-hot vectorization of 1-byte data is shown by Eq. (1).
  • The "x" on the left side of the arrow in Eq. (1) indicates one byte of data in the binary code.
  • "{0,1}" indicates one bit that takes a value of 0 or 1.
  • "{0,1}^8" indicates 8-bit data. When 1-byte data is expressed in decimal, it takes an integer value from 0 to 255.
  • The "x" with an arrow on the right side of the arrow in Eq. (1) indicates a one-hot vector.
  • The arrow above the "x" is attached to indicate clearly that it is a vector.
  • The "x" with an arrow is also written as vector x (for example, one-hot vector x) or simply x.
  • The one-hot vector x is represented as a column vector of 256 bits, b_0 to b_255.
  • Exactly one of the 256 bits b_0 to b_255 has the value "1", and the other 255 bits have the value "0".
  • The byte value is thus shown in one-hot representation. That is, if the value of the 1-byte data is i (i is an integer with 0 ≤ i ≤ 255), the value of bit b_i is "1" and the values of the other bits are "0".
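
The one-hot representation described above can be illustrated with a short Python sketch. The function names `one_hot` and `encode` are illustrative assumptions and do not appear in the source; the sketch only mirrors the rule that bit b_i is 1 exactly when the byte value is i.

```python
def one_hot(byte_value: int) -> list[int]:
    """Return the 256-element one-hot vector (b_0 .. b_255) for one byte value."""
    if not 0 <= byte_value <= 255:
        raise ValueError("byte value must be in 0..255")
    vec = [0] * 256
    vec[byte_value] = 1  # b_i = 1 only for i == byte value
    return vec


def encode(binary_code: bytes) -> list[list[int]]:
    """Convert every byte of the binary code into its one-hot vector."""
    return [one_hot(b) for b in binary_code]


vectors = encode(bytes([0x55, 0x90]))
print(vectors[0].index(1), sum(vectors[0]))  # 85 1
```

A sequence of n bytes therefore becomes n vectors of 256 bits each, which is the input format the text attributes to the neural network 900.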
  • The neural network 900, on receiving the binary code, classifies each byte of the binary code as being or not being the beginning of a function, and outputs the classification result. For example, the neural network 900 outputs the value "1" for a byte estimated to be the beginning of a function and the value "0" for a byte estimated not to be the beginning of a function.
  • Each of "R0", "R1", "R2", ... in FIG. 2 takes the value "1" or "0" depending on the estimation result of whether or not the byte is the beginning of a function.
  • the position at the beginning of the function is also called the start address of the function.
  • the position at the end of the function is also called the end address of the function.
  • The estimation target of the neural network 900 is not limited to the position of the beginning of a function, and can be the position of any of various patterns that can be detected in the binary code to be obfuscated.
  • the obfuscation device 100 partially rewrites the binary code so that the accuracy of the output of the neural network 900 becomes lower. At that time, the instruction itself to be executed should not be rewritten so that the computer executing the binary code does not behave unexpectedly. Partial rewriting of binary code is also referred to as replacing the value of that part.
  • As one method of rewriting the binary code, a series of instructions included in the binary code could be rewritten into a series of instructions that performs equivalent processing.
  • However, with this method the computer executing the binary code may behave unexpectedly because of a bug in the rewritten series of instructions.
  • In addition, when the addresses of the portion following the rewritten portion shift, addresses such as the jump destination addresses of jump instructions must be rewritten, which increases the load of the obfuscation processing.
  • As another method of rewriting the binary code, unexecuted data could be inserted between functions. However, with this method too, the addresses of the portion following the inserted data shift, so addresses such as the jump destination addresses of jump instructions must be rewritten, which increases the load of the obfuscation processing.
  • the obfuscation device 100 obfuscates the binary code to be obfuscated by rewriting the value of the portion that is not referenced. References here include references for execution. Therefore, the part that is not referred to here is a part that is neither executed nor referred to.
  • One of the parts where the obfuscation device 100 rewrites the value is padding between the functions.
  • The padding here is an unreferenced portion provided for alignment, for example so that the start address of each function coincides with the start address of an 8-byte block.
  • FIG. 3 is a diagram showing an example of padding that the obfuscation device 100 targets for rewriting the value.
  • The function func1 starting at line number 1 ends at line number 6, and processing is transferred elsewhere by the jump instruction at line number 6. Therefore, neither the nop at line number 7 (line L11) nor the nop at line number 8 (line L12), both located between the end of function func1 (line number 6) and the beginning of function func2 (line number 9), is executed.
  • These lines L11 and L12 correspond to the padding example.
  • the obfuscation device 100 rewrites, for example, the value of line L11 and the value of line L12.
  • The obfuscation device 100 rewrites the existing padding values instead of inserting new data, so the addresses of the subsequent code do not change. In this respect, the obfuscation device 100 does not need to rewrite addresses, and the load of the obfuscation processing can be relatively small.
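
A minimal Python sketch of the padding case follows. It assumes that function boundaries are already known as sorted `(start, end_exclusive)` byte offsets, for example from the corresponding assembly code; the helper names `padding_ranges` and `overwrite` are hypothetical. The point it illustrates is that overwriting padding in place leaves the code length, and therefore every later address, unchanged.

```python
def padding_ranges(functions: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the (start, end_exclusive) gaps between consecutive functions.

    `functions` is assumed to be sorted and non-overlapping.
    """
    gaps = []
    for (_, end), (next_start, _) in zip(functions, functions[1:]):
        if next_start > end:
            gaps.append((end, next_start))
    return gaps


def overwrite(code: bytearray, start: int, end: int, junk: bytes) -> None:
    """Overwrite code[start:end] in place; the length must not change."""
    if len(junk) != end - start:
        raise ValueError("junk must have exactly the length of the padding")
    code[start:end] = junk


# func1 occupies offsets 0..13, func2 starts at the next 8-byte boundary (16).
code = bytearray(32)
for start, end in padding_ranges([(0, 14), (16, 30)]):
    overwrite(code, start, end, bytes([0xAA] * (end - start)))
print(len(code))  # 32
```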
  • FIG. 4 is a diagram showing an example of an instruction to be moved by the obfuscation device 100.
  • The move instruction on line L21 has a length of 5 bytes in binary code.
  • The jump instruction is 4 bytes long in binary code, so the move instruction has a longer byte length than the jump instruction.
  • Line L21 is not referenced from any other part, and processing does not jump to line L21 from any other part.
  • Furthermore, because the move instruction on line L21 does not include an address as an argument, it can be moved as is without being rewritten. Therefore, the obfuscation device 100 selects the binary code of line L21 as the movement target.
  • In this way, the obfuscation device 100 selects as the movement target an instruction that is longer in bytes than the jump instruction, is neither referenced from another part nor a jump destination from another part, and can be moved without being rewritten.
  • Although FIG. 4 shows an example in which the obfuscation device 100 selects one instruction as the movement target, the obfuscation device 100 may select a series of instructions as the movement target. In this case, it suffices that the total byte length of the series of instructions is longer than the byte length of the jump instruction, that none of the instructions is referenced from another part or is a jump destination from another part, and that each instruction can be moved without being rewritten.
  • FIG. 5 is a diagram showing an example of instruction movement by the obfuscation device 100.
  • FIG. 5 shows an example in which the obfuscation device 100 moves line L21 of FIG. 4 to line L35 in the free area.
  • the free area referred to here is an unused area of the memory area of the storage unit 180 that can be used for the binary code.
  • the free area before rewriting is an area that is not referenced like padding.
  • the obfuscation device 100 moves the move instruction in line L21 of FIG. 4 to line L35 as it is without rewriting. Further, the obfuscation device 100 provides a jump instruction to the line L35, which is the destination of the move instruction, in the first line L32 of the lines L32 and L33, which is the source of the move instruction. Further, the obfuscation device 100 provides a jump instruction to the line L34 immediately after the movement source of the move instruction in the line L36 immediately after the line L35 to which the move instruction is moved.
  • The computer that executes the code of FIG. 5 executes an add instruction (line L31), a jump instruction (line L32), a move instruction (line L35), a jump instruction (line L36), and an add instruction (line L34), in this order.
  • With the code of FIG. 5, the computer executes the same instructions in the same order as with the code of FIG. 4, except that processing jumps at the jump instruction of line L32 and at the jump instruction of line L36.
  • Line L33 is an area that is not referenced.
  • The obfuscation device 100 rewrites the value of line L33 to obfuscate the binary code. In this way, by moving an instruction that satisfies the above conditions to the free area and maintaining the execution order of the instructions with jump instructions, the obfuscation device 100 can obtain an area whose value can be rewritten for obfuscation of the binary code.
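
The move-and-jump procedure above can be sketched on a plain byte buffer. The sketch below is an illustration under stated assumptions, not the device's implementation: `toy_jump` is a hypothetical 4-byte jump encoding (one opcode byte plus a 3-byte absolute target) chosen only to match the 4-byte jump length mentioned in the text, and `relocate` is a hypothetical helper name. It is assumed that the free area does not overlap the instruction being moved.

```python
JUMP_LEN = 4  # the text assumes a 4-byte jump instruction


def toy_jump(target: int) -> bytes:
    """Hypothetical jump encoding: 1 opcode byte + 3-byte absolute target offset."""
    return b"\xea" + target.to_bytes(3, "big")


def relocate(code: bytearray, start: int, length: int, free: int) -> range:
    """Move code[start:start+length] to offset `free` and return the freed offsets.

    The moved instruction must be longer than a jump and must need no rewriting.
    """
    if length <= JUMP_LEN:
        raise ValueError("instruction must be longer than the jump instruction")
    if free + length + JUMP_LEN > len(code):
        raise ValueError("free area is too small")
    moved = bytes(code[start:start + length])
    code[free:free + length] = moved                      # copy the instruction unchanged
    back = toy_jump(start + length)                       # jump back to the next instruction
    code[free + length:free + length + JUMP_LEN] = back
    code[start:start + JUMP_LEN] = toy_jump(free)         # jump from the source to the copy
    return range(start + JUMP_LEN, start + length)        # bytes that are no longer referenced


code = bytearray(32)            # toy layout: add at 0..2, 5-byte move at 3..7, add at 8..10
freed = relocate(code, start=3, length=5, free=16)
print(list(freed))              # [7]
```

With a 5-byte instruction and a 4-byte jump, one byte of the original location becomes free for rewriting, which corresponds to line L33 in FIG. 5.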
  • The acquisition unit 110 acquires the binary code to be obfuscated.
  • The acquisition unit 110 also acquires information about the binary code to be obfuscated. Specifically, the acquisition unit 110 acquires information from which the instructions in the binary code to be obfuscated, and the areas that are not referenced when that binary code is executed, can be identified. In the following, a case where the acquisition unit 110 acquires the assembly code corresponding to the binary code to be obfuscated is described as an example.
  • The assembly code corresponding to the binary code referred to here is assembly code that indicates the information of the binary code.
  • The assembly code corresponding to the binary code may be the assembly code from which the binary code was assembled.
  • Alternatively, the assembly code corresponding to the binary code may be assembly code obtained by disassembling the binary code.
  • The method by which the acquisition unit 110 acquires the binary code and the assembly code is not limited to a specific method.
  • the acquisition unit 110 may have a communication function and receive the binary code and the assembly code from another device.
  • the acquisition unit 110 may compile the source code into assembly code and further assemble it into binary code.
  • the output unit 120 outputs the obfuscated binary code.
  • the method by which the output unit 120 outputs the binary code is not limited to a specific method.
  • the output unit 120 may have a communication function and transmit a binary code to another device.
  • the output unit 120 may write the binary code to an external memory (a storage device that can be attached to and detached from the obfuscation device 100).
  • the storage unit 180 stores various data.
  • the storage unit 180 is configured by using the storage device included in the obfuscation device 100.
  • the control unit 190 controls each unit of the obfuscation device 100 to execute various processes.
  • the function of the control unit 190 is executed by the CPU (Central Processing Unit) included in the obfuscation device 100 reading a program from the storage unit 180 and executing the program.
  • The division unit 191 divides the binary code to be obfuscated into subsequences of a predetermined length.
  • The obfuscation device 100 can obfuscate binary code of arbitrary length by performing the obfuscation processing for each subsequence.
  • The length of the subsequences into which the division unit 191 divides the binary code to be obfuscated is not limited to a specific length. If the remaining length of the binary code is shorter than the subsequence length, the division unit 191 may, for example, append data such as a bit string of "0" values to the remainder of the binary code to bring it up to the subsequence length.
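
The division step can be sketched in a few lines of Python. The function name `split_subsequences` is an illustrative assumption; zero-padding the last chunk follows the example given in the text.

```python
def split_subsequences(binary_code: bytes, length: int) -> list[bytes]:
    """Split the binary code into chunks of `length` bytes, zero-padding the last one."""
    if length <= 0:
        raise ValueError("length must be positive")
    chunks = [binary_code[i:i + length] for i in range(0, len(binary_code), length)]
    if chunks and len(chunks[-1]) < length:
        chunks[-1] = chunks[-1].ljust(length, b"\x00")  # append 0-valued bytes
    return chunks


print(split_subsequences(b"abcdefg", 3))  # [b'abc', b'def', b'g\x00\x00']
```

When the processed subsequences are recombined, the appended bytes would need to be dropped again so that the output has the same length as the original binary code.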
  • The replacement target detection unit 192 detects the replacement target portion of the binary code to be obfuscated. For example, the replacement target detection unit 192 detects the padding illustrated in lines L11 and L12 of FIG. 3 as a replacement target portion. The replacement target detection unit 192 also detects an instruction to be moved, as exemplified by line L21 of FIG. 4. The replacement target detection unit 192 may detect one instruction or a series of instructions.
  • the loss function acquisition unit 193 acquires the loss function.
  • The loss function referred to here is a function that indicates the correlation between the estimation result of the position of a predetermined pattern in the binary code, obtained when the binary code is converted into one-hot vectors and input to the neural network, and the correct label of the position of the predetermined pattern in the binary code.
  • As the loss function, a function that calculates an error by subtracting the value of the correct label from the estimation result of the neural network may be used.
  • Subsequence X is expressed as in Eq. (2).
  • f(X) may output a vector whose elements each take the value "0" or "1", like the outputs R0, R1, ... of the neural network 900 of FIG. 2.
  • Alternatively, f(X) may output, for each byte of the binary code input to the neural network 900, a value indicating how likely it is that the byte is the beginning of a function.
  • In this case, "R0", "R1", "R2", ... in FIG. 2 correspond to the likelihood that each byte is the beginning of a function, in order from the beginning of the binary code.
  • f(X) may output a probability as the value indicating this likelihood, but the present invention is not limited to this.
  • The loss function Loss is expressed as Loss(f(X), Y) using f(X) and Y. It is assumed that the function f is known to the loss function acquisition unit 193, which can therefore calculate the loss function Loss(f(X), Y).
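
As a rough illustration of Loss(f(X), Y), the sketch below computes an error-based value from per-byte estimates and correct labels. The mean squared error used here is only one possible choice and is an assumption of this sketch; the source states only that an error obtained by subtracting the correct label from the estimation result may be used. The name `squared_error_loss` is hypothetical.

```python
def squared_error_loss(estimates: list[float], labels: list[int]) -> float:
    """Mean of (estimate - label)^2 over all bytes of the subsequence."""
    if len(estimates) != len(labels):
        raise ValueError("estimates and labels must have the same length")
    if not labels:
        raise ValueError("at least one byte is required")
    return sum((p - y) ** 2 for p, y in zip(estimates, labels)) / len(labels)


# f(X): estimated likelihood that each byte starts a function; Y: correct labels.
f_x = [0.9, 0.1, 0.2]
y = [1, 0, 0]
print(squared_error_loss(f_x, y))
```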
  • the replacement unit 194 rewrites the replacement target portion detected by the replacement target detection unit 192.
  • When the replacement target detection unit 192 detects padding as the replacement target portion, the replacement unit 194 rewrites the padding. That is, the replacement unit 194 updates the byte values of the padding.
  • the byte whose value is updated by the replacement unit 194 is also referred to as a junk byte.
  • For an instruction to be moved that is detected by the replacement target detection unit 192, the replacement unit 194 moves the instruction and places the jump instructions as described with reference to FIGS. 4 and 5. As a result, the replacement unit 194 obtains the area to be rewritten illustrated in line L33 of FIG. 5, and then rewrites that area.
  • Using the loss function acquired by the loss function acquisition unit 193, the replacement unit 194 replaces the value of the replacement target portion of the binary code to be obfuscated with a value for which the detectability of the predetermined pattern by the neural network becomes equal to or smaller than a predetermined condition.
  • The replacement unit 194 may perform an exhaustive search over all the bytes to be replaced and rewrite them to the values that minimize the detectability indicated by the loss function.
  • In this case, the replacement unit 194 calculates the loss function value for every combination of the 256 possible values of each byte to be replaced. The replacement unit 194 then adopts the combination with the smallest loss function value and replaces the value of each byte to be replaced with its value in the adopted combination.
  • In this way, the solution with the lowest estimation accuracy by the neural network can be obtained.
  • However, the amount of computation grows exponentially with the number of bytes to be replaced (256^n combinations for n bytes). Therefore, if the number of bytes to be replaced is large, it may not be possible to obtain a solution within a realistic time.
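
The exhaustive search can be sketched as follows. `loss_fn` is a stand-in for Loss(f(X), Y) evaluated on a candidate byte sequence; the toy lambda used in the example is purely hypothetical and has nothing to do with a real neural network. The loop makes the 256^n cost visible: `itertools.product` enumerates every combination of values for the n bytes to be replaced.

```python
from itertools import product


def exhaustive_search(code: bytes, positions: list[int], loss_fn) -> bytes:
    """Try all 256**len(positions) value combinations and keep the smallest loss."""
    best_value, best_code = None, code
    for combo in product(range(256), repeat=len(positions)):
        candidate = bytearray(code)
        for pos, val in zip(positions, combo):
            candidate[pos] = val
        value = loss_fn(bytes(candidate))
        if best_value is None or value < best_value:
            best_value, best_code = value, bytes(candidate)
    return best_code


# Toy stand-in for the loss: smallest when the byte at offset 1 equals 0x42.
toy_loss = lambda c: abs(c[1] - 0x42)
print(exhaustive_search(b"\x00\x90\x00", [1], toy_loss))  # b'\x00B\x00'
```

One byte needs 256 evaluations and two bytes need 65,536; each additional byte multiplies the cost by 256, which is why the text turns to a gradient-based alternative.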
  • Alternatively, the replacement unit 194 may determine the value to be written into the area to be rewritten based on the gradient obtained by partially differentiating the loss function with respect to each bit of the one-hot vector. Specifically, the one-hot vector that marks the element with the largest gradient of the loss function is converted into binary data, and that value is written into the portion to be replaced.
  • The arrowed x_i* (i is an integer with 0 ≤ i ≤ s) indicates the one-hot representation of a byte of the subsequence X* after obfuscation, and is expressed as in Eq. (4).
  • b_i (i is an integer with 0 ≤ i ≤ 255) indicates a bit of the one-hot vector representing 1 byte. Therefore, ∂Loss/∂b_0, ..., ∂Loss/∂b_255 indicate the partial derivatives of the loss function Loss with respect to each bit of the one-hot vector.
  • argmax indicates one-hot vectorization in which the value of the element having the maximum value among the elements of the vector is set to "1" and the value of the other element is set to "0".
  • An example of one-hot vectorization by argmax is shown in equation (5).
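
The argmax-based one-hot vectorization can be illustrated with a small Python sketch; the function name `argmax_one_hot` is an assumption of this example. If several elements share the maximum value, this sketch marks the first one, a tie-breaking rule the source does not specify.

```python
def argmax_one_hot(values: list[float]) -> list[int]:
    """Set the element with the maximum value to 1 and all other elements to 0."""
    top = max(range(len(values)), key=values.__getitem__)
    return [1 if i == top else 0 for i in range(len(values))]


print(argmax_one_hot([0.1, 0.7, 0.2]))  # [0, 1, 0]
```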
  • When the one-hot vector x_i corresponds to a byte to be replaced, it is replaced with the one-hot vector obtained by applying argmax to the partial derivatives of the loss function Loss with respect to each element of the one-hot vector.
  • When the one-hot vector x_i does not correspond to a byte to be replaced, that is, when x_i is not the one-hot representation of a byte to be replaced, the one-hot vector x_i before obfuscation is used as is as the one-hot vector x_i*.
  • Because the replacement unit 194 obfuscates the binary code by rewriting the bytes to be replaced based on the gradient of the loss function Loss, it is expected that the estimation accuracy of the neural network can be reduced, for example by increasing the error of the estimation result of the neural network.
  • The replacement unit 194 can determine the value to be written into each byte to be replaced by comparing 256 pieces of data. In this respect, the replacement unit 194 can determine the values of the bytes to be replaced and rewrite them in a relatively short time even when the number of bytes to be replaced is large.
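
The gradient-based selection can be sketched with a deliberately simple stand-in model. Everything model-related here is an assumption made for illustration: the toy model f(X) = Σ_i weights[i][byte_i] with Loss = (f(X) − label)² replaces the neural network, so that ∂Loss/∂b_k at a position has the closed form 2·(f(X) − label)·weights[pos][k]. The names `toy_gradient` and `gradient_replace` are hypothetical. What the sketch does mirror from the text is the selection rule: for each byte to be replaced, compare the 256 partial derivatives and write the index of the largest one.

```python
def toy_gradient(code: bytes, pos: int, weights: list[list[float]], label: float) -> list[float]:
    """dLoss/db_k (k = 0..255) at `pos` for the toy model described above."""
    score = sum(weights[i][b] for i, b in enumerate(code))
    return [2.0 * (score - label) * weights[pos][k] for k in range(256)]


def gradient_replace(code: bytes, positions: list[int], grad_fn) -> bytes:
    """For each byte to replace, write the value whose one-hot bit has the largest gradient."""
    out = bytearray(code)
    for pos in positions:
        grads = grad_fn(code, pos)                          # 256 partial derivatives
        out[pos] = max(range(256), key=grads.__getitem__)   # argmax, converted back to a byte
    return bytes(out)                                       # other bytes stay unchanged


# Toy weights: only position 1 influences the score, proportionally to the byte value.
weights = [[0.0] * 256, [k / 255 for k in range(256)], [0.0] * 256]
grad_fn = lambda c, p: toy_gradient(c, p, weights, 0.0)
print(gradient_replace(b"\x00\x10\x00", [1], grad_fn))  # b'\x00\xff\x00'
```

Each byte costs 256 comparisons, so n bytes cost 256·n comparisons rather than the 256^n evaluations of the exhaustive search.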
  • FIG. 6 is a flowchart showing a procedure of the process of obfuscating the binary code by the obfuscation device 100.
  • The acquisition unit 110 acquires the binary code to be obfuscated and the assembly code corresponding to that binary code (step S11).
  • The obfuscation device 100 starts a loop L101 that performs processing for each subsequence cut out from the binary code to be obfuscated (step S12). The division unit 191 then cuts out the subsequence to be processed from the binary code to be obfuscated (step S13). If the division unit 191 has already cut out subsequences from the binary code to be obfuscated, it cuts out the subsequence to be processed from the part of the binary code that remains.
  • The replacement target detection unit 192 detects the portions to be rewritten in the subsequence cut out by the division unit 191 (step S14). For example, the replacement target detection unit 192 detects the above-mentioned padding and instructions to be moved. The replacement unit 194 then patches junk bytes into the portions to be rewritten detected by the replacement target detection unit 192 (step S15). For example, the replacement unit 194 determines and writes the values for the portions to be rewritten by the exhaustive search described above or by the method using the gradient of the loss function.
  • The obfuscation device 100 performs the termination processing of the loop L101 (step S16). Specifically, the obfuscation device 100 determines whether the entire binary code to be obfuscated has been cut out as subsequences and processed. If it determines that a portion has not yet been cut out, the obfuscation device 100 continues the processing of the loop L101 for that portion. If it determines that the entire binary code to be obfuscated has been cut out as subsequences and processed, the obfuscation device 100 ends the loop L101.
  • the output unit 120 outputs the obfuscated binary code (step S17).
  • the obfuscated binary code is obtained by combining the processed subsequences in loop L101 in the same order as the original binary code.
  • The obfuscation device 100 then ends the process of FIG. 6.
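
The flow of steps S12 to S17 can be summarized in a compact Python sketch. The callables `detect` and `choose` are hypothetical placeholders for the replacement target detection unit and for the value selection (exhaustive or gradient-based); the example wires in toy versions only to make the sketch runnable. As a simplification, the last subsequence is not zero-padded here, and treating every 0x90 byte as padding is a toy assumption that would not be safe on real code.

```python
def obfuscate(binary_code: bytes, length: int, detect, choose) -> bytes:
    """Process the binary code subsequence by subsequence and recombine in order."""
    out = bytearray()
    for start in range(0, len(binary_code), length):         # loop L101 (S12, S16)
        sub = bytearray(binary_code[start:start + length])    # S13: cut out a subsequence
        for pos in detect(bytes(sub)):                        # S14: offsets to rewrite
            sub[pos] = choose(bytes(sub), pos)                # S15: patch a junk byte
        out += sub                                            # keep the original order
    return bytes(out)                                         # S17: obfuscated binary code


# Toy stand-ins (assumptions): 0x90 bytes count as padding, junk value is 0xCC.
toy_detect = lambda sub: [i for i, b in enumerate(sub) if b == 0x90]
toy_choose = lambda sub, pos: 0xCC
print(obfuscate(b"\x01\x90\x02\x90\x90", 2, toy_detect, toy_choose))
```

Because every subsequence is rewritten in place, the output always has the same length as the input.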
  • As described above, the replacement unit 194 uses a loss function indicating the detectability of a predetermined pattern in the binary code to replace the value of the portion to be replaced in the binary code with a value for which the detectability becomes equal to or smaller than a predetermined condition.
  • the obfuscation device 100 can obfuscate the binary code without the need to rewrite the address.
  • The replacement unit 194 can obfuscate the binary code by rewriting the area defined as the replacement target. Therefore, the replacement unit 194 does not need to insert data into the binary code, nor does it need to replace one instruction or series of instructions in the binary code with one instruction or series of instructions having a longer byte length.
  • Consequently, the addresses of the portion of the binary code after the portion whose value is replaced do not shift, and the need to rewrite addresses does not arise.
  • the load of obfuscation of the binary code is relatively light in this respect.
  • the binary code can be obfuscated by rewriting the non-executed part such as padding without rewriting the executed instruction.
  • the obfuscation device 100 can reduce the possibility that the computer behaves unexpectedly when the computer executes the obfuscated binary code.
  • The loss function acquisition unit 193 acquires a loss function that indicates the correlation between the estimation result obtained when the binary code is converted into one-hot vectors and input to a neural network that estimates the position of the predetermined pattern in the binary code, and the correct label of the position of the predetermined pattern in the binary code.
  • The replacement unit 194 writes into the replacement target portion the binary data value of the one-hot vector indicating the largest gradient among the gradients obtained by partially differentiating the loss function with respect to each bit of the one-hot vector.
  • Because the replacement unit 194 obfuscates the binary code by rewriting the bytes to be replaced based on the gradient (partial derivatives) of the loss function, it is expected that the estimation accuracy of the neural network can be reduced, for example by increasing the error in the estimation result of the neural network.
  • the replacement unit 194 can determine the value to be written in the byte to be replaced by comparing 256 pieces of data. In this respect, the replacement unit 194 can determine the value of the byte to be replaced and rewrite it in a relatively short time even when the number of bytes to be replaced is large.
  • The replacement unit 194 transfers to a free area one instruction, or a series of instructions, included in the binary code whose total byte length is longer than the byte length of the jump instruction.
  • The value of the original portion is then replaced with a jump instruction to the transfer destination and a value for reducing the detectability to or below the predetermined condition.
  • the portion to be replaced can be provided without having to rewrite the instruction to be executed.
  • the obfuscation device 100 can reduce the possibility that the computer behaves unexpectedly when the computer executes the obfuscated binary code.
  • the division unit 191 divides the binary code into subsequences having a predetermined length. Then, the replacement unit 194 performs a process of replacing the value of the portion of the binary code to be replaced with a value whose detectability becomes smaller than a predetermined condition for each subsequence.
  • the obfuscation device 100 can obfuscate an arbitrary length binary code by performing obfuscation processing for each subsequence.
  • FIG. 7 is a diagram showing an example of the configuration of the obfuscation device according to the embodiment.
  • the obfuscation device 200 shown in FIG. 7 includes a replacement unit 201.
  • The replacement unit 201 uses a loss function indicating the detectability of a predetermined pattern in the binary code to replace the value of the portion to be replaced in the binary code with a value for which the detectability becomes equal to or smaller than a predetermined condition.
  • the obfuscation device 200 can obfuscate the binary code without the need to rewrite the address.
  • The replacement unit 201 can obfuscate the binary code by rewriting the area defined as the replacement target. Therefore, the replacement unit 201 does not need to insert data into the binary code, nor does it need to replace one instruction or series of instructions in the binary code with one instruction or series of instructions having a longer byte length.
  • Consequently, the addresses of the portion of the binary code after the portion whose value is replaced do not shift, and the need to rewrite addresses does not arise.
  • the load of obfuscation of the binary code is relatively light in this respect.
  • the binary code can be obfuscated by rewriting the non-executed part such as padding without rewriting the executed instruction.
  • the obfuscation device 200 can reduce the possibility that the computer behaves unexpectedly when the computer executes the obfuscated binary code.
  • FIG. 8 is a diagram showing an example of a processing procedure in the obfuscation method according to the embodiment.
  • The obfuscation method shown in FIG. 8 includes a step of using a loss function indicating the detectability of a predetermined pattern in the binary code to replace the value of the portion to be replaced in the binary code with a value for which the detectability becomes equal to or smaller than a predetermined condition.
  • the binary code can be obfuscated without the need to rewrite the address.
  • The binary code can be obfuscated by rewriting the area defined as the replacement target. Therefore, in the obfuscation method of FIG. 8, it is not necessary to insert data into the binary code, and it is not necessary to replace one instruction or a series of instructions in the binary code with one instruction or a series of instructions having a longer byte length.
  • As described above, in the obfuscation method of FIG. 8, the addresses of the portion of the binary code after the portion whose value is replaced do not shift, and the need to rewrite addresses does not arise.
  • the load of obfuscation of the binary code is relatively light in this respect.
  • the binary code can be obfuscated by rewriting the non-executed part such as padding without rewriting the executed instruction.
  • the obfuscation method of FIG. 8 can reduce the possibility that the computer behaves unexpectedly when the computer executes the obfuscated binary code.
  • FIG. 9 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
  • the computer 700 includes a CPU 710, a main storage device 720, an auxiliary storage device 730, and an interface 740. Either or both of the obfuscation device 100 and the obfuscation device 200 may be implemented on the computer 700. In that case, the operation of each of the above-mentioned processing units is stored in the auxiliary storage device 730 in the form of a program.
  • the CPU 710 reads the program from the auxiliary storage device 730, loads it into the main storage device 720, and executes the above processing according to the program.
  • the CPU 710 reserves, in the main storage device 720, a storage area corresponding to each of the above-mentioned storage units according to the program. Communication between each device and other devices is performed by the interface 740, which has a communication function and communicates under the control of the CPU 710.
  • the operations of the control unit 190 and each unit thereof are stored in the auxiliary storage device 730 in the form of a program.
  • the CPU 710 reads the program from the auxiliary storage device 730, loads it into the main storage device 720, and executes the above processing according to the program. Further, the CPU 710 reserves a storage area corresponding to the storage unit 180 in the main storage device 720 according to the program.
  • the functions of the acquisition unit 110 and the output unit 120 are performed by the interface 740, which has a data input/output function such as a communication function and operates under the control of the CPU 710.
  • the operation of the replacement unit 201 is stored in the auxiliary storage device 730 in the form of a program.
  • the CPU 710 reads the program from the auxiliary storage device 730, loads it into the main storage device 720, and executes the above processing according to the program.
  • a program for realizing all or part of the functions of the obfuscation device 100 and the obfuscation device 200 may be recorded on a computer-readable recording medium, and the processing of each unit may be performed by having a computer system read and execute the program recorded on the recording medium.
  • the term "computer system" as used herein includes an OS (operating system) and hardware such as peripheral devices.
  • "Computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROM (Read Only Memory), and CD-ROM (Compact Disc Read Only Memory), and to storage devices such as hard disks built into computer systems.
  • the above-mentioned program may realize only part of the above-mentioned functions, or may realize them in combination with a program already recorded in the computer system.
  • the embodiment of the present invention may be applied to an obfuscation device, an obfuscation method, and a recording medium.

Abstract

This obfuscation device comprises a replacement unit that uses a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.

Description

Obfuscation device, obfuscation method and recording medium
The present invention relates to an obfuscation device, an obfuscation method, and a recording medium.
In relation to software obfuscation, Patent Document 1 describes creating conditional branches whose truth value depends on a parameter, using an encryption scheme in which the probability of successful decryption varies with the parameter. From a first code based on a first parameter, a conditional branch that can always be regarded as true is created, and in this case dummy processing is assigned to the NO side of the conditional branch. From a second code based on a second parameter, a conditional branch that can always be regarded as false is created, and in this case dummy processing is assigned to the YES side of the conditional branch. From a third code based on a third parameter, a conditional branch whose truth value is indeterminate is created, and in this case equivalent processing with different descriptions is assigned to both sides of the conditional branch.
Japanese Patent Application Laid-Open No. 2018-106260
When a binary code is obfuscated, if inserting dummy code or the like shifts down the addresses of the subsequent code, addresses such as data reference addresses and the destination addresses of jump instructions must be rewritten, which increases the load of the obfuscation process. It is preferable that the binary code can be obfuscated without the need to rewrite addresses.
An example of an object of the present invention is to provide an obfuscation device, an obfuscation method, and a recording medium capable of solving the above problem.
According to a first aspect of the present invention, an obfuscation device includes a replacement unit that uses a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.
According to a second aspect of the present invention, an obfuscation method includes a step of using a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.
According to a third aspect of the present invention, a recording medium records a program for causing a computer to execute a step of using a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.
According to the above-mentioned obfuscation device, obfuscation method, and recording medium, the binary code can be obfuscated without the need to rewrite addresses.
FIG. 1 is a schematic block diagram showing an example of the functional configuration of the obfuscation device according to the embodiment.
FIG. 2 is a diagram showing an example of the input and output of the neural network whose pattern detection accuracy the obfuscation device according to the embodiment lowers.
FIG. 3 is a diagram showing an example of padding whose values the obfuscation device according to the embodiment rewrites.
FIG. 4 is a diagram showing an example of an instruction to be moved by the obfuscation device according to the embodiment.
FIG. 5 is a diagram showing an example of an instruction being moved by the obfuscation device according to the embodiment.
FIG. 6 is a flowchart showing the procedure by which the obfuscation device according to the embodiment obfuscates binary code.
FIG. 7 is a diagram showing an example of the configuration of the obfuscation device according to the embodiment.
FIG. 8 is a diagram showing an example of the processing procedure in the obfuscation method according to the embodiment.
FIG. 9 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
Hereinafter, embodiments of the present invention will be described, but the following embodiments do not limit the invention according to the claims. In addition, not all combinations of the features described in the embodiments are necessarily essential to the solution provided by the invention.
FIG. 1 is a schematic block diagram showing an example of the functional configuration of the obfuscation device according to the embodiment. In the configuration shown in FIG. 1, the obfuscation device 100 includes an acquisition unit 110, an output unit 120, a storage unit 180, and a control unit 190. The control unit 190 includes a division unit 191, a replacement target detection unit 192, a loss function acquisition unit 193, and a replacement unit 194.
The obfuscation device 100 obfuscates binary code. In particular, the obfuscation device 100 performs processing that lowers the accuracy with which a neural network (NN) estimates a specific pattern in the binary code, for example the accuracy with which a neural network estimates the start address of a function.
The binary code referred to here is a program in executable form. It is called binary code because the program is understood to be represented in binary numbers (bit strings). For example, binary code can be obtained by compiling source code (a program written in a high-level language). Alternatively, binary code can be obtained by assembling assembly code (a program written in assembly language).
When binary code is analyzed by reverse engineering, it is useful to recognize functions, for example by detecting the start and end addresses of the functions contained in the binary code. One method for detecting the start addresses and the like of functions contained in binary code uses a neural network, such as deep learning. In particular, methods using a recurrent neural network (RNN) have shown relatively high performance.
For a binary code provider who wants to prevent unauthorized analysis of the binary code, it is preferable to be able to lower the accuracy with which a neural network detects function start addresses and the like. The obfuscation device 100 therefore performs processing that lowers the accuracy with which a neural network detects a specific pattern (for example, the start address of a function) in the binary code. This makes analysis of the binary code relatively difficult.
FIG. 2 is a diagram showing an example of the input and output of the neural network whose pattern detection accuracy the obfuscation device 100 lowers.
The neural network 900 shown in FIG. 2 accepts binary code as input in units of bytes. "byte 0", "byte 1", "byte 2", ... in FIG. 2 indicate the values of the bytes in order from the beginning of the binary code.
The neural network 900 accepts, for example, data in which the binary code has been converted into a one-hot vector byte by byte. One-hot vectorization of 1 byte of data is expressed by equation (1).
x \in \{0,1\}^{8} \;\rightarrow\; \vec{x} = (b_0, b_1, \ldots, b_{255})^{\top}, \quad b_i \in \{0,1\} \qquad (1)
The "x" on the left side of the arrow in equation (1) denotes 1 byte of data in the binary code. "{0,1}" denotes 1 bit, which takes the value 0 or 1, and "{0,1}^8" denotes 8 bits of data. Written in decimal, 1 byte of data takes an integer value from 0 to 255.
The "x" with an arrow above it on the right side of the arrow in equation (1) denotes a one-hot vector. The arrow above "x" is added to make explicit that it is a vector. The arrowed "x" is also written as vector x (for example, one-hot vector x), or simply x.
As shown in equation (1), the one-hot vector x is a column vector of 256 bits, b_0 through b_255. In the one-hot vector x, exactly one of the 256 bits b_0 through b_255 has the value "1", and the other 255 bits have the value "0".
In one-hot vectorization of 1 byte of data, the value of the byte is given in a one-hot representation. That is, when the value of the 1 byte of data is i (where i is an integer with 0 ≤ i ≤ 255), the value of bit b_i is "1" and the values of the other bits are "0".
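The one-hot vectorization described above can be sketched in a few lines of Python. This is only an illustration: the function names `one_hot_byte` and `one_hot_sequence` are hypothetical, and plain lists stand in for whatever tensor type an actual neural network would take as input.

```python
def one_hot_byte(value: int) -> list[int]:
    """Return the 256-element one-hot vector for a byte value in 0..255."""
    if not 0 <= value <= 255:
        raise ValueError("a byte value must be in the range 0..255")
    vector = [0] * 256
    vector[value] = 1  # bit b_i is 1 only at index i == value
    return vector


def one_hot_sequence(code: bytes) -> list[list[int]]:
    """One-hot encode every byte of a binary code, in order from the first byte."""
    return [one_hot_byte(b) for b in code]


# Two arbitrary example bytes; each becomes a vector with a single 1.
encoded = one_hot_sequence(bytes([0x55, 0x90]))
print(len(encoded), len(encoded[0]), encoded[0].index(1), encoded[1].index(1))
```

Each byte maps to a vector with exactly one "1", so a code of n bytes becomes n vectors of 256 bits each.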
On receiving the binary code as input, the neural network 900 performs, for each byte of the binary code, a binary classification of whether or not that byte is the beginning of a function, and outputs the classification result. For example, the neural network 900 outputs the value "1" for a byte estimated to be the beginning of a function, and the value "0" for a byte estimated not to be the beginning of a function. "R0", "R1", "R2", ... in FIG. 2 each take the value "1" or "0" according to the estimate of whether or not the byte is the beginning of a function.
The position of the beginning of a function is also called the start address of the function. The position of the end of a function is also called the end address of the function.
Note that what the neural network 900 estimates is not limited to the position of the beginning of a function, and can be the position of any of various patterns detectable in the binary data to be obfuscated.
The obfuscation device 100 partially rewrites the binary code so that the accuracy of the output of the neural network 900 becomes lower. In doing so, it does not rewrite the executed instructions themselves, so that a computer executing the binary code does not behave unexpectedly. Partially rewriting the binary code is also referred to as replacing the value of that part.
One conceivable method of rewriting the binary code is to rewrite a series of instructions contained in the binary code into a series of instructions that performs equivalent processing. With this method, however, a computer executing the binary code may behave unexpectedly, for example because the rewritten series of instructions contains a bug. In addition, if the rewritten series of instructions is longer in bytes than the original series, the addresses of the part following the rewritten part are shifted down, and addresses such as the destination addresses of jump instructions must be rewritten, which increases the load of the obfuscation process.
Another conceivable method of rewriting the binary code is to insert data that is never executed between one function and the next. With this method, however, if the addresses of the part following the inserted data are shifted down, addresses such as the destination addresses of jump instructions must be rewritten, which increases the load of the obfuscation process.
In contrast, the obfuscation device 100 performs obfuscation by rewriting the values of parts of the binary code to be obfuscated that are never referenced. Reference here includes reference for execution. Accordingly, a part that is never referenced is a part that is neither executed nor otherwise referenced.
One of the parts whose values the obfuscation device 100 rewrites is the padding located, for example, between one function and the next. Padding here is a never-referenced part provided for alignment, for example so that the start address of a function falls on the first address of an 8-byte block.
FIG. 3 is a diagram showing an example of padding whose values the obfuscation device 100 rewrites. In the example of FIG. 3, the function func1, which starts at line number 1, ends at line number 6, and the jump instruction at line number 6 transfers processing elsewhere. Consequently, neither the nop at line number 7 (line L11) nor the nop at line number 8 (line L12), both located between the end of function func1 (line number 6) and the beginning of function func2 (line number 9), is ever executed. Lines L11 and L12 are examples of padding.
The obfuscation device 100 rewrites, for example, the values of line L11 and line L12. Because the obfuscation device 100 rewrites the values of existing padding rather than inserting new data, the addresses of the subsequent code do not change. In this respect, the obfuscation device 100 does not need to rewrite addresses, and the load of the obfuscation process is relatively small.
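The point that overwriting padding leaves every later address unchanged can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper name `rewrite_padding`, the placeholder bytes standing in for func1 and func2, the use of 0x90 (the one-byte x86 nop) as the padding value, and the replacement values, which in the device would be chosen using the loss function rather than fixed by hand.

```python
def rewrite_padding(code: bytes,
                    padding_ranges: list[tuple[int, int]],
                    new_values: bytes) -> bytes:
    """Overwrite padding bytes in place.

    padding_ranges holds (start, end) offsets with end exclusive.
    Bytes are replaced one for one, so nothing after them moves.
    """
    patched = bytearray(code)
    cursor = 0
    for start, end in padding_ranges:
        length = end - start
        chunk = new_values[cursor:cursor + length]
        if len(chunk) != length:
            raise ValueError("not enough replacement bytes for the padding")
        patched[start:end] = chunk  # same length in, same length out
        cursor += length
    if len(patched) != len(code):
        raise AssertionError("padding rewrite must not change the code length")
    return bytes(patched)


# Hypothetical layout: 6 bytes of func1, 2 nop padding bytes, 4 bytes of func2.
original = b"\xAA" * 6 + b"\x90\x90" + b"\xBB" * 4
patched = rewrite_padding(original, [(6, 8)], b"\x13\x37")
print(len(original) == len(patched), patched[8:] == original[8:])
```

Because the replacement is byte for byte, func2 still begins at offset 8 after the rewrite, which is why no jump or data reference address needs to be touched.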
The obfuscation device 100 also creates a part whose values can be rewritten by moving an instruction to a free area.
FIG. 4 is a diagram showing an example of an instruction to be moved by the obfuscation device 100.
In the example of FIG. 4, the mov instruction on line L21 is 5 bytes long in binary code. The jump instruction is 4 bytes in binary code, so the mov instruction is longer than the jump instruction. Furthermore, line L21 is not referenced from any other part, and processing never jumps to line L21 from another part. In addition, because the mov instruction on line L21 does not include an address in its arguments, it can be moved as is, without being rewritten.
The obfuscation device 100 therefore selects the binary code of line L21 as the target of the move.
In this way, the obfuscation device 100 selects as the target of the move an instruction that is longer in bytes than the jump instruction, is neither a target of reference from another part nor a jump destination from another part, and can be moved without being rewritten.
Although FIG. 4 shows an example in which the obfuscation device 100 selects a single instruction as the target of the move, the obfuscation device 100 may instead select a series of instructions. In that case, it suffices that the total byte length of the series of instructions is longer than the byte length of the jump instruction, that none of the instructions is a target of reference from another part or a jump destination from another part, and that all of them can be moved without being rewritten.
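The selection conditions above can be written as a small predicate. The sketch below is not the device's implementation: the `Instruction` record and its fields are hypothetical, the 4-byte jump length is taken from the FIG. 4 example, and "can be moved without being rewritten" is approximated by "has no address operand", following the reasoning given for the mov instruction on line L21.

```python
from dataclasses import dataclass

JUMP_LENGTH = 4  # byte length of the jump instruction in the FIG. 4 example


@dataclass
class Instruction:
    mnemonic: str
    length: int                # encoded length in bytes
    is_referenced: bool        # referenced from another part of the code
    is_jump_target: bool       # another part of the code jumps here
    has_address_operand: bool  # moving it would require rewriting an address


def is_movable(sequence: list[Instruction], jump_length: int = JUMP_LENGTH) -> bool:
    """Check whether one instruction or a run of instructions may be moved."""
    if not sequence:
        return False
    total_length = sum(ins.length for ins in sequence)
    if total_length <= jump_length:
        return False  # no room for a jump plus at least one rewritable byte
    return not any(
        ins.is_referenced or ins.is_jump_target or ins.has_address_operand
        for ins in sequence
    )


mov = Instruction("mov", 5, False, False, False)
print(is_movable([mov]))
```

Requiring the moved bytes to be strictly longer than the jump guarantees that at least one byte at the original location is left over for rewriting.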
FIG. 5 is a diagram showing an example of an instruction being moved by the obfuscation device 100.
FIG. 5 shows an example in which the obfuscation device 100 moves line L21 of FIG. 4 to line L35, which is a free area. A free area here is an unused area within the memory area of the storage unit 180 that is available to the binary code. Before being rewritten, a free area is, like padding, an area that is never referenced.
The obfuscation device 100 moves the mov instruction of line L21 in FIG. 4 to line L35 as is, without rewriting it. The obfuscation device 100 also places a jump instruction to line L35, the destination of the mov instruction, on line L32, which is the leading line of lines L32 and L33, the source of the mov instruction. Furthermore, the obfuscation device 100 places a jump instruction to line L34, the line immediately after the source of the mov instruction, on line L36, the line immediately after line L35, the destination of the mov instruction.
As a result, a computer executing the code of FIG. 5 executes, in order, the add instruction (line L31), the jump instruction (line L32), the mov instruction (line L35), the jump instruction (line L36), and the add instruction (line L34). Apart from the jumps at the jump instructions on lines L32 and L36, the computer executes the same instructions in the same order with the code of FIG. 5 as with the code of FIG. 4.
In the example of FIG. 5, line L33 is an area that is never referenced. The obfuscation device 100 rewrites the value of line L33 to obfuscate the binary code.
In this way, the obfuscation device 100 moves an instruction satisfying the above conditions to a free area, preserves the execution order of the instructions by means of jump instructions, and obtains an area whose values can be rewritten to obfuscate the binary code.
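The move described for FIG. 4 and FIG. 5 can be modelled symbolically. In the sketch below each list element stands for one byte and carries a label instead of a real encoding. The labels, the helper name `move_instruction`, and the layout in the example are hypothetical. The 4-byte jump length follows the FIG. 4 example.

```python
FREE = "free"


def move_instruction(layout: list[str], start: int, length: int,
                     free_start: int, jump_length: int = 4) -> tuple[list[str], range]:
    """Move layout[start:start+length] into a free area and link it with jumps.

    Returns the new layout and the range of source bytes that are no longer
    reached by execution and can therefore be rewritten.
    """
    if length <= jump_length:
        raise ValueError("the moved instruction must be longer than a jump")
    needed = length + jump_length  # moved bytes plus the jump back
    if free_start + needed > len(layout):
        raise ValueError("the free area is too small")
    if any(cell != FREE for cell in layout[free_start:free_start + needed]):
        raise ValueError("the destination is not free")

    new_layout = list(layout)
    # Copy the instruction unchanged into the free area.
    new_layout[free_start:free_start + length] = layout[start:start + length]
    # Jump back to the byte that followed the instruction at its old location.
    new_layout[free_start + length:free_start + needed] = [f"jmp->{start + length}"] * jump_length
    # Replace the head of the old location with a jump to the moved copy.
    new_layout[start:start + jump_length] = [f"jmp->{free_start}"] * jump_length
    # Whatever is left of the old location is never executed or referenced.
    rewritable = range(start + jump_length, start + length)
    for index in rewritable:
        new_layout[index] = "junk"
    return new_layout, rewritable


layout = ["add"] * 3 + ["mov"] * 5 + ["add"] * 3 + [FREE] * 9
new_layout, rewritable = move_instruction(layout, start=3, length=5, free_start=11)
print(new_layout)
print(list(rewritable))
```

The layout keeps its total length, the two jumps preserve the add, mov, add execution order, and the single leftover byte at the old location plays the role of line L33 in FIG. 5.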
The acquisition unit 110 acquires the binary code to be obfuscated.
The acquisition unit 110 also acquires information about the binary code to be obfuscated. Specifically, the acquisition unit 110 acquires information from which the instructions in the binary code to be obfuscated, and the areas that are never referenced when that binary code is executed, can be identified. The following description takes as an example the case in which the acquisition unit 110 acquires assembly code corresponding to the binary code to be obfuscated.
The assembly code corresponding to a binary code here is assembly code that represents the information of that binary code. It may be the assembly code from which the binary code was generated (the source that was assembled). Alternatively, it may be assembly code obtained by disassembling the binary code.
The method by which the acquisition unit 110 acquires the binary code and the assembly code is not limited to any particular method. For example, the acquisition unit 110 may have a communication function and receive the binary code and the assembly code from another device. Alternatively, the acquisition unit 110 may compile source code into assembly code and then assemble that into binary code.
The output unit 120 outputs the obfuscated binary code.
The method by which the output unit 120 outputs the binary code is not limited to any particular method. For example, the output unit 120 may have a communication function and transmit the binary code to another device. Alternatively, the output unit 120 may write the binary code to an external memory (a storage device that can be attached to and detached from the obfuscation device 100).
The storage unit 180 stores various data. The storage unit 180 is configured using a storage device included in the obfuscation device 100.
The control unit 190 controls each unit of the obfuscation device 100 and executes various processes. The functions of the control unit 190 are carried out by a CPU (Central Processing Unit) included in the obfuscation device 100 reading a program from the storage unit 180 and executing it.
The division unit 191 divides the binary code to be obfuscated into subsequences of a predetermined length. By performing the obfuscation processing subsequence by subsequence, the obfuscation device 100 can obfuscate binary code of arbitrary length. The length of the subsequences into which the division unit 191 divides the binary code to be obfuscated is not limited to any particular length.
When the remaining length of the binary code is shorter than the subsequence length, the division unit 191 may, for example, append data such as a bit string of "0" values to the remainder of the binary code to bring it up to the subsequence length.
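The division into fixed-length subsequences, with the last one filled out with zero bits, can be sketched as follows. The function name `split_into_subsequences` and the example length are hypothetical, and zero bytes are only one of the fill values the text allows.

```python
def split_into_subsequences(code: bytes, length: int) -> list[bytes]:
    """Split code into chunks of `length` bytes, zero-filling a short last chunk."""
    if length <= 0:
        raise ValueError("the subsequence length must be positive")
    chunks = [code[i:i + length] for i in range(0, len(code), length)]
    if chunks and len(chunks[-1]) < length:
        # Append zero-valued bytes so every subsequence has the same length.
        chunks[-1] = chunks[-1].ljust(length, b"\x00")
    return chunks


print(split_into_subsequences(b"\x55\x48\x89\xe5\xc3\x90\x90", 3))
```

Fixing the subsequence length is what lets input of any size be processed chunk by chunk with a detector that expects one input shape.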
The replacement target detection unit 192 detects the replacement target portion of the binary code to be obfuscated.
For example, the replacement target detection unit 192 detects padding, such as that illustrated by lines L11 and L12 of FIG. 3, as a replacement target portion.
The replacement target detection unit 192 also detects an instruction to be moved, such as that illustrated by line L21 of FIG. 4. The replacement target detection unit 192 may detect a single instruction or a series of instructions.
The loss function acquisition unit 193 acquires a loss function. The loss function here is a function indicating the correlation between the estimated positions of a predetermined pattern in the binary code, obtained when the binary code is one-hot vectorized and input to the neural network, and the correct labels for the positions of the predetermined pattern in the binary code.
For example, a function that calculates the error obtained by subtracting the value of the correct label from the estimate of the neural network may be used as the loss function.
Here, let the byte length of a subsequence X into which the division unit 191 has divided the binary code be s + 1 bytes, and let the bytes of subsequence X be represented, in order from the first byte, by one-hot vectors x_0, ..., x_s. Subsequence X is then expressed as in equation (2).
X = (\vec{x}_0, \vec{x}_1, \ldots, \vec{x}_s) \qquad (2)
When the neural network is represented by a function f, the prediction of the neural network for subsequence X is written f(X).
f(X) may output a vector whose elements each take the value "0" or "1", like the outputs R0, R1, ... of the neural network 900 in FIG. 2.
Alternatively, f(X) may output, for each byte of the binary code input to the neural network 900, a value indicating how likely that byte is to be the start of a function. In this case, "R0", "R1", "R2", ... in FIG. 2 indicate, in order from the beginning of the binary code, how likely each byte is to be the start of a function. f(X) may output a probability as this value, but is not limited to doing so.
The correct label for the positions of the predetermined pattern in the subsequence X is denoted Y.
Using f(X) and Y, the loss function Loss is written Loss(f(X), Y).
It is assumed that the function f is known to the loss function acquisition unit 193, so that the unit can compute the loss function Loss(f(X), Y).
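The notation above can be illustrated with a minimal Python sketch. It is not taken from the application: `toy_f` is a hypothetical stand-in for the neural network f (it simply flags the byte 0x55, which is `push rbp` on x86-64), and the loss is assumed to be the sum of absolute differences between f(X) and Y, one possible reading of "the error obtained by subtracting the value of the correct label from the estimation result".

```python
def one_hot(byte_value):
    """Return the 256-element one-hot vector for one byte value (0..255)."""
    vec = [0] * 256
    vec[byte_value] = 1
    return vec


def encode(subsequence):
    """Encode a bytes object as X = (x_0, ..., x_s), one one-hot vector per byte."""
    return [one_hot(b) for b in subsequence]


def toy_f(X):
    """Hypothetical stand-in for f: score 1.0 where the byte is 0x55, else 0.0."""
    return [float(x[0x55]) for x in X]


def loss(prediction, label):
    """Assumed loss: sum of absolute errors between f(X) and the label Y."""
    return sum(abs(p - y) for p, y in zip(prediction, label))


X = encode(bytes([0x90, 0x55, 0x48]))  # s + 1 = 3 bytes
Y = [0, 1, 0]                          # the pattern starts at the second byte
print(loss(toy_f(X), Y))
```

A real f would be a trained network, and the label Y would come from ground-truth function boundaries rather than from a hand-written list.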
The replacement unit 194 rewrites the replacement target portion detected by the replacement target detection unit 192.
When the replacement target detection unit 192 detects padding as the replacement target portion, the replacement unit 194 rewrites the padding. That is, the replacement unit 194 updates the byte values of the padding.
A byte whose value has been updated by the replacement unit 194 is also referred to as a junk byte.
When the replacement target detection unit 192 detects an instruction to be moved as the replacement target portion, the replacement unit 194 moves that instruction and places a jump instruction as described with reference to FIGS. 4 and 5. The replacement unit 194 thereby creates the area to be rewritten illustrated in line L33 of FIG. 4, and then rewrites that area.
Using the loss function acquired by the loss function acquisition unit 193, the replacement unit 194 replaces the value of the replacement target portion of the binary code to be obfuscated with a value at which the detectability of the predetermined pattern by the neural network is reduced to or below a predetermined condition.
For example, the replacement unit 194 may perform an exhaustive search over all bytes to be replaced and rewrite them to the values that minimize the detectability indicated by the loss function. Specifically, the replacement unit 194 calculates the loss function value for every combination of the 256 possible values of each byte to be replaced. The replacement unit 194 then adopts the combination with the smallest loss function value and replaces each byte to be replaced with its value in the adopted combination.
The exhaustive search yields the solution for which the estimation accuracy of the neural network is lowest. On the other hand, its computational cost grows exponentially with the number of bytes to be replaced (256^n combinations for n bytes), so when many bytes are to be replaced, a solution may not be obtained within a realistic time.
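The exhaustive search can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the application: `exhaustive_patch` and `toy_detectability` are hypothetical names, and the toy loss simply counts bytes that a detector might key on. Following the description above, the loss is taken to indicate detectability, so the combination with the smallest value is adopted.

```python
from itertools import product


def exhaustive_patch(code, targets, loss_fn):
    """Try all 256**len(targets) value combinations for the target offsets
    and keep the one with the smallest loss (first minimum wins on ties)."""
    best_code, best_loss = None, float("inf")
    for combo in product(range(256), repeat=len(targets)):
        candidate = bytearray(code)
        for offset, value in zip(targets, combo):
            candidate[offset] = value
        current = loss_fn(bytes(candidate))
        if current < best_loss:
            best_code, best_loss = bytes(candidate), current
    return best_code, best_loss


def toy_detectability(code):
    """Hypothetical loss: number of bytes equal to 0x55 or 0xCC."""
    return sum(1 for b in code if b in (0x55, 0xCC))


original = bytes([0x55, 0xCC, 0xCC, 0xC3])  # two padding bytes at offsets 1 and 2
patched, value = exhaustive_patch(original, [1, 2], toy_detectability)
print(patched.hex(), value)
```

Two target bytes already require 65,536 loss evaluations and four require about 4.3 billion, which is why the search becomes impractical as the number of bytes grows.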
The replacement unit 194 may therefore determine the value to be written to the area to be rewritten based on the slopes obtained by partially differentiating the loss function with respect to each bit of the one-hot vector. Specifically, the one-hot vector indicating the slope at which the loss function is largest is converted to binary data, and the resulting value is written to the replacement target portion.
In this case, the one-hot representation of the obfuscated subsequence X* is given by equation (3).
$$X^{*} = (\vec{x}_0^{*}, \vec{x}_1^{*}, \ldots, \vec{x}_s^{*}) \tag{3}$$
The arrowed x_i* (i is an integer with 0 ≤ i ≤ s) denotes the one-hot representation of a byte of the obfuscated subsequence X*, and is given by equation (4).
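$$\vec{x}_i^{*} = \begin{cases} \operatorname{argmax}\left(\dfrac{\partial Loss}{\partial b_0}, \ldots, \dfrac{\partial Loss}{\partial b_{255}}\right) & \text{if } x_i \text{ is a byte to be replaced} \\ x_i & \text{otherwise} \end{cases} \tag{4}$$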
As described above, b_i (i is an integer with 0 ≤ i ≤ 255) denotes a bit of the one-hot vector representing one byte. Accordingly, ∂Loss/∂b_0, ..., ∂Loss/∂b_255 denote the partial derivatives of the loss function Loss with respect to each bit of the one-hot vector.
Here, argmax denotes one-hot vectorization that sets the element with the largest value among the elements of a vector to "1" and all other elements to "0".
An example of one-hot vectorization by argmax is shown in equation (5).
$$(1,\ 0,\ 5) \xrightarrow{\operatorname{argmax}} (0,\ 0,\ 1) \tag{5}$$
The left side of the arrow in equation (5) shows a three-element vector whose elements are "1", "0", and "5". Of these three elements, "5" is the largest.
One-hot vectorization by argmax sets the largest element, "5", to "1" and the other elements to "0". Applying argmax to the vector on the left side of the arrow in equation (5) therefore yields the three-element vector "0", "0", "1" shown on the right side of the arrow.
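The argmax one-hot vectorization of equation (5) can be written as a short Python function. The function name is hypothetical, and breaking ties in favor of the first maximum is an assumption, since the text does not say how ties are handled.

```python
def argmax_one_hot(values):
    """Set the largest element to 1 and every other element to 0.
    On ties, the first largest element is chosen (an assumption)."""
    top = max(range(len(values)), key=lambda i: values[i])
    return [1 if i == top else 0 for i in range(len(values))]


print(argmax_one_hot([1, 0, 5]))  # [0, 0, 1]
```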
In equation (4) above, when the one-hot vector x_i corresponds to a byte to be replaced, that is, when x_i is the one-hot representation of a byte to be replaced, it is replaced with the one-hot vector obtained by applying argmax to the derivatives of the loss function Loss with respect to each element of the one-hot vector.
When the one-hot vector x_i does not correspond to a byte to be replaced, that is, when x_i is not the one-hot representation of a byte to be replaced, the pre-obfuscation one-hot vector x_i is used as x_i* unchanged.
By having the replacement unit 194 rewrite the bytes to be replaced based on the gradient of the loss function Loss in this way, the obfuscation can be expected to reduce the estimation accuracy of the neural network, for example by increasing the error of its estimation results.
In addition, the replacement unit 194 can determine the value to write to each byte to be replaced by comparing 256 values. The replacement unit 194 can therefore determine the values and rewrite the bytes in a relatively short time even when there are many bytes to be replaced.
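The gradient-based selection of equation (4) can be sketched in Python with a deliberately tiny model. Everything model-related here is an assumption made for illustration: the "network" is a single sigmoid unit applied to each byte with hypothetical weights `WEIGHTS`, and the loss is the squared error between f(X) and Y, so the partial derivatives can be written out by hand instead of relying on an automatic differentiation library.

```python
import math

# Hypothetical per-byte weights of a toy one-layer model. A positive weight
# means "looks like a function start" (0x55 is `push rbp` on x86-64).
WEIGHTS = [0.0] * 256
WEIGHTS[0x55] = 4.0
WEIGHTS[0xC3] = -4.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(code, labels):
    """Assumed loss: squared error between per-byte scores f(X) and labels Y."""
    return sum((sigmoid(WEIGHTS[b]) - y) ** 2 for b, y in zip(code, labels))


def gradient_wrt_one_hot(code, labels, i):
    """dLoss/db_0 ... dLoss/db_255 for byte i, evaluated at the current input.
    With z = sum_b WEIGHTS[b] * x_i[b] and p = sigmoid(z), the chain rule gives
    dLoss/db = 2 * (p - y_i) * p * (1 - p) * WEIGHTS[b]."""
    p = sigmoid(WEIGHTS[code[i]])
    common = 2.0 * (p - labels[i]) * p * (1.0 - p)
    return [common * WEIGHTS[b] for b in range(256)]


def gradient_patch(code, labels, targets):
    """For each target byte, write the byte value whose one-hot bit has the
    largest partial derivative (the argmax of equation (4))."""
    patched = bytearray(code)
    for i in targets:
        grad = gradient_wrt_one_hot(code, labels, i)
        patched[i] = max(range(256), key=lambda b: grad[b])
    return bytes(patched)


original = bytes([0x55, 0xCC, 0xCC, 0xC3])
labels = [1, 0, 0, 0]
patched = gradient_patch(original, labels, [1, 2])
print(patched.hex(), loss(original, labels) < loss(patched, labels))
```

Each target byte costs one gradient evaluation plus a comparison of 256 values, independent of how many other bytes are being replaced. With a real network the gradient would come from backpropagation, and the choice is only a first-order estimate of which byte value increases the loss most.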
Next, the operation of the obfuscation device 100 will be described with reference to FIG. 6.
FIG. 6 is a flowchart showing the procedure by which the obfuscation device 100 obfuscates binary code.
In the process of FIG. 6, the acquisition unit 110 acquires the binary code to be obfuscated and the assembly code corresponding to that binary code (step S11).
Next, the obfuscation device 100 starts a loop L101 that performs processing for each subsequence that the division unit 191 cuts out from the binary code to be obfuscated (step S12).
The division unit 191 then cuts out the subsequence to be processed from the binary code to be obfuscated (step S13). If the division unit 191 has already cut out subsequences from the binary code to be obfuscated, it cuts out the subsequence to be processed from the remaining part of the binary code.
Next, the replacement target detection unit 192 detects the portion to be rewritten within the subsequence cut out by the division unit 191 (step S14). For example, the replacement target detection unit 192 detects the padding and the instructions to be moved described above.
The replacement unit 194 then patches junk bytes into the portion to be rewritten detected by the replacement target detection unit 192 (step S15). For example, the replacement unit 194 determines the values to write to the portion to be rewritten by the exhaustive search method or the loss-function gradient method described above, and writes them.
Next, the obfuscation device 100 performs termination processing for the loop L101 (step S16). Specifically, the obfuscation device 100 determines whether all of the binary code to be obfuscated has been cut out as subsequences and processed.
If it determines that a part has not yet been cut out, the obfuscation device 100 continues the processing of the loop L101 for that part.
If it determines that all of the binary code to be obfuscated has been cut out as subsequences and processed, the obfuscation device 100 ends the loop L101.
When the obfuscation device 100 ends the loop L101 in step S16, the output unit 120 outputs the obfuscated binary code (step S17). The obfuscated binary code is obtained by concatenating the subsequences processed in the loop L101 in the same order as in the original binary code.
After step S17, the obfuscation device 100 ends the process of FIG. 6.
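The flow of FIG. 6 can be summarized in a short Python sketch. The function names are hypothetical, and the padding detector is a simplifying assumption (it treats every 0xCC byte as padding, whereas the device is described as also having the corresponding assembly code available). The callback `choose_value` stands in for whichever method the replacement unit 194 uses to pick a junk byte.

```python
def split(code, length):
    """Step S13: cut the binary code into subsequences of a fixed length
    (the last subsequence may be shorter)."""
    return [code[i:i + length] for i in range(0, len(code), length)]


def detect_padding(subsequence, padding_values=(0xCC,)):
    """Step S14: offsets of bytes treated as padding (hypothetical rule)."""
    return [i for i, b in enumerate(subsequence) if b in padding_values]


def obfuscate(code, length, choose_value):
    """Steps S12 to S17: patch each subsequence, then join the results in
    the original order so that no byte changes position."""
    out = []
    for sub in split(code, length):
        patched = bytearray(sub)
        for offset in detect_padding(sub):
            patched[offset] = choose_value(sub, offset)  # step S15: junk byte
        out.append(bytes(patched))
    return b"".join(out)


original = bytes([0x55, 0xCC, 0xCC, 0xC3, 0xCC])
result = obfuscate(original, 2, lambda sub, offset: 0x55)
print(result.hex())
```

Because bytes are only overwritten in place, the output has the same length as the input and every untouched byte keeps its offset.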
As described above, the replacement unit 194 uses a loss function indicating the detectability of a predetermined pattern in binary code to replace the value of the replacement target portion of the binary code with a value at which the detectability is reduced to or below a predetermined condition.
The obfuscation device 100 can thereby obfuscate binary code without needing to rewrite addresses. In particular, the replacement unit 194 can obfuscate the binary code by rewriting the area designated as the replacement target. The replacement unit 194 therefore does not need to insert data into the binary code, nor does it need to replace an instruction or series of instructions in the binary code with an instruction or series of instructions of greater byte length. Obfuscation by the obfuscation device 100 thus does not shift the addresses of the part of the binary code that follows the replaced portion, so no addresses need to be rewritten. In this respect, the obfuscation device 100 keeps the load of obfuscating binary code relatively light.
The obfuscation device 100 can also obfuscate binary code by rewriting parts that are not executed, such as padding, without rewriting the instructions that are executed. In this respect, the obfuscation device 100 can reduce the possibility that a computer behaves unexpectedly when it executes the obfuscated binary code.
The loss function acquisition unit 193 acquires a loss function indicating the correlation between the estimation result obtained when the binary code is one-hot encoded and input to a neural network that estimates the positions of a predetermined pattern in the binary code, and the correct labels for the positions of the predetermined pattern in the binary code. Among the slopes obtained by partially differentiating the loss function with respect to each bit of the one-hot vector, the replacement unit 194 takes the one-hot vector indicating the slope at which the loss function is largest, converts it to binary data, and writes the resulting value to the replacement target portion.
By having the replacement unit 194 rewrite the bytes to be replaced based on the gradient (partial derivatives) of the loss function, the obfuscation can be expected to reduce the estimation accuracy of the neural network, for example by increasing the error of its estimation results.
In addition, the replacement unit 194 can determine the value to write to each byte to be replaced by comparing 256 values. The replacement unit 194 can therefore determine the values and rewrite the bytes in a relatively short time even when there are many bytes to be replaced.
Among the sequences of one or more instructions contained in the binary code, the replacement unit 194 moves a sequence whose total byte length is greater than the byte length of a jump instruction to a free area, and replaces the values at the original location with a jump instruction to the destination and values for reducing the detectability to or below the predetermined condition.
The obfuscation device 100 can thereby create a replacement target portion without needing to rewrite the instructions that are executed. In this respect, the obfuscation device 100 can reduce the possibility that a computer behaves unexpectedly when it executes the obfuscated binary code.
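The move-and-jump operation can be sketched in Python as follows. Several points are assumptions rather than details from the application: the jump is the 5-byte x86 near jump (opcode 0xE9 followed by a 32-bit displacement relative to the next instruction), byte offsets are treated as addresses, the moved instructions are assumed to be position-independent, and a jump back to the instruction following the original location is added so that execution continues as before. The function names are hypothetical.

```python
import struct

JMP_LEN = 5  # x86 near jump: opcode 0xE9 plus a 32-bit relative displacement


def jmp_rel32(src, dst):
    """Encode a near jump located at offset src that lands on offset dst."""
    return b"\xE9" + struct.pack("<i", dst - (src + JMP_LEN))


def relocate(code, start, length, free, junk):
    """Move code[start:start+length] to the free area at offset free, leave a
    jump to it at the original location, and fill the remaining bytes with junk."""
    if length <= JMP_LEN:
        raise ValueError("sequence must be longer than the jump instruction")
    if len(junk) != length - JMP_LEN:
        raise ValueError("junk must exactly fill the space left after the jump")
    if free + length + JMP_LEN > len(code):
        raise ValueError("free area is too small")
    out = bytearray(code)
    out[free:free + length] = code[start:start + length]
    # Assumed jump back to the instruction that followed the moved sequence.
    back = free + length
    out[back:back + JMP_LEN] = jmp_rel32(back, start + length)
    out[start:start + JMP_LEN] = jmp_rel32(start, free)
    out[start + JMP_LEN:start + length] = junk  # bytes available for junk values
    return bytes(out)


original = bytes(range(1, 9)) + bytes(12)  # placeholder bytes, then a free area
result = relocate(original, start=0, length=7, free=8, junk=b"\xAA\xBB")
print(result.hex())
```

The bytes after the jump at the original location are never executed, so they can be chosen freely by the loss-based methods, and the total length of the code is unchanged.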
The division unit 191 divides the binary code into subsequences of a predetermined length. The replacement unit 194 then performs, for each subsequence, the process of replacing the value of the replacement target portion of the binary code with a value at which the detectability is reduced to or below the predetermined condition.
By performing the obfuscation process for each subsequence, the obfuscation device 100 can obfuscate binary code of arbitrary length.
FIG. 7 is a diagram showing an example of the configuration of an obfuscation device according to an embodiment. The obfuscation device 200 shown in FIG. 7 includes a replacement unit 201.
In this configuration, the replacement unit 201 uses a loss function indicating the detectability of a predetermined pattern in binary code to replace the value of the replacement target portion of the binary code with a value at which the detectability is reduced to or below a predetermined condition.
The obfuscation device 200 can thereby obfuscate binary code without needing to rewrite addresses. In particular, the replacement unit 201 can obfuscate the binary code by rewriting the area designated as the replacement target. The replacement unit 201 therefore does not need to insert data into the binary code, nor does it need to replace an instruction or series of instructions in the binary code with an instruction or series of instructions of greater byte length. Obfuscation by the obfuscation device 200 thus does not shift the addresses of the part of the binary code that follows the replaced portion, so no addresses need to be rewritten. In this respect, the obfuscation device 200 keeps the load of obfuscating binary code relatively light.
The obfuscation device 200 can also obfuscate binary code by rewriting parts that are not executed, such as padding, without rewriting the instructions that are executed. In this respect, the obfuscation device 200 can reduce the possibility that a computer behaves unexpectedly when it executes the obfuscated binary code.
FIG. 8 is a diagram showing an example of the processing procedure in an obfuscation method according to an embodiment. The obfuscation method shown in FIG. 8 includes a step of using a loss function indicating the detectability of a predetermined pattern in binary code to replace the value of the replacement target portion of the binary code with a value at which the detectability is reduced to or below a predetermined condition.
This obfuscation method can obfuscate binary code without needing to rewrite addresses. In particular, the obfuscation method of FIG. 8 can obfuscate the binary code by rewriting the area designated as the replacement target. The method therefore does not need to insert data into the binary code, nor does it need to replace an instruction or series of instructions in the binary code with an instruction or series of instructions of greater byte length. The obfuscation method of FIG. 8 thus does not shift the addresses of the part of the binary code that follows the replaced portion, so no addresses need to be rewritten. In this respect, the obfuscation method of FIG. 8 keeps the load of obfuscating binary code relatively light.
The obfuscation method of FIG. 8 can also obfuscate binary code by rewriting parts that are not executed, such as padding, without rewriting the instructions that are executed. In this respect, the obfuscation method of FIG. 8 can reduce the possibility that a computer behaves unexpectedly when it executes the obfuscated binary code.
FIG. 9 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
In the configuration shown in FIG. 9, the computer 700 includes a CPU 710, a main storage device 720, an auxiliary storage device 730, and an interface 740.
Either or both of the obfuscation device 100 and the obfuscation device 200 described above may be implemented on the computer 700. In that case, the operations of the processing units described above are stored in the auxiliary storage device 730 in the form of a program. The CPU 710 reads the program from the auxiliary storage device 730, loads it into the main storage device 720, and executes the above processing in accordance with the program. The CPU 710 also reserves, in accordance with the program, storage areas in the main storage device 720 corresponding to the storage units described above. Communication between each device and other devices is carried out by the interface 740, which has a communication function and communicates under the control of the CPU 710.
When the obfuscation device 100 is implemented on the computer 700, the operations of the control unit 190 and its constituent units are stored in the auxiliary storage device 730 in the form of a program. The CPU 710 reads the program from the auxiliary storage device 730, loads it into the main storage device 720, and executes the above processing in accordance with the program.
The CPU 710 also reserves, in accordance with the program, a storage area in the main storage device 720 corresponding to the storage unit 180. The functions of the acquisition unit 110 and the output unit 120 are carried out by the interface 740, which has a data input/output function such as a communication function and communicates under the control of the CPU 710.
When the obfuscation device 200 is implemented on the computer 700, the operation of the replacement unit 201 is stored in the auxiliary storage device 730 in the form of a program. The CPU 710 reads the program from the auxiliary storage device 730, loads it into the main storage device 720, and executes the above processing in accordance with the program.
A program for realizing all or some of the functions of the obfuscation device 100 and the obfuscation device 200 may be recorded on a computer-readable recording medium, and the processing of each unit may be performed by having a computer system read and execute the program recorded on the recording medium. The "computer system" here includes an OS (operating system) and hardware such as peripheral devices.
A "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM (Read Only Memory), or a CD-ROM (Compact Disc Read Only Memory), or to a storage device such as a hard disk built into a computer system. The program may realize only some of the functions described above, or may realize the functions described above in combination with a program already recorded in the computer system.
Embodiments of the present invention have been described in detail above with reference to the drawings, but the specific configuration is not limited to these embodiments, and design changes and the like that do not depart from the gist of the present invention are also included.
Embodiments of the present invention may be applied to an obfuscation device, an obfuscation method, and a recording medium.
100, 200 Obfuscation device
110 Acquisition unit
120 Output unit
180 Storage unit
190 Control unit
191 Division unit
192 Replacement target detection unit
193 Loss function acquisition unit
194, 201 Replacement unit

Claims (6)

  1.  An obfuscation device comprising:
     a replacement unit that, using a loss function indicating the detectability of a predetermined pattern in binary code, replaces the value of a replacement target portion of the binary code with a value at which the detectability is reduced to or below a predetermined condition.
  2.  The obfuscation device according to claim 1, further comprising:
     a loss function acquisition unit that acquires the loss function, the loss function indicating the correlation between an estimation result obtained when the binary code is one-hot encoded and input to a neural network that estimates the position of the predetermined pattern in the binary code, and a correct label for the position of the predetermined pattern in the binary code,
     wherein the replacement unit writes to the replacement target portion a value obtained by converting to binary data the one-hot vector that indicates, among the slopes obtained by partially differentiating the loss function with respect to each bit of the one-hot vector, the slope at which the loss function is largest.
  3.  The obfuscation device according to claim 1 or claim 2,
     wherein the replacement unit moves to a free area, from among the sequences of one or more instructions contained in the binary code, a sequence whose total byte length is greater than the byte length of a jump instruction, and replaces the values at the original location with a jump instruction to the destination and values for reducing the detectability to or below the predetermined condition.
  4.  The obfuscation device according to any one of claims 1 to 3, further comprising:
     a division unit that divides the binary code into subsequences of a predetermined length,
     wherein the replacement unit performs, for each subsequence, the process of replacing the value of the replacement target portion of the binary code with a value at which the detectability is reduced to or below the predetermined condition.
  5.  An obfuscation method comprising:
     a step of using a loss function indicating the detectability of a predetermined pattern in binary code to replace the value of a replacement target portion of the binary code with a value at which the detectability is reduced to or below a predetermined condition.
  6.  A recording medium on which is recorded a program for causing a computer to execute:
     a step of using a loss function indicating the detectability of a predetermined pattern in binary code to replace the value of a replacement target portion of the binary code with a value at which the detectability is reduced to or below a predetermined condition.
PCT/JP2019/044620 2019-11-14 2019-11-14 Obfuscation device, obfuscation method, and recording medium WO2021095188A1 (en)

Publications (1)

Publication Number Publication Date
WO2021095188A1 (en) 2021-05-20

Family

ID=75912985

Country Status (1)

Country Link
WO (1) WO2021095188A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090049425A1 (en) * 2007-08-14 2009-02-19 Aladdin Knowledge Systems Ltd. Code Obfuscation By Reference Linking
WO2015043408A1 (en) * 2013-09-27 2015-04-02 Tencent Technology (Shenzhen) Company Limited Method of protecting binary file from being decompiled and device thereof
JP2017504910A (en) * 2014-01-31 2017-02-09 Cylance Inc. API call graph generation from static disassembly

Similar Documents

Publication Publication Date Title
US6113650A (en) Compiler for optimization in generating instruction sequence and compiling method
CN103765402B (en) What use mixed code signature tracing program calls context
US7856629B2 (en) Compiler apparatus
CN101523348B (en) Method and apparatus for handling dynamically linked function calls with respect to program code conversion
US8146070B2 (en) Method and apparatus for optimizing software program using inter-procedural strength reduction
JP4766540B2 (en) Method and apparatus for performing verification of program code conversion
US9542169B2 (en) Generating SIMD code from code statements that include non-isomorphic code statements
US10423397B2 (en) Systems and/or methods for type inference from machine code
JP3424520B2 (en) Program conversion device and debug device
Di Federico et al. A jump-target identification method for multi-architecture static binary translation
JP5966509B2 (en) Program, code generation method, and information processing apparatus
US8296750B2 (en) Optimization of a target program
CN115017516A (en) Fuzzy test method based on symbolic execution
US8458679B2 (en) May-constant propagation
Mendis et al. Revec: program rejuvenation through revectorization
CN112639774B (en) Compiler device with masking function
JP2000132404A (en) Instruction sequence optimizing device
JP4905480B2 (en) Program obfuscation program and program obfuscation device
US20080184213A1 (en) Compiler device, method, program and recording medium
WO2021095188A1 (en) Obfuscation device, obfuscation method, and recording medium
JP4719415B2 (en) Information processing system and code generation method
US9135027B1 (en) Code generation and execution for dynamic programming languages
CN102483701A (en) Program generation device, program production method, and program
US11593080B1 (en) Eliminating dead stores
CN113296833B (en) Identification method and device for legal instructions in binary file

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19952401; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 19952401; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: JP