WO2021095188A1

WO2021095188A1 - Obfuscation device, obfuscation method, and recording medium

Info

Publication number: WO2021095188A1
Application number: PCT/JP2019/044620
Authority: WO
Inventors: 拓磨天田; センペイリュウ
Original assignee: 日本電気株式会社
Priority date: 2019-11-14
Filing date: 2019-11-14
Publication date: 2021-05-20

Abstract

This obfuscation device comprises a replacement unit that uses a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.

Description

Obfuscation device, obfuscation method and recording medium

The present invention relates to an obfuscation device, an obfuscation method, and a recording medium.

In relation to software obfuscation, Patent Document 1 describes creating conditional branches whose truth value depends on a parameter, by using an encryption method in which the probability of successful decryption varies according to the parameter. From a first code according to a first parameter, a conditional branch that can always be regarded as true is created, and in this case a dummy process is assigned to the NO side of the conditional branch. From a second code according to a second parameter, a conditional branch that can always be regarded as false is created, and in this case a dummy process is assigned to the YES side of the conditional branch. From a third code based on a third parameter, a conditional branch whose truth value is indeterminate is created, and in this case equivalent processes with different descriptions are assigned to both sides of the conditional branch.

Japanese Patent Application Laid-Open No. 2018-106260

When a binary code is obfuscated, if inserting dummy code or the like shifts the addresses of the subsequent code, addresses such as data reference destinations and the jump destinations of jump instructions must be rewritten, which increases the load of the obfuscation process. It is preferable that the binary code can be obfuscated without the need to rewrite addresses.

An example of an object of the present invention is to provide an obfuscation device, an obfuscation method, and a recording medium capable of solving the above problems.

According to the first aspect of the present invention, the obfuscation device includes a replacement unit that uses a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.

According to the second aspect of the present invention, the obfuscation method includes a step of using a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.

According to the third aspect of the present invention, the recording medium is a recording medium on which is recorded a program that causes a computer to execute a step of using a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.

According to the above-mentioned obfuscation device, obfuscation method, and recording medium, the binary code can be obfuscated without the need to rewrite the address.

FIG. 1 is a schematic block diagram showing an example of the functional configuration of the obfuscation device according to the embodiment.
FIG. 2 is a diagram showing an example of the input and output of the target neural network whose pattern detection accuracy the obfuscation device according to the embodiment reduces.
FIG. 3 is a diagram showing an example of padding whose value the obfuscation device according to the embodiment rewrites.
FIG. 4 is a diagram showing an example of an instruction to be moved by the obfuscation device according to the embodiment.
FIG. 5 is a diagram showing an example of the movement of an instruction by the obfuscation device according to the embodiment.
FIG. 6 is a flowchart showing the procedure by which the obfuscation device according to the embodiment obfuscates a binary code.
FIG. 7 is a diagram showing an example of the configuration of the obfuscation device according to the embodiment.
FIG. 8 is a diagram showing an example of the processing procedure in the obfuscation method according to the embodiment.
FIG. 9 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.

Hereinafter, embodiments of the present invention will be described, but the following embodiments do not limit the claimed invention. Also, not all combinations of the features described in the embodiments are essential to the solution provided by the invention.
FIG. 1 is a schematic block diagram showing an example of the functional configuration of the obfuscation device according to the embodiment. With the configuration shown in FIG. 1, the obfuscation device 100 includes an acquisition unit 110, an output unit 120, a storage unit 180, and a control unit 190. The control unit 190 includes a division unit 191, a replacement target detection unit 192, a loss function acquisition unit 193, and a replacement unit 194.

The obfuscation device 100 obfuscates binary code. In particular, the obfuscation device 100 performs processing that reduces the accuracy with which a neural network (NN) estimates a specific pattern in the binary code, such as the start address of a function.

The binary code here is an executable program. It is called binary code because the program is represented as binary numbers (a bit string). Binary code can be obtained, for example, by compiling source code (a program written in a high-level language). Alternatively, binary code can be obtained by assembling assembly code (a program written in assembly language).

Here, when analyzing the binary code by reverse engineering, it is useful to recognize the function by detecting the start address and end address of the function included in the binary code. One of the methods for detecting the start address of a function included in the binary code is a method using a neural network such as deep learning. In particular, a method using a recurrent neural network (RNN) has shown relatively high performance.

From the perspective of a binary code provider who wants to prevent unauthorized analysis of binary code, it is preferable that the detection accuracy of the start address of a function or the like using a neural network can be reduced. Therefore, the obfuscation device 100 performs a process of reducing the accuracy with which the neural network detects a specific pattern (for example, the start address of the function) in the binary code. This can make the analysis of binary code relatively difficult.

FIG. 2 is a diagram showing an example of input / output of a target neural network in which the obfuscation device 100 reduces the pattern detection accuracy.
The neural network 900 shown in FIG. 2 accepts binary code input in byte units. “Byte 0”, “byte 1”, “byte 2”, ... in FIG. 2 indicate the value of each byte in order from the beginning of the binary code.
The neural network 900 accepts, for example, input of data in which a binary code is converted into a one-hot vector for each byte. The one-hot vectorization of 1-byte data is shown by Eq. (1).
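
The image of Eq. (1) is not reproduced in this text. Based on the description of its left and right sides, it can be read as the following mapping; the exact typesetting is an assumption.

```latex
x \in \{0,1\}^{8}
\;\longrightarrow\;
\vec{x} =
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{255} \end{pmatrix},
\qquad b_j \in \{0,1\}
\tag{1}
```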

The "x" on the left side of the arrow in equation (1) indicates one byte of data in the binary code. “{0,1}” indicates one bit that takes a value of 0 or 1. “{0,1}⁸” indicates 8-bit data. When 1-byte data is expressed in decimal, it takes an integer value of 0 to 255.

The "x" with an arrow on the right side of the arrow in equation (1) indicates a one-hot vector. The arrow above the "x" is attached to clearly indicate that it is a vector. The “x” with an arrow is also expressed as a vector x (for example, a one-hot vector x) or simply x.

As shown in equation (1), the one-hot vector x is represented by a column vector of 256 bits, b₀ to b₂₅₅. In the one-hot vector x, exactly one of the 256 bits b₀ to b₂₅₅ has the value "1", and the other 255 bits have the value "0".

In one-hot vectorization of one-byte data, the byte value is shown in one-hot representation. That is, if the value of the one byte of data is i (i is an integer, 0 ≦ i ≦ 255), the value of bit b_i is "1" and the value of every other bit is "0".
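
The one-hot vectorization of bytes described above can be sketched as follows. This is a minimal illustration using plain Python lists; the function names are hypothetical and do not appear in the specification.

```python
def one_hot_byte(value: int) -> list[int]:
    """Return the 256-element one-hot vector (b_0..b_255) for one byte value."""
    if not 0 <= value <= 255:
        raise ValueError("a byte value must be in the range 0..255")
    vec = [0] * 256
    vec[value] = 1  # only bit b_value is set
    return vec


def one_hot_code(code: bytes) -> list[list[int]]:
    """One-hot encode every byte of a binary code, in order from the first byte."""
    return [one_hot_byte(b) for b in code]


def byte_from_one_hot(vec: list[int]) -> int:
    """Inverse conversion: the index of the single 1 bit is the byte value."""
    if len(vec) != 256 or sum(vec) != 1:
        raise ValueError("expected a 256-element vector with exactly one 1")
    return vec.index(1)


# Example input: two arbitrary byte values.
encoded = one_hot_code(bytes([0x55, 0x90]))
```

The inverse conversion is what allows a one-hot vector chosen during obfuscation to be written back into the binary code as an ordinary byte.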

The neural network 900 that receives the input of the binary code performs, for each byte of the binary code, a binary classification of whether or not that byte is the beginning of a function, and outputs the classification result. For example, the neural network 900 outputs the value "1" for a byte estimated to be the beginning of a function and the value "0" for a byte estimated not to be the beginning of a function. Each of "R0", "R1", "R2", ... in FIG. 2 takes a value of "1" or "0" depending on the estimation result of whether or not the byte is the beginning of a function.

The position at the beginning of the function is also called the start address of the function. The position at the end of the function is also called the end address of the function.
The estimation target of the neural network 900 is not limited to the position of the beginning of a function, and can be the position of any of various patterns that can be detected in the binary code to be obfuscated.

The obfuscation device 100 partially rewrites the binary code so that the accuracy of the output of the neural network 900 becomes lower. At that time, the instruction itself to be executed should not be rewritten so that the computer executing the binary code does not behave unexpectedly. Partial rewriting of binary code is also referred to as replacing the value of that part.

Here, one conceivable method of rewriting the binary code is to rewrite a series of instructions included in the binary code into a series of instructions that performs equivalent processing. However, with this method, the computer executing the binary code may behave unexpectedly due to a bug in the rewritten series of instructions. Also, if the byte length of the rewritten series of instructions is longer than the byte length of the series before rewriting, the addresses of the part following the rewritten part are shifted. It then becomes necessary to rewrite addresses such as the jump destination addresses of jump instructions, which increases the load of the obfuscation process.
Another conceivable method of rewriting the binary code is to insert data that is never executed between functions. However, with this method as well, the addresses of the part following the inserted data are shifted, so it becomes necessary to rewrite addresses such as the jump destination addresses of jump instructions, which increases the load of the obfuscation process.

On the other hand, the obfuscation device 100 obfuscates the binary code to be obfuscated by rewriting the value of the portion that is not referenced. References here include references for execution. Therefore, the part that is not referred to here is a part that is neither executed nor referred to.
One of the parts whose value the obfuscation device 100 rewrites is the padding between functions. The padding here is a non-referenced part provided for alignment, for example so that the start address of each function falls on the start address of a block of 8 bytes.
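
As a rough illustration of why such padding exists, the number of padding bytes follows directly from the alignment. The sketch below assumes the 8-byte block size mentioned in the text; the function names are hypothetical.

```python
ALIGNMENT = 8  # block size taken from the 8-byte example in the text


def padding_length(function_end: int, alignment: int = ALIGNMENT) -> int:
    """Number of padding bytes needed so the next function starts on a boundary.

    function_end is the address of the first byte after the previous function.
    """
    return (-function_end) % alignment


def next_function_start(function_end: int, alignment: int = ALIGNMENT) -> int:
    """Aligned start address of the function that follows the padding."""
    return function_end + padding_length(function_end, alignment)
```

For a function that ends just before address 0x1006, two padding bytes are needed and the next function starts at 0x1008. Those two bytes are never executed or referenced, which is what makes them candidates for rewriting.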

FIG. 3 is a diagram showing an example of padding whose value the obfuscation device 100 rewrites. In the example of FIG. 3, the function func1 starting at line number 1 ends at line number 6, and processing is transferred elsewhere by the jump instruction of line number 6. Therefore, neither the nop of line number 7 (line L11) nor the nop of line number 8 (line L12), which are located between the end of function func1 (line number 6) and the beginning of function func2 (line number 9), is executed. Lines L11 and L12 are an example of padding.

The obfuscation device 100 rewrites, for example, the value of line L11 and the value of line L12. Because the obfuscation device 100 rewrites the existing padding values instead of inserting new data, the addresses of the subsequent code do not change. In this respect, the obfuscation device 100 does not need to rewrite addresses, and the load of the obfuscation process can be relatively small.
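
The point that overwriting padding leaves every later address unchanged can be sketched with a same-length slice assignment. The byte values and offsets below are arbitrary illustrations, not the contents of FIG. 3, and `patch_in_place` is a hypothetical helper.

```python
def patch_in_place(code: bytearray, start: int, junk: bytes) -> None:
    """Overwrite len(junk) bytes at offset start without changing the code length."""
    end = start + len(junk)
    if start < 0 or end > len(code):
        raise ValueError("patch range lies outside the code")
    code[start:end] = junk  # same-length slice assignment: no bytes are inserted


# Hypothetical layout: 6 bytes of func1, 2 padding bytes, then 2 bytes of func2.
code = bytearray(b"\x55\x48\x89\xe5\xeb\x10" + b"\x90\x90" + b"\x55\xc3")
func2_offset = 8
length_before = len(code)
func2_before = bytes(code[func2_offset:])

patch_in_place(code, 6, b"\x12\x34")  # junk values chosen arbitrarily here
```

After the patch, the code has the same length and func2 still begins at the same offset, so no jump or data address needs to be adjusted.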

Further, the obfuscation device 100 creates a portion whose value can be rewritten by moving an instruction to a free area.
FIG. 4 is a diagram showing an example of an instruction to be moved by the obfuscation device 100.
In the example of FIG. 4, the move instruction on line L21 has a length of 5 bytes in binary code. The jump instruction is 4 bytes in binary code, and the move instruction has a longer byte length than the jump instruction. Further, the line L21 is not referred to by another part, and the process does not jump from the other part to the line L21. Further, since the move instruction on line L21 does not include an address as an argument, the move instruction can be moved as it is without being rewritten.
Therefore, the obfuscation device 100 selects the binary code of line L21 as the movement target.

As described above, the obfuscation device 100 selects, as the movement target, an instruction that has a longer byte length than the jump instruction, is neither referenced from another part nor a jump destination from another part, and can be moved without being rewritten.
Although FIG. 4 shows an example in which the obfuscation device 100 selects one instruction as the movement target, the obfuscation device 100 may select a series of a plurality of instructions as the movement target. In this case, it suffices that the total byte length of the series of instructions is longer than the byte length of the jump instruction, that none of the instructions is referenced from another part or is a jump destination from another part, and that all of them can be moved without being rewritten.
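
The selection conditions above can be expressed as a simple predicate. The sketch below assumes that disassembly has already produced per-instruction records; the `Instruction` class and its field names are hypothetical, and the jump length of 4 bytes is the value used in the FIG. 4 example.

```python
from dataclasses import dataclass

JUMP_LENGTH = 4  # byte length of the jump instruction in the FIG. 4 example


@dataclass
class Instruction:
    """Hypothetical disassembly record; field names are illustrative only."""
    address: int
    length: int                # encoded length in bytes
    is_referenced: bool        # referenced from another part
    is_jump_target: bool       # a jump from another part lands here
    has_address_operand: bool  # would need rewriting if relocated


def is_move_candidate(run: list[Instruction], jump_length: int = JUMP_LENGTH) -> bool:
    """Check the selection conditions for one instruction or a consecutive run."""
    if not run:
        return False
    total_length = sum(ins.length for ins in run)
    if total_length <= jump_length:
        return False  # no spare bytes would remain after placing the jump
    return all(
        not ins.is_referenced
        and not ins.is_jump_target
        and not ins.has_address_operand
        for ins in run
    )
```

A single 5-byte instruction with no references qualifies, while a 3-byte instruction alone does not; two adjacent 3-byte instructions qualify together because their total length exceeds the jump length.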

FIG. 5 is a diagram showing an example of instruction movement by the obfuscation device 100.
FIG. 5 shows an example in which the obfuscation device 100 moves line L21 of FIG. 4 to line L35 in the free area. The free area referred to here is an unused area of the memory area of the storage unit 180 that can be used for the binary code. Before rewriting, the free area is, like padding, an area that is not referenced.

The obfuscation device 100 moves the move instruction in line L21 of FIG. 4 to line L35 as it is, without rewriting it. The obfuscation device 100 also places a jump instruction to line L35, the destination of the move instruction, in line L32, which is the first of the lines L32 and L33 that previously held the move instruction. Further, in line L36 immediately after line L35, the obfuscation device 100 places a jump instruction to line L34, which immediately follows the original location of the move instruction.

As a result, the computer that executes the code of FIG. 5 executes the add instruction (line L31), the jump instruction (line L32), the move instruction (line L35), the jump instruction (line L36), and the add instruction (line L34) in this order. Apart from the jumps at line L32 and line L36, the computer executes the same instructions in the same order under the code of FIG. 5 as under the code of FIG. 4.

Further, in the example of FIG. 5, line L33 is an area that is not referenced. The obfuscation device 100 rewrites the value of line L33 to obfuscate the binary code.
In this way, the obfuscation device 100 moves an instruction satisfying the above conditions to the free area and maintains the execution order of the instructions by means of jump instructions, thereby obtaining an area whose value can be rewritten to obfuscate the binary code.
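
The movement described with reference to FIGS. 4 and 5 can be sketched on a byte buffer. The jump encoding below (one opcode byte plus a 3-byte absolute offset) is purely hypothetical and is chosen only so that the jump is 4 bytes long as in the example; the caller is assumed to have verified that the free area is unused.

```python
JUMP_LENGTH = 4     # jump size from the FIG. 4 example
JUMP_OPCODE = 0xFF  # hypothetical opcode, not taken from any real instruction set


def encode_jump(target: int) -> bytes:
    """Hypothetical 4-byte jump: one opcode byte plus a 3-byte little-endian offset."""
    return bytes([JUMP_OPCODE]) + target.to_bytes(3, "little")


def move_to_free_area(code: bytearray, src: int, length: int, free: int) -> range:
    """Move code[src:src+length] to offset free and link both places with jumps.

    Returns the range of source bytes that are no longer executed or referenced,
    i.e. the bytes that become available for rewriting.
    """
    if length <= JUMP_LENGTH:
        raise ValueError("the moved instruction must be longer than a jump")
    if free + length + JUMP_LENGTH > len(code):
        raise ValueError("the free area is too small")
    moved = bytes(code[src:src + length])
    jump_back_at = free + length
    code[free:jump_back_at] = moved  # instruction copied unchanged
    code[jump_back_at:jump_back_at + JUMP_LENGTH] = encode_jump(src + length)
    code[src:src + JUMP_LENGTH] = encode_jump(free)  # jump to the new location
    return range(src + JUMP_LENGTH, src + length)


# Hypothetical layout: 2 bytes, a 5-byte instruction, 2 bytes, then 16 free bytes.
code = bytearray(b"\x01\x02" + b"\xb8\x01\x00\x00\x00" + b"\x03\x04" + bytes(16))
junk_range = move_to_free_area(code, src=2, length=5, free=9)
```

In this example one byte (offset 6) is left over at the original location, which corresponds to the rewritable area of line L33 in FIG. 5. The total length of the buffer is unchanged, so no other address has to be adjusted.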

The acquisition unit 110 acquires the binary code to be obfuscated.
In addition, the acquisition unit 110 acquires information regarding the binary code to be obfuscated. Specifically, the acquisition unit 110 acquires information from which the instructions in the binary code to be obfuscated, and the areas that are not referenced when that binary code is executed, can be identified. In the following, a case where the acquisition unit 110 acquires the assembly code corresponding to the binary code to be obfuscated will be described as an example.

The assembly code corresponding to the binary code referred to here is assembly code that represents the information of the binary code. The assembly code corresponding to the binary code may be the assembly code from which the binary code was assembled. Alternatively, the assembly code corresponding to the binary code may be assembly code obtained by disassembling the binary code.

The method by which the acquisition unit 110 acquires the binary code and the assembly code is not limited to a specific method. For example, the acquisition unit 110 may have a communication function and receive the binary code and the assembly code from another device. Alternatively, the acquisition unit 110 may compile the source code into assembly code and further assemble it into binary code.

The output unit 120 outputs the obfuscated binary code.
The method by which the output unit 120 outputs the binary code is not limited to a specific method. For example, the output unit 120 may have a communication function and transmit a binary code to another device. Alternatively, the output unit 120 may write the binary code to an external memory (a storage device that can be attached to and detached from the obfuscation device 100).

The storage unit 180 stores various data. The storage unit 180 is configured by using the storage device included in the obfuscation device 100.
The control unit 190 controls each unit of the obfuscation device 100 to execute various processes. The function of the control unit 190 is executed by the CPU (Central Processing Unit) included in the obfuscation device 100 reading a program from the storage unit 180 and executing the program.

The division unit 191 divides the binary code to be obfuscated into subsequences of a predetermined length. The obfuscation device 100 can obfuscate a binary code of arbitrary length by performing the obfuscation process for each subsequence. The length of the subsequences into which the division unit 191 divides the binary code to be obfuscated is not limited to a specific length.
If the remaining part of the binary code is shorter than the length of a subsequence, the division unit 191 may, for example, append data such as a bit string of value "0" to the remaining part so that it has the length of a subsequence.
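
The division into fixed-length subsequences with zero fill can be sketched as follows. The function names are hypothetical; removing the fill again when the subsequences are recombined is an assumption made here so that the output has the original length.

```python
def split_into_subsequences(code: bytes, length: int) -> list[bytes]:
    """Cut code into chunks of `length` bytes, zero-filling a short final chunk."""
    if length <= 0:
        raise ValueError("subsequence length must be positive")
    chunks = []
    for start in range(0, len(code), length):
        chunk = code[start:start + length]
        chunks.append(chunk.ljust(length, b"\x00"))  # no effect on full chunks
    return chunks


def join_subsequences(chunks: list[bytes], original_length: int) -> bytes:
    """Recombine chunks in order and drop the zero fill added at the end."""
    return b"".join(chunks)[:original_length]


# Example input: five arbitrary bytes split into subsequences of two bytes.
chunks = split_into_subsequences(b"\x01\x02\x03\x04\x05", 2)
```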

The replacement target detection unit 192 detects the replacement target portion of the binary code to be obfuscated.
For example, the replacement target detection unit 192 detects the padding illustrated in lines L11 and L12 of FIG. 3 as the replacement target portion.
Further, the replacement target detection unit 192 detects an instruction to be moved, as exemplified in line L21 of FIG. 4. The replacement target detection unit 192 may detect one instruction or a series of instructions.

The loss function acquisition unit 193 acquires the loss function. The loss function referred to here is a function indicating the correlation between the estimation result for the position of a predetermined pattern in the binary code, obtained when the binary code is converted into one-hot vectors and input to the neural network, and the correct label for the position of the predetermined pattern in the binary code.
For example, as the loss function, a function for calculating an error obtained by subtracting the value of the correct label from the estimation result of the neural network may be used.
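
One possible form of such an error-based loss is sketched below. The text only says that the label is subtracted from the estimation result; taking the absolute value and averaging over the bytes are assumptions made for this illustration, and `error_loss` is a hypothetical name.

```python
def error_loss(predictions: list[float], labels: list[int]) -> float:
    """Mean absolute difference between per-byte estimates and correct labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(abs(p - y) for p, y in zip(predictions, labels)) / len(labels)
```

With this form, a perfect estimate gives 0 and a completely wrong binary estimate gives 1, so a larger value means the neural network is further from the correct labels.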

Here, suppose that the byte length of the subsequence X obtained when the division unit 191 divides the binary code is s + 1 bytes, and that the bytes of the subsequence X are represented by one-hot vectors x_0, ..., x_s in order from the first byte. The subsequence X is then expressed as in Eq. (2).

When the neural network is represented by the function f, the prediction result of the neural network with respect to the subsequence X is represented by f (X).
f (X) may output a vector whose elements each take a value of either "0" or "1", like the outputs R0, R1, ... of the neural network 900 of FIG. 2.

Alternatively, f (X) may output, for each byte of the binary code input to the neural network 900, a value indicating how likely that byte is to be the beginning of a function. In this case, "R0", "R1", "R2", ... in FIG. 2 correspond to the likelihood that each byte, in order from the beginning of the binary code, is the beginning of a function. f (X) may output a probability as the value indicating this likelihood, but the present invention is not limited to this.

Further, the correct label at the position of the predetermined pattern in the subsequence X is referred to as Y.
The loss function Loss is expressed as Loss (f (X), Y) using f (X) and Y.
It is assumed that the function f is known to the loss function acquisition unit 193, so that the loss function Loss (f (X), Y) can be calculated.

The replacement unit 194 rewrites the replacement target portion detected by the replacement target detection unit 192.
When the replacement target detection unit 192 detects padding as the replacement target portion, the replacement unit 194 rewrites the padding. That is, the replacement unit 194 updates the padding byte value.
The byte whose value is updated by the replacement unit 194 is also referred to as a junk byte.

When the replacement target detection unit 192 detects an instruction to be moved as the replacement target portion, the replacement unit 194 moves that instruction and places the jump instructions as described with reference to FIGS. 4 and 5. As a result, the replacement unit 194 obtains the area to be rewritten illustrated in line L33 of FIG. 5. The replacement unit 194 then rewrites that area.

The replacement unit 194 uses the loss function acquired by the loss function acquisition unit 193 to replace the value of the replacement target part of the binary code to be obfuscated with a value at which the detectability of the predetermined pattern by the neural network is at or below a predetermined condition.
For example, the replacement unit 194 may perform a full solution search (exhaustive search) over all the bytes to be replaced and rewrite them to the values that minimize the detectability indicated by the loss function. Specifically, the replacement unit 194 calculates the loss function value for every combination of the 256 possible values of each byte to be replaced. Then, the replacement unit 194 adopts the combination having the smallest loss function value, and replaces the value of each byte to be replaced with its value in the adopted combination.
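
The full solution search can be sketched as follows. The neural network is replaced here by a toy scoring function, since the specification does not give a concrete model; `toy_detectability` and the other names are hypothetical, and a lower score is taken to mean lower detectability, as in the paragraph above.

```python
from itertools import product
from typing import Callable, Sequence


def exhaustive_replace(
    code: bytes,
    positions: Sequence[int],
    detectability: Callable[[bytes], float],
) -> bytes:
    """Try every value combination at `positions` and keep the least detectable one."""
    best_code = code
    best_score = detectability(code)
    candidate = bytearray(code)
    for values in product(range(256), repeat=len(positions)):
        for pos, value in zip(positions, values):
            candidate[pos] = value
        score = detectability(bytes(candidate))
        if score < best_score:
            best_code, best_score = bytes(candidate), score
    return best_code


def toy_detectability(code: bytes) -> float:
    """Stand-in for the loss: pretends that 0x90 bytes make the pattern easy to find."""
    return float(code.count(0x90))


# Two replaceable bytes (offsets 1 and 2) in a four-byte example.
original = b"\x55\x90\x90\x55"
result = exhaustive_replace(original, [1, 2], toy_detectability)
```

For n replaceable bytes the loop evaluates the scoring function 256 to the power n times (65,536 times in this two-byte example), which is why the approach stops being practical as n grows.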

The full solution search method yields the solution with the lowest estimation accuracy by the neural network. On the other hand, its amount of calculation grows exponentially with the number of bytes to be replaced (256^n combinations for n bytes). Therefore, if the number of bytes to be replaced is large, it may not be possible to obtain a solution within a realistic time.

Therefore, the replacement unit 194 may instead determine the value to be written in the area to be rewritten based on the gradient obtained by partially differentiating the loss function with respect to each bit of the one-hot vector. Specifically, the one-hot vector that marks the bit for which the gradient of the loss function is largest is converted into binary data, and that value is written to the part to be replaced.

The one-hot vector representation of the obfuscated subsequence X* in this case is given by Eq. (3).

The x_i* with an arrow (i is an integer, 0 ≦ i ≦ s) denotes the one-hot representation of a byte of the subsequence X* after obfuscation, and is expressed as in Eq. (4).

As described above, b_i (i is an integer, 0 ≦ i ≦ 255) denotes a bit of the one-hot vector representing one byte. Therefore, ∂Loss/∂b₀, ···, ∂Loss/∂b₂₅₅ denote the partial derivatives of the loss function Loss with respect to each bit of the one-hot vector.
Here, argmax indicates one-hot vectorization in which the value of the element having the maximum value among the elements of the vector is set to "1" and the value of the other element is set to "0".
An example of one-hot vectorization by argmax is shown in equation (5).

On the left side of the arrow in the equation (5), a vector of three elements whose element values are "1", "0", and "5" is shown. Of these three elements, the value of "5" is the largest.
In one-hot vectorization by argmax, the value of the element "5", which has the largest value, is set to "1", and the values of the other elements are set to "0". When the vector on the left side of the arrow in equation (5) is converted into a one-hot vector by argmax, a vector of three elements whose values are "0", "0", and "1" is obtained, as shown on the right side of the arrow.
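
The argmax operation as defined here can be written in a few lines. The helper name is hypothetical, and the behaviour on ties (taking the first maximum) is an assumption, since the text does not address ties.

```python
def argmax_one_hot(values: list[float]) -> list[int]:
    """Set the position of the largest element to 1 and all others to 0."""
    if not values:
        raise ValueError("values must not be empty")
    largest = max(range(len(values)), key=values.__getitem__)  # first maximum on ties
    return [1 if i == largest else 0 for i in range(len(values))]


result = argmax_one_hot([1, 0, 5])  # the three-element vector from Eq. (5)
# result is [0, 0, 1]
```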

In Eq. (4) above, when the one-hot vector x_i corresponds to a byte to be replaced, that is, when x_i is the one-hot representation of a byte to be replaced, it is replaced with the one-hot vector obtained by applying argmax to the partial derivatives of the loss function Loss with respect to each element of the one-hot vector.
On the other hand, when the one-hot vector x_i does not correspond to a byte to be replaced, that is, when x_i is not the one-hot representation of a byte to be replaced, the one-hot vector x_i before obfuscation is used unchanged as the one-hot vector x_i*.
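
The images of Eqs. (3) and (4) are not reproduced in this text. From the surrounding description they can be reconstructed as follows, where b_0, ..., b_255 are the bits of the one-hot vector at position i; the layout is an assumption.

```latex
X^{*} = \left( \vec{x}_0^{\,*}, \ldots, \vec{x}_s^{\,*} \right) \tag{3}

\vec{x}_i^{\,*} =
\begin{cases}
\operatorname{argmax}\!\left(
  \dfrac{\partial \mathrm{Loss}}{\partial b_0}, \ldots,
  \dfrac{\partial \mathrm{Loss}}{\partial b_{255}}
\right)
  & \text{if } \vec{x}_i \text{ corresponds to a byte to be replaced} \\[2ex]
\vec{x}_i
  & \text{otherwise}
\end{cases} \tag{4}
```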

In this way, the replacement unit 194 obfuscates the binary code by rewriting the bytes to be replaced based on the gradient of the loss function Loss. This is expected to reduce the estimation accuracy of the neural network, for example by increasing the error of its estimation result.
In addition, the replacement unit 194 can determine the value to be written in the byte to be replaced by comparing 256 pieces of data. In this respect, the replacement unit 194 can determine the value of the byte to be replaced and rewrite it in a relatively short time even when the number of bytes to be replaced is large.
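
The gradient-based replacement can be sketched with a toy differentiable model standing in for the neural network. Everything model-specific here is an assumption made for illustration: the model is a sigmoid over a sum of per-byte weights, the loss is the squared error against a label, and all names are hypothetical. Gradients are evaluated once at the original code, so each byte needs only one comparison of 256 values.

```python
import math
from typing import Callable, Sequence

Gradient = Callable[[bytes, int], list[float]]


def gradient_replace(code: bytes, positions: Sequence[int], grad: Gradient) -> bytes:
    """Rewrite each replaceable byte to the index of its largest loss gradient.

    grad(code, i) must return the 256 partial derivatives of Loss with respect
    to bits b_0..b_255 of the one-hot vector at position i.
    """
    result = bytearray(code)
    for i in positions:
        slopes = grad(code, i)
        result[i] = max(range(256), key=slopes.__getitem__)  # argmax -> byte value
    return bytes(result)


def toy_probability(code: bytes, weights: list[float]) -> float:
    """Toy model: sigmoid of the summed per-byte weights."""
    z = sum(weights[b] for b in code)
    return 1.0 / (1.0 + math.exp(-z))


def toy_loss(code: bytes, weights: list[float], label: int) -> float:
    """Squared error between the toy estimate and the correct label."""
    return (toy_probability(code, weights) - label) ** 2


def toy_gradient(code: bytes, weights: list[float], label: int) -> list[float]:
    """dLoss/db_j; the toy model is linear in every one-hot bit, so dz/db_j = weights[j]."""
    p = toy_probability(code, weights)
    d_loss_d_z = 2.0 * (p - label) * p * (1.0 - p)
    return [d_loss_d_z * w for w in weights]


weights = [0.0] * 256
weights[0x90] = 1.0   # arbitrary: this byte value raises the toy estimate
weights[0xCC] = -1.0  # arbitrary: this byte value lowers the toy estimate
original = b"\x55\x90\x90\x55"
result = gradient_replace(
    original, [1, 2], lambda code, i: toy_gradient(code, weights, 1)
)
```

In this toy setting both replaceable bytes become 0xCC and the loss rises from about 0.014 to about 0.776, meaning the toy estimate moves away from the correct label. A real model would supply its own gradients, typically through automatic differentiation.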

Next, the operation of the obfuscation device 100 will be described with reference to FIG. 6.
FIG. 6 is a flowchart showing the procedure by which the obfuscation device 100 obfuscates the binary code.
In the process of FIG. 6, the acquisition unit 110 acquires the binary code to be obfuscated and the assembly code corresponding to the binary code (step S11).

Next, the obfuscation device 100 starts a loop L101 in which processing is performed for each subsequence that the division unit 191 cuts out from the binary code to be obfuscated (step S12).
Then, the division unit 191 cuts out the subsequence to be processed from the binary code to be obfuscated (step S13). When the division unit 191 has already cut out subsequences from the binary code to be obfuscated, it cuts out the subsequence to be processed from the remaining part, from which no subsequence has yet been cut out.

Next, the replacement target detection unit 192 detects the portion to be rewritten in the subsequence cut out by the division unit 191 (step S14). For example, the replacement target detection unit 192 detects the above-mentioned padding and instructions to be moved.
Then, the replacement unit 194 patches junk bytes into the portion to be rewritten detected by the replacement target detection unit 192 (step S15). For example, the replacement unit 194 determines and writes the value for the portion to be rewritten by the full solution search method described above or by the method using the gradient of the loss function.

Next, the obfuscation device 100 performs the termination processing of the loop L101 (step S16). Specifically, the obfuscation device 100 determines whether or not the entire binary code to be obfuscated has been cut out into subsequences and processed.
If it is determined that there is a portion that has not been cut out yet, the obfuscation device 100 continues to process the loop L101 for the portion that has not been cut out.
On the other hand, when it is determined that the entire binary code to be obfuscated has been cut out into subsequences and processed, the obfuscation device 100 ends the loop L101.

When the obfuscation device 100 ends the loop L101 in step S16, the output unit 120 outputs the obfuscated binary code (step S17). The obfuscated binary code is obtained by combining the processed subsequences in loop L101 in the same order as the original binary code.
After step S17, the obfuscation device 100 ends the process of FIG. 6.
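
The overall flow of steps S12 to S17 can be summarized in a short sketch. The detection and value-selection steps are passed in as callables because their details depend on the disassembly information and on the neural network; the toy detector and chooser below are hypothetical placeholders, and the zero fill of a short final subsequence is omitted for brevity.

```python
from typing import Callable, Sequence

Detector = Callable[[bytes], Sequence[int]]        # S14: offsets that may be rewritten
Chooser = Callable[[bytes, Sequence[int]], bytes]  # S15: returns the patched chunk


def obfuscate(code: bytes, length: int, detect: Detector, choose: Chooser) -> bytes:
    """Process the code chunk by chunk (loop L101) and rejoin the chunks in order."""
    processed = []
    for start in range(0, len(code), length):  # S12 / S16: loop control
        chunk = code[start:start + length]     # S13: cut out a subsequence
        positions = detect(chunk)              # S14: find rewritable bytes
        patched = choose(chunk, positions) if positions else chunk  # S15
        if len(patched) != len(chunk):
            raise ValueError("patching must not change the chunk length")
        processed.append(patched)
    return b"".join(processed)                 # S17: output in the original order


def detect_nops(chunk: bytes) -> list[int]:
    """Toy detector: treats every 0x90 byte as padding (unsafe for real code)."""
    return [i for i, b in enumerate(chunk) if b == 0x90]


def fill_cc(chunk: bytes, positions: Sequence[int]) -> bytes:
    """Toy chooser: writes a fixed value instead of consulting a loss function."""
    out = bytearray(chunk)
    for i in positions:
        out[i] = 0xCC
    return bytes(out)


result = obfuscate(b"\x55\x90\x90\xc3\x90", 2, detect_nops, fill_cc)
```

The length check inside the loop reflects the property the procedure relies on: every subsequence keeps its length, so the recombined code has the same addresses as the original.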

As described above, the replacement unit 194 uses a loss function indicating the detectability of a predetermined pattern in the binary code to replace the value of the part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.
As a result, the obfuscation device 100 can obfuscate the binary code without the need to rewrite addresses. In particular, the replacement unit 194 can obfuscate the binary code by rewriting the area defined as the replacement target. Therefore, the replacement unit 194 does not need to insert data into the binary code, nor does it need to replace one instruction or series of instructions in the binary code with an instruction or series of instructions having a longer byte length. As described above, in the obfuscation by the obfuscation device 100, the addresses of the portion of the binary code following the portion whose value is replaced are not shifted, and therefore no address needs to be rewritten. In this respect, the load of obfuscating the binary code with the obfuscation device 100 is relatively light.

Further, according to the obfuscation device 100, the binary code can be obfuscated by rewriting the non-executed part such as padding without rewriting the executed instruction. In this respect, the obfuscation device 100 can reduce the possibility that the computer behaves unexpectedly when the computer executes the obfuscated binary code.

Further, the loss function acquisition unit 193 acquires a loss function indicating the correlation between the estimation result obtained when the binary code is converted into one-hot vectors and input to a neural network that estimates the position of the predetermined pattern in the binary code, and the correct label for the position of the predetermined pattern in the binary code. The replacement unit 194 writes to the replacement target portion the binary data value of the one-hot vector that marks the largest of the gradients obtained by partially differentiating the loss function with respect to each bit of the one-hot vector.

The replacement unit 194 obfuscates the binary code by rewriting the bytes to be replaced based on the gradient (partial derivatives) of the loss function. This is expected to reduce the estimation accuracy of the neural network, for example by increasing the error in its estimation result.
In addition, the replacement unit 194 can determine the value to be written in the byte to be replaced by comparing 256 pieces of data. In this respect, the replacement unit 194 can determine the value of the byte to be replaced and rewrite it in a relatively short time even when the number of bytes to be replaced is large.
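
The selection among 256 candidates described above can be illustrated with a minimal Python sketch. It assumes that the gradients of the loss function with respect to the 256 components of the one-hot vector at each target offset have already been computed (for example by backpropagation through the neural network) and are supplied as plain lists. The names `pick_byte`, `replace_targets`, and the `grads` mapping are hypothetical and do not appear in the embodiment.

```python
def pick_byte(grad_row):
    """Return the byte value (0-255) whose one-hot component has the largest gradient."""
    if len(grad_row) != 256:
        raise ValueError("expected one gradient per possible byte value")
    # Comparing 256 candidates: the index of the largest gradient is the byte to write.
    return max(range(256), key=lambda value: grad_row[value])


def replace_targets(code, targets, grads):
    """Overwrite each target offset in place; the length of the code never changes.

    code    -- bytes of the binary code
    targets -- offsets of the bytes to be replaced (e.g. padding)
    grads   -- hypothetical mapping: offset -> list of 256 loss gradients
    """
    out = bytearray(code)
    for pos in targets:
        out[pos] = pick_byte(grads[pos])
    return bytes(out)


# Toy data: two padding bytes (offsets 2 and 3) are the replacement targets.
code = b"\x90\x90\xcc\xcc"
grads = {2: [0.0] * 256, 3: [0.0] * 256}
grads[2][0x41] = 1.5  # the loss grows fastest if offset 2 becomes 0x41
grads[3][0xFF] = 0.2  # the loss grows fastest if offset 3 becomes 0xFF
obfuscated = replace_targets(code, [2, 3], grads)
```

Because each target byte is overwritten in place, the output has the same length as the input and the offsets of all other bytes are unchanged.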

Further, among the series of one or more instructions included in the binary code, the replacement unit 194 transfers to a free area a series of instructions whose total byte length is longer than the byte length of a jump instruction. The replacement unit 194 then replaces the value of the transfer source portion with a jump instruction to the transfer destination and a value for reducing the detectability to the predetermined condition or less.
As a result, the obfuscation device 100 can provide a part to be replaced without having to rewrite the instructions that are executed. In this respect, the obfuscation device 100 can reduce the possibility that the computer behaves unexpectedly when the computer executes the obfuscated binary code.
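
The transfer to a free area can be sketched as follows. This is an illustration under several assumptions that are not stated in the embodiment: the code is x86, where a near jump is the opcode 0xE9 followed by a 32-bit little-endian displacement (5 bytes in total); buffer offsets are treated as addresses; the moved instructions are position-independent; and a jump back to the instruction following the transfer source is appended after the moved copy. The function names and the `filler` parameter are hypothetical.

```python
import struct

JMP_OPCODE = 0xE9  # x86 near jump, followed by a signed 32-bit displacement
JMP_LEN = 5        # 1 opcode byte + 4 displacement bytes


def encode_jmp(src, dst):
    """Encode a jump placed at offset src that lands on offset dst."""
    # The displacement is relative to the end of the jump instruction.
    return bytes([JMP_OPCODE]) + struct.pack("<i", dst - (src + JMP_LEN))


def relocate(code, start, length, free_start, filler=0x90):
    """Move code[start:start+length] to the free area and return (new_code, spare_offsets)."""
    if length <= JMP_LEN:
        raise ValueError("the series must be longer than the jump instruction")
    back_src = free_start + length
    if back_src + JMP_LEN > len(code):
        raise ValueError("the free area is too small")
    if free_start < start + length and start < back_src + JMP_LEN:
        raise ValueError("the free area overlaps the transfer source")
    buf = bytearray(code)
    # Copy the instructions to the free area, followed by a jump back.
    buf[free_start:back_src] = code[start:start + length]
    buf[back_src:back_src + JMP_LEN] = encode_jmp(back_src, start + length)
    # Overwrite the transfer source with a jump to the copy.
    buf[start:start + JMP_LEN] = encode_jmp(start, free_start)
    # The bytes left over at the source are never executed: they are the
    # part to be replaced. Here they receive a placeholder value.
    spare = list(range(start + JMP_LEN, start + length))
    for pos in spare:
        buf[pos] = filler
    return bytes(buf), spare


# Toy image: an 8-byte series at offset 0 and a zero-filled free area from offset 16.
code = bytes([0x48, 0x89, 0xD8, 0x48, 0x01, 0xC8, 0x90, 0x90]) + bytes(24)
new_code, spare = relocate(code, 0, 8, 16)
```

In this sketch the offsets returned in `spare` are the bytes that a later step would overwrite with values chosen from the loss function, and the total length of the code stays the same.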

Further, the division unit 191 divides the binary code into subsequences of a predetermined length. The replacement unit 194 then performs, for each subsequence, the process of replacing the value of the part to be replaced in the binary code with a value at which the detectability is at or below the predetermined condition.
By performing the obfuscation process for each subsequence, the obfuscation device 100 can obfuscate a binary code of arbitrary length.
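
The per-subsequence processing can be sketched as below. The sketch assumes that the last subsequence may be shorter than the predetermined length, which the embodiment does not specify here, and it takes the per-subsequence obfuscation as a caller-supplied function. The names `split_subsequences`, `obfuscate_by_subsequence`, and `process` are hypothetical.

```python
def split_subsequences(code, length):
    """Split the binary code into consecutive subsequences of at most `length` bytes."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [code[i:i + length] for i in range(0, len(code), length)]


def obfuscate_by_subsequence(code, length, process):
    """Apply `process` to each subsequence and join the results in order."""
    parts = []
    for chunk in split_subsequences(code, length):
        result = process(chunk)
        # In-place replacement must not change the size of a subsequence,
        # otherwise the addresses of later bytes would shift.
        if len(result) != len(chunk):
            raise ValueError("process must preserve the subsequence length")
        parts.append(result)
    return b"".join(parts)


chunks = split_subsequences(b"abcdefgh", 3)
joined = obfuscate_by_subsequence(b"abcdefgh", 3, lambda chunk: chunk.upper())
```

The length check makes the address-preserving property explicit: whatever is done inside each subsequence, the joined result occupies exactly the same offsets as the input.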

FIG. 7 is a diagram showing an example of the configuration of the obfuscation device according to the embodiment. The obfuscation device 200 shown in FIG. 7 includes a replacement unit 201.
In such a configuration, the replacement unit 201 uses a loss function indicating the detectability of a predetermined pattern in the binary code, and replaces the value of the part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.

As a result, the obfuscation device 200 can obfuscate the binary code without needing to rewrite addresses. In particular, the replacement unit 201 obfuscates the binary code by rewriting only the area defined as the replacement target. The replacement unit 201 therefore does not need to insert data into the binary code, nor does it need to replace an instruction or a series of instructions in the binary code with an instruction or a series of instructions having a longer byte length. Consequently, in the obfuscation by the obfuscation device 200, the addresses of the portion of the binary code after the replaced portion do not shift, and no address rewriting is required. In this respect, the load of obfuscating the binary code with the obfuscation device 200 is relatively light.

Further, according to the obfuscation device 200, the binary code can be obfuscated by rewriting the non-executed part such as padding without rewriting the executed instruction. In this respect, the obfuscation device 200 can reduce the possibility that the computer behaves unexpectedly when the computer executes the obfuscated binary code.

FIG. 8 is a diagram showing an example of a processing procedure in the obfuscation method according to the embodiment. The obfuscation method shown in FIG. 8 includes a step of using a loss function indicating the detectability of a predetermined pattern in the binary code to replace the value of the part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.

With this obfuscation method, the binary code can be obfuscated without needing to rewrite addresses. In particular, in the obfuscation method of FIG. 8, the binary code is obfuscated by rewriting only the area defined as the replacement target. Therefore, in the obfuscation method of FIG. 8, there is no need to insert data into the binary code, nor to replace an instruction or a series of instructions in the binary code with an instruction or a series of instructions having a longer byte length. Consequently, in the obfuscation method of FIG. 8, the addresses of the portion of the binary code after the replaced portion do not shift, and no address rewriting is required. In this respect, the load of obfuscating the binary code with the obfuscation method of FIG. 8 is relatively light.

Further, according to the obfuscation method of FIG. 8, the binary code can be obfuscated by rewriting the non-executed part such as padding without rewriting the executed instruction. In this respect, the obfuscation method of FIG. 8 can reduce the possibility that the computer behaves unexpectedly when the computer executes the obfuscated binary code.

FIG. 9 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
With the configuration shown in FIG. 9, the computer 700 includes a CPU 710, a main storage device 720, an auxiliary storage device 730, and an interface 740.
Either or both of the obfuscation device 100 and the obfuscation device 200 may be mounted on the computer 700. In that case, the operation of each of the above-mentioned processing units is stored in the auxiliary storage device 730 in the form of a program. The CPU 710 reads the program from the auxiliary storage device 730, loads it into the main storage device 720, and executes the above processing according to the program. Further, the CPU 710 secures a storage area corresponding to each of the above-mentioned storage units in the main storage device 720 according to the program. Communication between each device and other devices is performed by the interface 740, which has a communication function and communicates under the control of the CPU 710.

When the obfuscation device 100 is mounted on the computer 700, the operations of the control unit 190 and each unit thereof are stored in the auxiliary storage device 730 in the form of a program. The CPU 710 reads the program from the auxiliary storage device 730, expands it to the main storage device 720, and executes the above processing according to the program.
Further, the CPU 710 secures a storage area corresponding to the storage unit 180 in the main storage device 720 according to the program. The functions of the acquisition unit 110 and the output unit 120 are performed by the interface 740, which has a data input/output function such as a communication function and operates under the control of the CPU 710.

When the obfuscation device 200 is mounted on the computer 700, the operation of the replacement unit 201 is stored in the auxiliary storage device 730 in the form of a program. The CPU 710 reads the program from the auxiliary storage device 730, expands it to the main storage device 720, and executes the above processing according to the program.

A program for realizing all or a part of the functions of the obfuscation device 100 and the obfuscation device 200 may be recorded on a computer-readable recording medium, and the processing of each unit may be performed by having a computer system read and execute the program recorded on the recording medium. The term "computer system" as used herein includes an OS (operating system) and hardware such as peripheral devices.
A "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM (Read Only Memory), or a CD-ROM (Compact Disc Read Only Memory), or to a storage device such as a hard disk built into a computer system. Further, the above-mentioned program may realize only a part of the above-mentioned functions, or may realize the above-mentioned functions in combination with a program already recorded in the computer system.

As described above, the embodiment of the present invention has been described in detail with reference to the drawings, but the specific configuration is not limited to this embodiment, and design changes and the like within a range not deviating from the gist of the present invention are also included.

The embodiment of the present invention may be applied to an obfuscation device, an obfuscation method, and a recording medium.

100, 200 Obfuscation device
110 Acquisition unit
120 Output unit
180 Storage unit
190 Control unit
191 Division unit
192 Replacement target detection unit
193 Loss function acquisition unit
194, 201 Replacement unit

Claims

1. An obfuscation device comprising a replacement unit that uses a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.

2. The obfuscation device according to claim 1, further comprising a loss function acquisition unit that acquires the loss function, the loss function indicating the correlation between an estimation result obtained when the binary code is converted into a one-hot vector and input to a neural network that estimates the position of the predetermined pattern in the binary code, and a correct answer label of the position of the predetermined pattern in the binary code, wherein the replacement unit writes into the part to be replaced a value obtained by converting into binary data the one-hot vector that indicates the gradient at which the loss function is largest, among the gradients obtained by partially differentiating the loss function with respect to each bit of the one-hot vector.

3. The obfuscation device according to claim 1 or 2, wherein, among the series of one or more instructions included in the binary code, the replacement unit transfers to a free area a series of instructions whose total byte length is longer than the byte length of a jump instruction, and replaces the value of the transfer source portion with a jump instruction to the transfer destination and a value for reducing the detectability to the predetermined condition or less.

4. The obfuscation device according to any one of claims 1 to 3, further comprising a division unit that divides the binary code into subsequences of a predetermined length, wherein the replacement unit performs, for each subsequence, the process of replacing the value of the part to be replaced in the binary code with a value at which the detectability is at or below the predetermined condition.

5. An obfuscation method including a step of using a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.

6. A recording medium on which is recorded a program for causing a computer to execute a step of using a loss function indicating the detectability of a predetermined pattern in a binary code to replace the value of a part to be replaced in the binary code with a value at which the detectability is at or below a predetermined condition.