CN115391754A

CN115391754A - Source code obfuscation method and device

Info

Publication number: CN115391754A
Application number: CN202210944144.7A
Authority: CN
Inventors: 余亮; 陈秀; 李琪
Original assignee: Dongfeng Commercial Vehicle Co Ltd
Current assignee: Dongfeng Commercial Vehicle Co Ltd
Priority date: 2022-08-05
Filing date: 2022-08-05
Publication date: 2022-11-25

Abstract

The application relates to a source code obfuscation method and device in the technical field of vehicle code development. The source code obfuscation method comprises the following steps: acquiring the Tokens in the source code in its original state to obtain a corresponding Token list; generating a binary coding list whose length corresponds to the length of the Token list; sorting the Tokens in the Token list in a preset order to obtain a Token ordering correspondence table; and replacing each Token in the source code with the corresponding coding character string in the binary coding list according to the Token ordering in the Token ordering correspondence table. By collecting the Tokens in the source code and replacing them with codes in a preset order, the method and device obfuscate the source code while ensuring that the obfuscated source code still runs, thereby meeting practical requirements.

Description

Source code obfuscation method and device

Technical Field

The application relates to the technical field of vehicle code development, in particular to a source code obfuscation method and a source code obfuscation device.

Background

In vehicle code development, source code often has to be obfuscated to meet technical requirements and keep the code confidential.

Current source code obfuscation techniques have the following technical defect:

the obfuscation process damages the source code with a certain probability, so that the code no longer runs normally; although the confidentiality requirement is met, reliability is low, because the obfuscated code may fail to perform its original function.

Therefore, a source code obfuscation technique is provided to remedy these shortcomings of current techniques.

Disclosure of Invention

The application provides a source code obfuscation method and device in which the Tokens in the source code are collected and replaced with codes according to a preset order, so that the obfuscated source code remains runnable while the purpose of obfuscation is achieved, meeting practical requirements.

In a first aspect, the present application provides a source code obfuscation method, comprising:

acquiring the Tokens in the source code in its original state to obtain a corresponding Token list;

generating a binary coding list whose length corresponds to the length of the Token list;

sorting the Tokens in the Token list based on a preset order to obtain a Token ordering correspondence table;

and replacing each Token in the source code with the corresponding coding character string in the binary coding list according to the Token ordering in the Token ordering correspondence table.

It should be noted that the original state of the source code refers to its state before obfuscation is performed, that is, the source code has not yet been obfuscated.

The term Token denotes both a (temporary) token in computer authentication, commonly used in invitation/login systems, and a token in lexical analysis; the method of this application operates on tokens in the lexical-analysis sense;

informally, a token can be thought of as a secret number: before certain data are transmitted, the number must be checked, and different numbers authorize different data operations;

for example, the USB 1.1 protocol defines four types of packets: token packets, data packets, handshake packets and special packets;

a continuous data exchange between the host and a USB device can be divided into three stages: in the first stage the host sends a token packet, and token packets with different contents (different numbers) tell the device to perform different work; in the second stage a data packet is sent; and in the third stage the device returns a handshake packet.

In the embodiments of the application, the Tokens in the source code are collected and replaced with codes according to the preset order, so that the obfuscated source code remains runnable while the purpose of obfuscation is achieved, meeting practical requirements;

based on this technical solution, the obfuscated source code can still be correctly parsed and executed by a machine while a human reader cannot easily understand it, and the obfuscated source code can be restored.

Further, before the step of acquiring the Tokens in the source code in its original state to obtain the corresponding Token list, the method further includes the following step:

acquiring the reserved words in the source code to obtain a corresponding reserved word set.

Further, acquiring the Tokens in the source code in its original state to obtain the corresponding Token list includes the following step:

screening and reading the Tokens in the source code based on the reserved word set to obtain the corresponding Token list.
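A minimal Python sketch of the screening step, assuming the source language is Python: the standard-library keyword and tokenize modules supply the reserved words and the token stream. The extra reserved names in EXTRA_RESERVED are a hypothetical extension chosen to match the worked example later in this document, and splitting comments into words with a regular expression is likewise an assumption rather than a procedure stated in the source.

```python
import io
import keyword
import re
import tokenize

# Hypothetical extension of the reserved word set: names the worked example
# treats as reserved although Python does not (builtins, conventional "self").
EXTRA_RESERVED = {"self", "int", "list"}


def reserved_word_set() -> set[str]:
    """Reserved word set: Python keywords plus the assumed extra names."""
    return set(keyword.kwlist) | EXTRA_RESERVED


def user_token_list(source: str, reserved: set[str]) -> list[str]:
    """Return non-reserved identifiers from code and comments, in first-appearance order."""
    seen: dict[str, None] = {}  # a dict keeps insertion order and drops duplicates
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            words = [tok.string]
        elif tok.type == tokenize.COMMENT:
            # Assumption: identifier-like words inside comments are Tokens too.
            words = re.findall(r"[A-Za-z_]\w*", tok.string)
        else:
            continue
        for word in words:
            if word not in reserved:
                seen.setdefault(word, None)
    return list(seen)
```

For example, user_token_list("x = 1  # move x\n", reserved_word_set()) returns ['x', 'move'].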

Specifically, generating a binary coding list of corresponding length based on the length of the Token list includes the following step:

generating, based on the length of the Token list, a binary coding list of corresponding length using Gray coding; wherein

the binary coding list is a Gray code list.
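One way to generate the Gray code list, sketched in Python under the assumption that all codes share the smallest bit width able to hold one code per Token: the i-th code of the standard reflected Gray sequence is i ^ (i >> 1). The function name gray_code_list is illustrative, not taken from the source.

```python
def gray_code_list(n: int) -> list[str]:
    """Return n equal-width binary strings in reflected Gray code order.

    The width is the fewest bits that can represent n distinct codes (minimum 1),
    so consecutive entries differ in exactly one bit.
    """
    width = max(1, (n - 1).bit_length())
    return [format(i ^ (i >> 1), f"0{width}b") for i in range(n)]


print(gray_code_list(4))  # ['00', '01', '11', '10']
```

When n is not a power of two the sequence is cut short, so the first and last codes need not differ by a single bit (for n = 5 the list ends at 110); for obfuscation, what matters is that the codes are distinct and of equal length.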

Specifically, replacing the Tokens in the source code with the corresponding coding character strings in the binary coding list according to the Token ordering in the Token ordering correspondence table includes the following steps:

sorting the binary coding list according to the Token ordering in the Token ordering correspondence table to obtain a correspondingly sorted binary coding list;

and replacing the Tokens in the source code with the corresponding coding character strings in the sorted binary coding list based on this correspondence.

Specifically, the replacing step may further include: adding a preset identifier in front of each coding character string in the binary coding list according to the Token ordering in the Token ordering correspondence table, and replacing the corresponding Token in the source code.

Further, after the Tokens in the source code are replaced with the corresponding coding character strings in the binary coding list according to the Token ordering in the Token ordering correspondence table, the method includes the following step:

restoring the coding character strings in the source code based on the binary coding list, thereby returning the source code to its original state.

In a second aspect, the present application provides a source code obfuscation apparatus, comprising:

the information acquisition module is configured to acquire the Tokens in the source code in its original state and obtain a corresponding Token list;

the preprocessing module is configured to generate a binary coding list of corresponding length based on the length of the Token list;

the sorting module is configured to sort the Tokens in the Token list based on a preset order to obtain a Token ordering correspondence table;

and the coding replacement module is configured to replace the Tokens in the source code with the corresponding coding character strings in the binary coding list according to the Token ordering in the Token ordering correspondence table.

Further, the information acquisition module is further configured to acquire the reserved words in the source code and obtain a corresponding reserved word set;

the information acquisition module is further configured to screen and read the Tokens in the source code based on the reserved word set and obtain the corresponding Token list.

Further, the preprocessing module is further configured to generate, based on the length of the Token list, a binary coding list of corresponding length using Gray coding; wherein

the binary coding list is a Gray code list.

Further, the coding replacement module is further configured to sort the binary coding list according to the Token ordering in the Token ordering correspondence table to obtain a correspondingly sorted binary coding list,

and to replace the Tokens in the source code with the corresponding coding character strings in the sorted binary coding list based on this correspondence.

Further, the sorting module is further configured to add a preset identifier in front of each coding character string in the binary coding list according to the Token ordering in the Token ordering correspondence table, and to replace the corresponding Token in the source code.

Further, the source code obfuscation apparatus further includes:

a code recovery module, configured to restore the coding character strings in the source code based on the binary coding list, thereby returning the source code to its original state.

The technical solution provided by this application has the following beneficial effects:

by collecting the Tokens in the source code and replacing them with codes according to the preset order, the method and device keep the obfuscated source code runnable while achieving the purpose of obfuscation, thereby meeting practical requirements.

Drawings

Interpretation of terms:

USB: Universal Serial Bus.

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of the steps of a source code obfuscation method provided in an embodiment of the present application;

FIG. 2 is a block diagram of a source code obfuscation apparatus provided in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making creative efforts shall fall within the protection scope of the present application.

Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The embodiments of the present application provide a source code obfuscation method and device in which the Tokens in the source code are collected and replaced with codes according to a preset order, so that the obfuscated source code remains runnable while the purpose of obfuscation is achieved, meeting practical requirements.

To achieve the above technical effects, the general idea of the application is as follows:

a source code obfuscation method, comprising the following steps:

S1, acquiring the Tokens in the source code in its original state to obtain a corresponding Token list;

S2, generating a binary coding list whose length corresponds to the length of the Token list;

S3, sorting the Tokens in the Token list based on a preset order to obtain a Token ordering correspondence table;

S4, replacing each Token in the source code with the corresponding coding character string in the binary coding list according to the Token ordering in the Token ordering correspondence table.
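Steps S1 to S4 can be strung together in a short Python sketch. It assumes Python source, the first-appearance ordering from the embodiment, the prefix s from the worked example, and a reserved set extended with the hypothetical names self, int and list; the function name obfuscate is illustrative. Whole-word regular-expression replacement also rewrites matching words inside string literals, and an existing identifier that already looks like a prefixed code (such as s00) would collide, so a production tool would need to guard against both.

```python
import io
import keyword
import re
import tokenize

RESERVED = set(keyword.kwlist) | {"self", "int", "list"}  # assumed extra reserved names
PREFIX = "s"  # preset identifier, as in the worked example


def obfuscate(source: str) -> tuple[str, dict[str, str]]:
    """Apply S1-S4 and return (obfuscated source, Token -> prefixed code table)."""
    # S1: collect non-reserved Tokens from code and comments, first appearance first.
    found: dict[str, None] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            words = [tok.string]
        elif tok.type == tokenize.COMMENT:
            words = re.findall(r"[A-Za-z_]\w*", tok.string)
        else:
            continue
        for word in words:
            if word not in RESERVED:
                found.setdefault(word, None)
    tokens = list(found)

    # S2: Gray code list with one equal-width code per Token.
    width = max(1, (len(tokens) - 1).bit_length())
    codes = [format(i ^ (i >> 1), f"0{width}b") for i in range(len(tokens))]

    # S3: Tokens are already in first-appearance order; pair them with the codes.
    table = {tok: PREFIX + code for tok, code in zip(tokens, codes)}
    if not table:
        return source, table

    # S4: replace every whole-word occurrence of a Token with its prefixed code.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], source), table
```

Applied to the signature from the worked example with a pass body, obfuscate returns "def s00(self, s01: int, s11: list[int], s10: list[int]) -> int:\n    pass\n" together with the table carFleat → s00, target → s01, position → s11, speed → s10.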

In a first aspect, referring to FIG. 1, an embodiment of the present application provides a source code obfuscation method comprising steps S1 to S4 described above.

Further, before step S1, the method further includes the following step:

acquiring the reserved words in the source code to obtain a corresponding reserved word set.

Specifically, step S1 includes the following step:

screening and reading the Tokens in the source code based on the reserved word set to obtain the corresponding Token list.

Specifically, step S2 includes the following step:

generating, based on the length of the Token list, a binary coding list of corresponding length using Gray coding; wherein

the binary coding list is a Gray code list.

Specifically, step S4 includes the following steps:

sorting the binary coding list according to the Token ordering in the Token ordering correspondence table to obtain a correspondingly sorted binary coding list;

and replacing the Tokens in the source code with the corresponding coding character strings in the sorted binary coding list based on this correspondence.

Specifically, step S4 may further include:

adding a preset identifier in front of each coding character string in the binary coding list according to the Token ordering in the Token ordering correspondence table, and replacing the corresponding Token in the source code.

Further, after step S4, the method includes the following step:

restoring the coding character strings in the source code based on the binary coding list, thereby returning the source code to its original state.

It should be noted that, based on the technical solution of the embodiments of the present application, a specific implementation includes the following procedure:

First, obtain the reserved words of the language used by the source code, including its set of operators;

Second, identify and read all Tokens from the code and comments in the source code text, and remove all reserved words to obtain a user Token list;

Third, obtain the length of the user Token list and generate a binary code list of corresponding length, where the binary code may be a Gray code; wherein

binary codes are inherently hard to tell apart, and Gray codes, in which adjacent codes differ by only one bit, are even harder for a human reader to distinguish.

In the encoding of a group of numbers, if any two adjacent codes differ in only one binary digit, the encoding is called a Gray code; in addition, since the largest and smallest codes also differ in only one bit, i.e., the codes wrap around end to end, it is also called a cyclic code or reflected code;
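The name "reflected code" comes from a mirror construction: the n-bit list is the (n-1)-bit list prefixed with 0, followed by the same list reversed and prefixed with 1. A small Python sketch (function names are illustrative) that builds the list this way and checks the one-bit property, including the wrap-around from the last code back to the first:

```python
def reflected_gray(bits: int) -> list[str]:
    """Build the bits-wide reflected binary Gray code by mirroring."""
    codes = [""]
    for _ in range(bits):
        # Prefix 0 to the current list and 1 to its mirror image.
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes


def differs_by_one_bit(a: str, b: str) -> bool:
    """True if two equal-length bit strings differ in exactly one position."""
    return sum(x != y for x, y in zip(a, b)) == 1


codes = reflected_gray(3)
print(codes)  # ['000', '001', '011', '010', '110', '111', '101', '100']
# Cyclic property: each neighbouring pair, including last -> first, differs by one bit.
n = len(codes)
print(all(differs_by_one_bit(codes[i], codes[(i + 1) % n]) for i in range(n)))  # True
```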

in digital systems, codes are often required to change in a certain order. For example, when counting up through the natural numbers with the 8421 code, all four bits change when 0111 becomes 1000; in a real circuit, four bits are unlikely to change at exactly the same moment, so other codes (1100, 1111, etc.) may appear momentarily during counting;

in some situations this can cause errors in the circuit state or in the input; such errors can be avoided by using Gray codes, which come in a variety of coding formats.

It should be noted that Gray code is a reliable, error-minimizing code. Although natural binary code can be converted directly into an analog signal by a digital-to-analog converter, in some cases, for example when the value changes from decimal 3 (011) to 4 (100), every bit of the binary code changes, which can cause the digital circuit to generate a large peak current pulse;

Gray code does not have this drawback: only one bit changes between adjacent values,

which greatly reduces logical ambiguity during the transition from one state to the next;

since adjacent code groups differ in only one bit, when an angular displacement is converted into a digital value, a slight change in angle that changes the digital value changes only one bit of the Gray code; this is more reliable than other codes in which two or more bits change simultaneously, and it reduces the likelihood of error.
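The single-bit-change property can be checked numerically. This Python sketch counts how many bits change at each step of a 4-bit count from 0 to 15, once in natural binary (8421) and once after the standard conversion n ^ (n >> 1) to Gray code; the helper names are illustrative.

```python
def bit_flips(a: int, b: int) -> int:
    """Number of bit positions that differ between a and b."""
    return bin(a ^ b).count("1")


def to_gray(n: int) -> int:
    """Convert a natural binary number to its reflected Gray code."""
    return n ^ (n >> 1)


steps = range(15)  # transitions 0->1, 1->2, ..., 14->15
binary_max = max(bit_flips(n, n + 1) for n in steps)
gray_max = max(bit_flips(to_gray(n), to_gray(n + 1)) for n in steps)
print(binary_max, gray_max)  # 4 1  (the 4 comes from 0111 -> 1000)
print(bit_flips(3, 4), bit_flips(to_gray(3), to_gray(4)))  # 3 1
```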

Fourth, sort the user Token list in a certain order; the ordering basis may be high-order-first lexicographic order, low-order-first lexicographic order, or the order in which the Tokens first appear in the source code;

then align the sorted list one by one with the Gray code list: because the Gray code list and the user Token list have the same length, they correspond one to one, and the Token ordering correspondence table is recorded once the sorting is complete.
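The three ordering options and the one-to-one alignment can be sketched in Python as follows. Reading "low-order-first lexicographic order" as comparing strings from their last character is an assumption made for illustration, as are the strategy names and function names.

```python
def order_tokens(tokens: list[str], strategy: str) -> list[str]:
    """Order user Tokens (given in first-appearance order) by the chosen strategy."""
    if strategy == "appearance":
        return list(tokens)
    if strategy == "high_first":
        return sorted(tokens)  # ordinary lexicographic order, first character first
    if strategy == "low_first":
        # Assumption: compare characters starting from the last one.
        return sorted(tokens, key=lambda t: t[::-1])
    raise ValueError(f"unknown strategy: {strategy}")


def correspondence_table(ordered: list[str], codes: list[str]) -> dict[str, str]:
    """Pair the i-th Token with the i-th code; both lists must be the same length."""
    if len(ordered) != len(codes):
        raise ValueError("Token list and code list lengths differ")
    return dict(zip(ordered, codes))


tokens = ["carFleat", "target", "position", "speed"]
gray = ["00", "01", "11", "10"]
print(correspondence_table(order_tokens(tokens, "high_first"), gray))
# {'carFleat': '00', 'position': '01', 'speed': '11', 'target': '10'}
```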

Fifth, according to the Token ordering correspondence table, replace all user Tokens with the corresponding coding character strings throughout the original text of the source code, including both code lines and comment lines;

in this step, a prefix beginning with a letter, i.e., the preset identifier (for example 'str' or 'S'), is added in front of each coding character string to prevent the compiler from mistaking the code for a number;

the replaced text is taken as the obfuscated source code.
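A Python sketch of why a letter prefix is needed and how the replacement can cover code and comment lines alike: a bare code such as 00 is not a valid identifier (and would be read as a number), whereas s00 is. The function name apply_table and the default prefix s are illustrative assumptions; the whole-word regular expression also rewrites matching words inside string literals, which a real tool might need to exclude.

```python
import re


def apply_table(text: str, table: dict[str, str], prefix: str = "s") -> str:
    """Replace each whole-word user Token in code and comments with prefix + code."""
    for code in table.values():
        if not (prefix + code).isidentifier():
            raise ValueError(f"{prefix + code!r} is not a valid identifier")
    if not table:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: prefix + table[m.group(1)], text)


print("00".isidentifier(), "s00".isidentifier())  # False True
line = "speed = target  # speed toward target\n"
print(apply_table(line, {"target": "01", "speed": "10"}), end="")
# s10 = s01  # s10 toward s01
```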

Sixth, restore the source code to its original state based on the binary coding list and the Token ordering correspondence table.
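Because the correspondence table is one-to-one, restoration is simply the inverse substitution. A Python sketch under the same assumptions as the replacement step (prefix s, whole-word matching; restore is an illustrative name); the round trip is exact only if the original text contained no words that already looked like prefixed codes.

```python
import re


def restore(obfuscated: str, table: dict[str, str], prefix: str = "s") -> str:
    """Invert the Token -> code table and substitute the original Tokens back."""
    inverse = {prefix + code: token for token, code in table.items()}
    if not inverse:
        return obfuscated
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, inverse)) + r")\b")
    return pattern.sub(lambda m: inverse[m.group(1)], obfuscated)


table = {"carFleat": "00", "target": "01", "position": "11", "speed": "10"}
print(restore("def s00(self, s01: int, s11: list[int], s10: list[int]) -> int", table))
# def carFleat(self, target: int, position: list[int], speed: list[int]) -> int
```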

Specifically, take the following source code as an example:

the source code is: def carFleat(self, target: int, position: list[int], speed: list[int]) -> int

The user Token set is: carFleat, target, position, speed; the remaining tokens are treated as reserved words of the programming language.

The corresponding four 2-bit Gray codes are 00, 01, 11, 10; adding the letter prefix s gives s00, s01, s11, s10.

Suppose the user Tokens are ordered by appearance: carFleat → s00, target → s01, position → s11, speed → s10.

After obfuscation, the source code becomes:

def s00(self, s01: int, s11: list[int], s10: list[int]) -> int

wherein the ordering rule assigns the four Gray codes 00, 01, 11 and 10 in turn, following the order in which the user Tokens appear;

each original Token in the source code is then replaced by its corresponding Gray code preceded by the prefix character s.
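To check that the obfuscated code stays machine-readable, the worked example can be replayed in Python: apply the correspondence table from the example and parse the result with the standard ast module. Giving the function a ... body so that the signature parses on its own is an assumption; the example shows only the signature.

```python
import ast
import re

# Signature from the worked example; the "..." body is added only so that it parses.
original = "def carFleat(self, target: int, position: list[int], speed: list[int]) -> int: ..."
table = {"carFleat": "s00", "target": "s01", "position": "s11", "speed": "s10"}

pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
obfuscated = pattern.sub(lambda m: table[m.group(1)], original)
print(obfuscated)
# def s00(self, s01: int, s11: list[int], s10: list[int]) -> int: ...

func = ast.parse(obfuscated).body[0]  # raises SyntaxError if the code were broken
print(func.name, [arg.arg for arg in func.args.args])
# s00 ['self', 's01', 's11', 's10']
```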

In summary, the technical solution of the embodiments of the present application has the following technical advantages:

First, each Token that is not a reserved word of the language is represented by a numeric code of equal length, which masks the original length of the Token, and the numeric (binary) codes are hard to distinguish from one another.

Second, the number of possible correspondences between Tokens and numeric codes is n!, where n is the number of Tokens, which reduces the chance of the mapping being guessed;
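The n! figure counts the ways n Tokens can be assigned one-to-one to n fixed codes. A short Python check (the function name is illustrative); it assumes an attacker knows the set of codes and must guess only the assignment.

```python
import math


def possible_tables(n: int) -> int:
    """Number of one-to-one assignments of n Tokens to n distinct codes."""
    return math.factorial(n)


print(possible_tables(4))   # 24, for the four Tokens of the worked example
print(possible_tables(20))  # 2432902008176640000
```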

moreover, frequency analysis, the classic method for breaking substitution ciphers, is not effective against this scheme, because different codes have different frequency distribution rules.

Third, both the code lines and the comment lines of the source code are obfuscated, so the obfuscated code loses human readability.

Fourth, apart from being unintelligible to humans, the obfuscated source code does not affect program operation or performance.

Fifth, the obfuscated source code can be decoded back to the original when needed.

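In a second aspect, referring to FIG. 2, an embodiment of the present application provides a source code obfuscation apparatus based on the source code obfuscation method of the first aspect, the apparatus comprising:

an information acquisition module, configured to acquire the Tokens in the source code in its original state and obtain a corresponding Token list;

a preprocessing module, configured to generate a binary coding list of corresponding length based on the length of the Token list;

a sorting module, configured to sort the Tokens in the Token list based on a preset order to obtain a Token ordering correspondence table;

and a coding replacement module, configured to replace the Tokens in the source code with the corresponding coding character strings in the binary coding list according to the Token ordering in the Token ordering correspondence table.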

Further, the information acquisition module is further configured to acquire the reserved words in the source code and obtain a corresponding reserved word set;

the information acquisition module is further configured to screen and read the Tokens in the source code based on the reserved word set and obtain the corresponding Token list.

Further, the preprocessing module is further configured to generate, based on the length of the Token list, a binary coding list of corresponding length using Gray coding; wherein

the binary coding list is a Gray code list.

Further, the coding replacement module is further configured to sort the binary coding list according to the Token ordering in the Token ordering correspondence table to obtain a correspondingly sorted binary coding list, and to replace the Tokens in the source code with the corresponding coding character strings in the sorted binary coding list.

Further, the sorting module is further configured to add a preset identifier in front of each coding character string in the binary coding list according to the Token ordering in the Token ordering correspondence table, and to replace the corresponding Token in the source code.

Further, the source code obfuscation apparatus further includes:

a code recovery module, configured to restore the coding character strings in the source code based on the binary coding list, thereby returning the source code to its original state.

It should be noted that, in the present application, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.

The above are merely exemplary embodiments of the present application and are intended to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of source code obfuscation, the method comprising the steps of:

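acquiring the Tokens in the source code in its original state to obtain a corresponding Token list;

generating a binary coding list of corresponding length based on the length of the Token list;

sorting the Tokens in the Token list based on a preset order to obtain a Token ordering correspondence table;

and replacing each Token in the source code with the corresponding coding character string in the binary coding list according to the Token ordering in the Token ordering correspondence table.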

2. The source code obfuscation method as in claim 1, wherein before obtaining Token in the source code in an original state and obtaining a corresponding Token list, the method further comprises:

3. A source code obfuscation method as claimed in claim 2, wherein the method further comprises the steps of obtaining Token in the source code in an original state and obtaining a corresponding Token list:

4. The source code obfuscation method as in claim 1, wherein generating a binary code list of a corresponding length based on a length of the Token list comprises:

generating, based on the length of the Token list, a binary coding list of corresponding length using Gray coding; wherein

the binary coding list is a Gray code list.

5. The method for source code obfuscation according to claim 1, wherein the step of replacing the Token in the source code with the corresponding encoding string in the binary encoding list according to the Token ordering in the Token ordering correspondence table includes the steps of:

6. The source code obfuscation method as claimed in claim 1, wherein the replacing the Token in the source code into the corresponding encoding string in the binary encoding list according to the Token ordering in the Token ordering correspondence table includes the following steps:

and adding a preset identifier in front of the coded character string in the binary coding list according to the Token sequence in the Token sequence corresponding table, and replacing the corresponding Token in the source code.

7. The source code obfuscation method as claimed in claim 1, wherein after replacing the Token in the source code with the corresponding encoding string in the binary encoding list according to the Token ordering in the Token ordering correspondence table, the method includes the following steps:

8. A source code obfuscation apparatus, the apparatus comprising:

9. The source code obfuscation device of claim 8, wherein the information obtaining module is further configured to obtain a reserved word in the source code and obtain a corresponding set of reserved words;

10. The source code obfuscation device of claim 9, wherein: