CN114816435A

CN114816435A - Software development method based on reverse technology

Info

Publication number: CN114816435A
Application number: CN202210253180.9A
Authority: CN
Inventors: 平洋; 陈�光; 张伟华; 梁东晨; 白小燕; 钟远; 杨华
Original assignee: Research Institute of War of PLA Academy of Military Science
Current assignee: Research Institute of War of PLA Academy of Military Science
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2022-07-29

Abstract

The invention provides a software development method based on reverse technology, comprising the following steps: S1, decompiling target software to obtain source code; S2, performing function analysis on the target software and determining the correspondence between each code block in the source code and the functions; S3, performing deconstruction analysis on the target software to obtain software information of the target software, including the system deployment environment, system architecture, scheduling mode, interface specification, communication protocol, reference mode, and data storage mode; and S4, reconstructing the system based on the software information and filling the code blocks that correspond to functions into the reconstructed system. In this scheme, only the part of the software that its actual functions require is reversed, and the remaining parts can be extended as needed, which reduces the workload of reverse development and makes software reversal as practicable as possible.

Description

Software development method based on reverse technology

Technical Field

The invention belongs to the technical field of software development, and particularly relates to a software development method based on a reverse technology.

Background

Software reverse engineering refers to the process of starting from an executable program system and using computer techniques such as disassembly, system analysis, and program understanding to take apart and analyze the structure, flow, algorithms, and code of the software, in order to derive the source code, design principles, structure, algorithms, processing procedures, operation methods, and related documents of a software product. Put simply, a new description of the system is constructed by identifying and analyzing the code of the computer software. The original system is first analyzed at a basic level, its components are then identified, and a new, higher-level software system is constructed by clarifying the relationships between those components. Generally, the whole process of reverse analysis of software is called software reverse engineering, and the technologies adopted in the process are called software reverse engineering technologies.

Software reverse technology can be used to probe the weaknesses of current software protection and to examine how well software is protected, to debug software and repair its functions in a production environment where no source code is available, and to extend the functions of software or develop new software based on existing software. Existing approaches reverse the entire software, but reversing every detail is a huge and time-consuming undertaking; efficiency is very low and the approach is hard to carry out in practice.

Disclosure of Invention

The invention aims to provide a software development method based on reverse technology that addresses the above problems.

In order to achieve the purpose, the invention adopts the following technical scheme:

A software development method based on reverse technology comprises the following steps:

S1, decompiling target software to obtain source code;

S2, performing function analysis on the target software, and determining the correspondence between each code block in the source code and the functions;

S3, performing deconstruction analysis on the target software to obtain software information of the target software, including the system deployment environment, system architecture, scheduling mode, interface specification, communication protocol, reference mode, and data storage mode;

S4, reconstructing the system based on the software information, and filling the code blocks that correspond to functions into the reconstructed system.

In the above software development method based on the reverse technique, step S1 specifically includes:

S11, reading the target code of the target software into memory;

S12, analyzing the target code to separate instruction codes from data;

S13, disassembling the target code with a disassembling tool to obtain an assembly file;

S14, decompiling the assembly file with a decompiling tool to obtain source code.

In the above software development method based on the reverse technology, step S11 specifically includes:

A1. reading a plurality of bytes from the target binary format file and storing the bytes in the Content object;

A2. storing the Content object into a Vector container;

A3. steps A1 and A2 are repeated until the end of the file.

In the software development method based on the reverse technology, step S12 specifically includes:

B1. tracing instruction control flow, traversing and identifying each instruction;

B2. the code portions reachable by the instruction stream are identified as instruction codes, and the remaining portions are identified as data.

In the above software development method based on the reverse technology, step S13 specifically includes:

C1. sequentially taking out the objects from the Vector container, and judging whether the objects are instruction codes or data according to the separation result of the instruction codes and the data;

C2. if the object is an instruction code, disassembling it into assembly instruction form with the disassembling tool; if the object is data, translating it into its data value, either directly or with the disassembling tool.

In the above software development method based on the reverse technique, the following steps are further included between steps S13 and S14:

D1. normalizing the assembly instruction code into an intermediate code;

D2. extracting a library function, and identifying a system library function and a user-defined function;

D3. recovering key information of the user-defined function, including name, parameter number, return value and type;

in step S14, the decompilation tool decompiles the system library functions and the user-defined functions separately.

In the above-described software development method based on the reverse technique, in step S2, the function of the target software is determined according to the operation manual, the help document, and by dynamically operating the target software.

In the software development method based on the reverse technology, in step S2, the key code is searched and extracted by dynamically debugging the source code, and the corresponding relationship between the key code and the function is marked according to the determined function.

In the software development method based on the reverse technique, in step S3, the target software is deconstructed and analyzed by a static analysis method and/or a dynamic analysis method.

In the above software development method based on the reverse technology, before step S4 the method further includes converting the source code/code blocks into the target development language;

in step S4, the reconstructed system is an open system with open interfaces for users to develop, improve, and verify.

The invention has the advantages that:

1. the software part is reversed according to the actual function of the software, and the rest part can be expanded as required, so that the workload of reverse development can be reduced, and the operability of software reverse can be realized to the maximum extent;

2. the instruction codes and data in the target code are separated first, so that the disassembling tool can process instruction codes and data in a targeted way during disassembly; this keeps data from interfering with the disassembly of instruction codes and improves disassembly efficiency;

3. the Content objects are stored in a Vector container, which facilitates content extraction and marking as well as the subsequent tracking of the instruction control flow;

4. tracking the instruction control flow by tracking the PC value can ensure that each instruction is traversed, thereby ensuring the thorough separation of the instruction code and the data and ensuring the separation effect;

5. before the assembly file is processed, the user-defined function is identified, the user-defined function and the system library function are separated, and key information of the user-defined function is restored, so that the subsequent restoration work of the source code level code is facilitated.

Drawings

FIG. 1 is a flow chart of a software development method based on a reverse technique according to the present invention;

FIG. 2 is a flowchart of a method for obtaining source code in the software development method based on the reverse technique according to the present invention;

FIG. 3 is a flowchart of the method for reading target code into memory in the software development method based on the reverse technique according to the present invention;

FIG. 4 is a flowchart of a method for tracking instruction control flow in the software development method based on the reverse technology;

FIG. 5 is a flowchart of disassembling process in the software development method based on reverse technology;

FIG. 6 is a flowchart illustrating the specific processing of data and instruction codes during disassembly in the software development method based on the reverse technique according to the present invention;

FIG. 7 is a diagram illustrating the location layout of functions in a memory.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

As shown in fig. 1, the present embodiment discloses a software development method based on a reverse technology, which includes the following steps:

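S1, decompiling target software to obtain source code;

S2, performing function analysis on the target software, and determining the correspondence between each code block in the source code and the functions;

S3, performing deconstruction analysis on the target software to obtain software information of the target software, including the system deployment environment, system architecture, scheduling mode, interface specification, communication protocol, reference mode, and data storage mode;

S4, reconstructing the system based on the software information, and filling the code blocks that correspond to functions into the reconstructed system.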

Specifically, in step S2, the functions of the target software are determined according to the operation manual, the help document and the manner of dynamically operating the target software, and the key codes corresponding to all the functions or the main functions or the functions selected by the user are searched and extracted by dynamically debugging the source codes, and the corresponding relationship between the key codes and the functions is marked according to the determined functions.

Through the step S2, the functions are positioned in the code blocks, and then the corresponding relations between the code blocks and the functions are marked, so that the user can select the key functions according to the requirements.

Further, in step S3, deconstruction analysis is performed on the target software by static analysis and/or dynamic analysis. Specifically, the tool for static analysis may be a tool such as c32asm, and the tool for dynamic analysis may be a tool such as Ollydbg. The deconstruction analysis of the target software may be performed simultaneously during the decompilation of the target software.

Furthermore, the source code obtained by decompiling the target software may be in C#, Java, or another language. Some languages are more convenient to develop in than others, and different engineers are accustomed to different languages. The scheme therefore converts the source code/code blocks, once obtained from the target code, into the target development language according to user requirements. The system reconstructed in step S4 is an open system with open interfaces for users to develop, improve, and verify, for example through interface development, logic extension, and on-demand function extension.

The specific method for converting the source code/code block into the target development language is as follows:

S1, dividing the source code/code block into a plain-text replacement part and a function replacement part according to the characteristics of its character strings;

S2, replacing the character strings of the plain-text replacement part with the corresponding character strings of the target development language according to the database; and replacing the character strings of the function replacement part with character strings of the target development language that have the corresponding meaning, according to the database and the meanings it records.

The database here is a set of correspondences between the mechanisms of two languages, constructed in advance for a specific code block language and a specific target language.
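
The two kinds of replacement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the invention: the mapping tables `PLAIN` and `FUNCS`, their C#-to-Java entries, and the function name `convert` are hypothetical stand-ins for the pre-built database, and the split into the two parts is approximated with regular expressions.

```python
import re

# Hypothetical "database": correspondences between a source and a target language.
PLAIN = {"string": "String", "bool": "boolean"}           # plain-text replacements
FUNCS = {"Console.WriteLine": "System.out.println"}       # function replacements


def convert(code: str) -> str:
    """Apply function replacements first, then plain-text replacements."""
    for src, dst in FUNCS.items():
        # Only replace the name when it is used as a call, i.e. followed by "(".
        code = re.sub(re.escape(src) + r"(?=\s*\()", dst, code)
    for src, dst in PLAIN.items():
        # Whole-word match so that identifiers containing the word are untouched.
        code = re.sub(r"\b" + re.escape(src) + r"\b", dst, code)
    return code


converted = convert('string s = "hi"; Console.WriteLine(s);')
print(converted)
```

A table-driven sketch like this also rewrites matches inside string literals and comments; a practical converter would need to tokenize the code first.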

Specifically, as shown in fig. 2, step S1 specifically includes:

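S11, reading the target code of the target software into memory;

S12, analyzing the target code to separate instruction codes from data;

S13, disassembling the target code with a disassembling tool to obtain an assembly file;

S14, decompiling the assembly file with a decompiling tool to obtain source code.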

As shown in fig. 3, a Vector container of a Content object is defined, and step S11 specifically includes:

A1. firstly, defining a Content object, reading a plurality of bytes from a target binary format file and storing the bytes in the Content object; the number of bytes read at a time is determined by those skilled in the art on a case-by-case basis, e.g., 4 bytes at a time for disassembly of a 32-bit instruction;

A2. storing the Content object into a Vector container;

A3. steps A1 and A2 are repeated until the end of the file.
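
Steps A1 to A3 can be sketched as below. This is an illustrative sketch only: a Python list stands in for the Vector container, and the `Content` fields other than the raw bytes, the function name `load_contents`, and the in-memory test stream are assumptions introduced for the example.

```python
import io
from dataclasses import dataclass


@dataclass
class Content:
    raw: bytes                    # bytes read in one step (A1)
    is_instruction: bool = False  # assumed flag, filled in later by step S12
    visited: bool = False         # assumed flag, used while tracing control flow


def load_contents(stream, word_size=4):
    """Read word_size bytes at a time until the end of the file (A1 to A3)."""
    contents = []                 # plays the role of the Vector container
    while True:
        chunk = stream.read(word_size)
        if not chunk:             # an empty read means end of file
            break
        contents.append(Content(raw=chunk))   # A2
    return contents


# Ten bytes read four at a time give three Content objects, the last one short.
contents = load_contents(io.BytesIO(bytes(range(10))), word_size=4)
print(len(contents))
```

With a real file, the stream would be opened in binary mode, for example `open(path, "rb")`.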

Step S12 specifically includes:

B1. tracing the instruction control flow, traversing and identifying each instruction;

B2. identifying the code portions reachable by the instruction flow as instruction codes, and the remaining portions as data.

Specifically, as shown in fig. 4, instruction control flow is tracked in step B1 by:

B11. setting the PC value to 0;

B12. taking the specific Content object out of the Vector container according to the PC value; for example, when 32-bit instructions are being processed, each instruction is stored as 4 bytes, so the specific Content object is the one with subscript PC/4, that is, the PC address divided by 4 and rounded down, which locates the 4-byte instruction and hence the starting address of that instruction;

B13. marking the fetched Content object as an instruction, and marking the instruction as accessed;

B14. judging whether the instruction identified in the step B13 is a program ending instruction, if so, executing a step B15, otherwise, executing a step B16;

B15. judging whether the explicit address table (also rendered as the display table or display list) is empty; if so, ending the tracking; otherwise, taking an Elem element out of the table. Each Elem element stands for a pending branch target on the stack, and its instructions still map to Content objects in the Vector container, so this process is repeated during separation until no Elem element remains in the table, which achieves complete separation;

judging whether the instruction at the addr address in the Elem element has been accessed; if not, restoring the saved context, including the PC value, and returning to step B12, that is, continuing from the entry stored for the addr address; if so, judging whether the instruction at the addr address in the next Elem element has been accessed, and ending the tracking once all Elem elements have been traversed.

B16. further judging whether the instruction identified in step B13 is a branch instruction; if so, updating the PC value, the explicit address table (display table), and the return address table according to the specific branch instruction; otherwise, incrementing the PC and returning to step B12. This step is a loop traversal in which entries are pushed onto the return address table and the explicit address table for each identified branch instruction. The return address table records the return address when a subprogram is called; when a two-way (conditional) branch instruction is encountered, its explicit address and the context (the values of the program's registers) are filled into the explicit address table.

Specifically, in step B16, the PC value, the explicit address table (display table), and the return address table are updated as follows:

for an unconditional branch instruction (B instruction; MOV PC, 0x16), the address of the instruction and its explicit (target) address are filled into the segment table, and the explicit address is taken as the current PC address;

for a subprogram call instruction among the unconditional branch instructions (BL instruction), the address of the instruction is filled into the segment table, the return address is filled into the return address table, the explicit address is filled into the segment table, and the explicit address is taken as the current PC address;

for a return instruction among the unconditional branch instructions (MOV PC, LR), the return address is looked up in the return address table on a last-in, first-out basis, the address of the instruction and the return address are filled into the segment table, and the return address is taken as the current PC address;

for a two-way (conditional) branch instruction (BEQ; MOVEQ PC, 0x16; etc.), the explicit address is filled into the explicit address table together with the register values at that moment, and the implicit (fall-through) address is then taken as the current PC address.

The segment table records the branch addresses of all branch instructions other than conditional branches; each entry comprises the address of the instruction and the destination address. The individual code segments can be obtained from the segment table, which makes the code clearer and eases subsequent work such as disassembly.
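
The tracing procedure of steps B11 to B16 can be sketched as follows. This is a simplified illustration under several assumptions: instructions are already decoded into hypothetical `(op, target)` tuples, one per 4-byte word; only the return stack is saved as context; and a path is also treated as finished when it reaches an already-visited word, returns with an empty return stack, or runs past the end. These guards are added here so the sketch terminates and are not stated in the text.

```python
WORD = 4  # bytes per instruction, as in the 32-bit example


def trace(program):
    """Return (indices of words reached as instructions, segment table)."""
    visited = set()
    explicit = []   # explicit address table: (target address, saved return stack)
    returns = []    # return address table, used last-in first-out
    segments = []   # segment table: (address of branch, destination address)
    pc = 0          # B11
    while True:
        idx = pc // WORD                                   # B12
        path_done = idx >= len(program) or idx in visited  # assumed guards
        if not path_done:
            visited.add(idx)                               # B13
            op, target = program[idx]
            if op == "END":                                # B14
                path_done = True
            elif op == "B":                                # unconditional branch
                segments.append((pc, target))
                pc = target
            elif op == "BL":                               # subprogram call
                segments.append((pc, target))
                returns.append(pc + WORD)
                pc = target
            elif op == "RET" and returns:                  # return (MOV PC, LR)
                ret = returns.pop()
                segments.append((pc, ret))
                pc = ret
            elif op == "RET":
                path_done = True
            elif op == "BEQ":                              # conditional branch
                explicit.append((target, list(returns)))
                pc += WORD                                 # follow implicit address
            else:
                pc += WORD
        if path_done:                                      # B15
            while explicit:
                addr, saved = explicit.pop()
                if addr // WORD not in visited:
                    pc, returns = addr, saved              # restore context
                    break
            else:
                return visited, segments


program = [("BEQ", 12), ("BL", 20), ("END", None), ("B", 8),
           ("DATA", None), ("NOP", None), ("RET", None), ("DATA", None)]
visited, segments = trace(program)
data_words = [i for i in range(len(program)) if i not in visited]
print(sorted(visited), data_words)
```

Words 4 and 7 are never reached by any path in this toy program, so they are classified as data, which corresponds to step B2.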

Further, as shown in fig. 5 and 6, step S13 specifically includes:

C1. taking the objects out of the Vector container in sequence, and judging whether each object is an instruction code or data according to the result of separating instruction codes from data;

C2. if the object is an instruction code, disassembling it into assembly instruction form; if the object is data, translating it into its data value. Because the codes are separated first, the disassembler can disassemble the instruction codes in a targeted manner and translate the data directly, which improves disassembly efficiency.
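
The dispatch between instruction codes and data can be sketched as below. The opcode table `OPCODES`, the one-byte encoding, and the `.word` output format are hypothetical and chosen only for illustration; a real disassembling tool would decode the actual instruction set.

```python
# Hypothetical one-byte opcodes; not taken from any real instruction set.
OPCODES = {0x01: "NOP", 0x02: "END"}


def render(raw: bytes, is_instruction: bool) -> str:
    """Disassemble an instruction word, or translate a data word to its value."""
    if is_instruction:
        return OPCODES.get(raw[0], f"UNKNOWN 0x{raw[0]:02x}")
    # Data is translated directly, without going through instruction decoding.
    return f".word 0x{int.from_bytes(raw, 'little'):08x}"


# Each entry pairs the raw bytes with the result of the separation step.
words = [(b"\x01\x00\x00\x00", True),
         (b"\x2a\x00\x00\x00", False),
         (b"\x02\x00\x00\x00", True)]
listing = [render(raw, flag) for raw, flag in words]
print(listing)
```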

Further, the following steps are included between steps S13 and S14:

D1. normalizing the assembly instruction code into an intermediate code (Low-level Intermediate Language, LIL), and constructing various symbol tables during the conversion for later work;

D2. extracting library functions by dynamically debugging the intermediate code, and identifying system library functions and user-defined functions;

D3. and recovering key information of the user-defined function, including name, parameter number, return value and type.

The intermediate code is an internal representation of the source program that does not depend on the structure of the target machine. Normalizing the assembly instructions into intermediate code before the subsequent work benefits the development and portability (robustness) of the compiler program and makes it easier for the user to optimize the code. Dynamic debugging means running the program; a dynamic debugging tool such as Ollydbg may be used.

When disassembly and decompilation are performed by the same tool, the function recognition program of steps D1-D3 can be embedded in that tool or run alongside it. The tool disassembles the target code to obtain an assembly file and passes the file to the function recognition program; the recognition program returns its result to the tool, which then continues with control flow analysis, data type analysis, and similar work on the system library functions and the user-defined functions separately, completing the decompilation and producing a source-code-level result. Because the user-defined functions and the system library functions are separated, and control flow analysis and data type analysis are carried out only after the information of the user-defined functions has been recovered, user-defined data does not interfere with the work, which improves efficiency and reduces the error rate.

Specifically, in step D2, the user-defined function is identified by:

E1. preparing several calling programs, each containing only one library function call statement and differing from one another only in the call parameters of that function;

E2. running each of the calling programs prepared in step E1, and determining a function whose effective operation instructions remain fixed to be a user-defined function. Here, the effective operation instructions are the opcodes of the instructions of a user-defined library function. These opcodes are fixed: they are not subject to address relocation during compiling and linking, and they are unaffected by compiler version or compile-time optimization. The effective operation instructions of the same library function therefore stay the same across the different calling programs, so the user-defined library functions can be separated from the system library functions by these steps.

Further, the addresses of the user-defined functions determined in step D2 are checked: the function with the highest address is selected, and that function together with all functions at lower addresses are determined to be user-defined functions. As shown in fig. 7, the codes of the user-defined library functions are stored contiguously, in the same order in which they are defined in the source program, with the user-defined library function codes at low addresses and the system library function codes at high addresses, so this method effectively separates the user-defined library functions from the system library functions.
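
The address check can be sketched as follows, assuming the function start addresses are already known. The function name `split_by_address`, the example function names, and the addresses are hypothetical; the sketch relies on the layout described above, with user-defined code at low addresses and system library code at high addresses.

```python
def split_by_address(functions, candidates):
    """functions: name -> start address; candidates: names flagged in step D2."""
    # The highest address among the flagged functions marks the boundary.
    boundary = max(functions[name] for name in candidates)
    user = sorted(n for n, addr in functions.items() if addr <= boundary)
    system = sorted(n for n, addr in functions.items() if addr > boundary)
    return user, system


functions = {"main": 0x1000, "helper": 0x1040, "parse": 0x1080,
             "memcpy": 0x8000, "printf": 0x8100}
# "helper" was not flagged in step D2 but lies below the boundary set by "parse".
user, system = split_by_address(functions, {"main", "parse"})
print(user, system)
```

The check recovers user-defined functions that the opcode comparison in step D2 missed, provided the assumed memory layout holds for the target binary.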

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Although terms such as target software, code block, and target code are used frequently herein, the possibility of using other terms is not excluded. These terms are used only to describe and explain the nature of the present invention more conveniently; construing them as any additional limitation would be contrary to the spirit of the present invention.

Claims

1. A software development method based on reverse technology is characterized by comprising the following steps:

S1, decompiling target software to obtain source code;

2. The software development method based on the reverse technology as claimed in claim 1, wherein step S1 specifically includes:

S11, reading the target code of the target software into memory;

S12, analyzing the target code to separate instruction codes from data;

3. The software development method based on the reverse technology according to claim 2, wherein step S11 specifically includes:

A2. storing the Content object into a Vector container;

A3. steps A1 and A2 are repeated until the end of the file.

4. The software development method based on the reverse technology according to claim 3, wherein step S12 is specifically:

5. The software development method based on the reverse technology as claimed in claim 4, wherein step S13 is specifically:

6. The reverse technology-based software development method according to claim 5, further comprising, between steps S13 and S14:

D1. normalizing the assembly instruction code into an intermediate code;

7. A method for software development based on reverse technology according to any of claims 1-6, characterized in that in step S2, the function of the target software is determined according to the operation manual, help document and by means of dynamic operation of the target software.

8. A method for developing software based on reverse technology according to claim 7, wherein in step S2, the key code is searched and extracted by dynamically debugging the source code, and the corresponding relationship between the key code and the function is marked according to the determined function.

9. A reverse-technology-based software development method according to claim 8, wherein in step S3, the target software is deconstructed and analyzed by static analysis and/or dynamic analysis.

10. The reverse technology-based software development method according to claim 9, wherein step S4 is preceded by converting the source code/code blocks into a target development language;