CN113946346B - Data processing method and device, electronic equipment and storage medium - Google Patents


Publication number
CN113946346B
Authority
CN
China
Prior art keywords
function
executable file
symbol table
offset
assembly instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111166915.6A
Other languages
Chinese (zh)
Other versions
CN113946346A (en)
Inventor
樊锐
彭飞
邓竹立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing 58 Information Technology Co Ltd
Original Assignee
Beijing 58 Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing 58 Information Technology Co Ltd
Priority to CN202111166915.6A
Publication of CN113946346A
Application granted
Publication of CN113946346B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/51 Source to source
    • G06F8/53 Decompilation; Disassembly

Abstract

The invention provides a data processing method, a data processing apparatus, an electronic device, and a storage medium. Whether at least two identical or similar functions exist in an executable file can be detected based on the assembly instructions of the functions compiled from the developer-written source code corresponding to the executable file and the assembly instructions of the functions in the third-party library contained in the executable file. The whole detection process requires no participation from developers, which saves labor cost. Furthermore, the source code of the application program can be simplified without manual modification, which increases the degree of automation and reduces labor cost.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
When the source code of an application program is developed, many functions are usually written in the source code so that components of the application program can call them. In addition, a third-party library may be added to the source code of the application program; the third-party library contains compiled functions that components of the application program can also call.
However, the source code of an application program is sometimes developed cooperatively by multiple developers, each of whom develops part of the source code. This inevitably leads to the same function appearing multiple times in the final source code, which makes the source code redundant and in turn makes the executable file compiled from it occupy a large amount of space.
Therefore, to reduce the space occupied by the executable file of the application program, the source code of the application program needs to be simplified. To do so, a developer can manually check, according to the function names of the functions, whether at least two identical or similar functions exist in the source code of the application program.
However, the inventors found that this approach can only detect whether at least two identical or similar functions exist in the source code and cannot take into account a third-party library added to the source code of the application program. Duplicates are therefore easily missed, resulting in low detection accuracy.
Disclosure of Invention
In order to reduce labor cost and improve detection accuracy, the present application provides a data processing method, a data processing apparatus, an electronic device, and a storage medium.
In a first aspect, the present application shows a data processing method, comprising:
acquiring, according to a symbol table in an executable file of an application program, the total number of symbol table entries respectively corresponding to the functions in the executable file and a first offset, in the executable file, of the symbol table entry corresponding to the first function;
obtaining a second offset of the function name of the function in the executable file recorded in each symbol table entry in the executable file and a third offset of the assembly instruction of the function in the executable file recorded in each symbol table entry in the executable file according to at least the total number and the first offset;
for any symbol table entry in the executable file, disassembling the second offset of the function name of the function corresponding to the symbol table entry in the executable file to obtain the function name of the function corresponding to the symbol table entry; acquiring the assembly instruction of the function corresponding to the symbol table entry according to the third offset, in the executable file, of the assembly instruction of the function corresponding to the symbol table entry; and then storing the function name and the assembly instruction of the function corresponding to the symbol table entry in a correspondence between the function names of the functions in the executable file and the assembly instructions of the functions in the executable file.
In an optional implementation manner, the obtaining, according to a symbol table in an executable file of an application program, a total number of symbol table entries respectively corresponding to each function in the executable file includes:
acquiring a fourth offset, in the symbol table, of the total number of symbol table entries recorded in the symbol table, the symbol table entries respectively corresponding to the functions in the executable file;
and acquiring, according to the fourth offset, the total number of symbol table entries recorded in the symbol table that respectively correspond to the functions in the executable file.
In an optional implementation manner, the obtaining a fourth offset of the total number of symbol table entries in the symbol table, where the total number of the symbol table entries respectively corresponds to each function in the executable file recorded in the symbol table includes:
determining the memory structure of the symbol table;
and determining a fourth offset of the total number of symbol table entries in the symbol table, which correspond to each function in the executable file, in the symbol table according to the memory structure of the symbol table.
In a second aspect, the present application shows a data processing method, including:
acquiring a corresponding relation between a function name of a function in an executable file of an application program and an assembly instruction of the function in the executable file;
for any function name in the corresponding relationship, the function name is obtained by disassembling a second offset of the function name recorded in one symbol table entry in the executable file, an assembly instruction of a function corresponding to the function name may be obtained according to a third offset of the assembly instruction recorded in the symbol table entry in the executable file, and the second offset and the third offset are obtained at least according to the total number of symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file respectively;
screening at least two assembly instructions with the similarity greater than a preset similarity in the corresponding relation;
searching function names of functions corresponding to at least two assembly instructions in the corresponding relation;
and outputting the function names of the functions corresponding to the at least two assembly instructions respectively.
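The screening and lookup steps of the second aspect can be illustrated with a minimal sketch. It assumes the correspondence is already available as a dictionary from function name to a list of assembly instruction strings, and it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; the application does not specify how similarity is computed. The function name `find_similar_functions`, the threshold value 0.9, and the sample instructions are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_functions(name_to_asm, threshold=0.9):
    """Return (name_a, name_b, ratio) for every pair of functions whose
    instruction lists have a similarity ratio above the threshold."""
    pairs = []
    # Compare every unordered pair of entries in the correspondence once.
    for (name_a, asm_a), (name_b, asm_b) in combinations(sorted(name_to_asm.items()), 2):
        ratio = SequenceMatcher(None, asm_a, asm_b).ratio()
        if ratio > threshold:
            pairs.append((name_a, name_b, ratio))
    return pairs


# Hypothetical correspondence: function name -> assembly instructions.
name_to_asm = {
    "_add_v1": ["add w0, w0, w1", "ret"],
    "_add_v2": ["add w0, w0, w1", "ret"],
    "_mul": ["mul w0, w0, w1", "ret"],
}

similar = find_similar_functions(name_to_asm)
for name_a, name_b, ratio in similar:
    print(name_a, name_b, round(ratio, 2))
```

Comparing all pairs is quadratic in the number of functions; for a large executable, grouping functions by a hash of their instructions before comparing would likely be needed, which this sketch leaves out.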
In a third aspect, the present application shows a data processing method, comprising:
acquiring a corresponding relation between a function name of a function in an executable file of an application program and an assembly instruction of the function in the executable file;
for any function name in the corresponding relationship, the function name is obtained by disassembling a second offset of the function name recorded in one symbol table entry in the executable file, an assembly instruction of a function corresponding to the function name may be obtained according to a third offset of the assembly instruction recorded in the symbol table entry in the executable file, and the second offset and the third offset are obtained at least according to the total number of symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file respectively;
screening at least two assembly instructions with the similarity greater than a preset similarity in the corresponding relation;
searching function names of functions corresponding to at least two assembly instructions in the corresponding relation;
acquiring a source code of the application program, and deleting codes of functions corresponding to other function names except one function name in the searched function names in the source code;
and replacing the other function names in the source code with the one function name to obtain the modified source code of the application program.
In an optional implementation, the method further comprises:
and compiling the modified source code of the application program to obtain a new executable file of the application program.
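The deletion and replacement steps of the third aspect can be sketched as follows. The sketch assumes that the source text of each duplicate function definition has already been located (the hypothetical `definitions` mapping) and that a whole-word textual replacement is enough to redirect callers; real source code would usually need a proper parser. The helper `merge_duplicates` and the sample C snippet are illustrative only.

```python
import re


def merge_duplicates(source, definitions, keep, duplicates):
    """Delete the definitions of the duplicate functions and rename their
    remaining uses to the function name that is kept."""
    for name in duplicates:
        # Delete the code of the duplicate function.
        source = source.replace(definitions[name], "")
        # Replace every remaining whole-word use of the duplicate name.
        source = re.sub(r"\b%s\b" % re.escape(name), keep, source)
    return source


source = (
    "int add_v1(int a, int b) { return a + b; }\n"
    "int add_v2(int a, int b) { return a + b; }\n"
    "int main(void) { return add_v1(1, 2) + add_v2(3, 4); }\n"
)
# Hypothetical: definition text of each duplicate, located beforehand.
definitions = {"add_v2": "int add_v2(int a, int b) { return a + b; }\n"}

result = merge_duplicates(source, definitions, keep="add_v1", duplicates=["add_v2"])
print(result)
```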
In a fourth aspect, the present application shows a data processing apparatus comprising:
a first obtaining module, configured to obtain, according to a symbol table in an executable file of an application program, a total number of symbol table entries corresponding to each function in the executable file and a first offset of a symbol table entry corresponding to a first function in the executable file;
a second obtaining module, configured to obtain, according to at least the total number and the first offset, a second offset, in the executable file, of a function name of a function in the executable file recorded in each symbol table entry, and a third offset, in the executable file, of an assembler instruction of a function in the executable file recorded in each symbol table entry;
a disassembling module, configured to disassemble, for any symbol table entry in the executable file, the second offset of the function name of the function corresponding to the symbol table entry in the executable file to obtain the function name of the function corresponding to the symbol table entry;
a third obtaining module, configured to obtain, according to a third offset of the assembly instruction of the function corresponding to the symbol table entry in the executable file, the assembly instruction of the function corresponding to the symbol table entry;
and a storage module, configured to store the function name of the function corresponding to the symbol table entry and the assembly instruction of the function corresponding to the symbol table entry in the corresponding relation between the function name of the function in the executable file and the assembly instruction of the function in the executable file.
In an optional implementation manner, the first obtaining module includes:
a first obtaining unit, configured to obtain a fourth offset, in the symbol table, of the total number of symbol table entries recorded in the symbol table, the symbol table entries respectively corresponding to the functions in the executable file;
and a second obtaining unit, configured to obtain, according to the fourth offset, the total number of symbol table entries that are recorded in the symbol table and respectively correspond to each function in the executable file.
In an optional implementation manner, the first obtaining unit includes:
the first determining subunit is used for determining the memory structure of the symbol table;
a second determining subunit, configured to determine, in the symbol table according to the memory structure of the symbol table, a fourth offset of the total number of symbol table entries in the symbol table, where the total number of symbol table entries corresponds to each function in the executable file, respectively.
In a fifth aspect, the present application shows a data processing apparatus comprising:
the fourth acquisition module is used for acquiring the corresponding relation between the function name of the function in the executable file of the application program and the assembly instruction of the function in the executable file;
for any function name in the corresponding relationship, the function name is obtained by disassembling a second offset of the function name recorded in one symbol table entry in the executable file, an assembly instruction of a function corresponding to the function name may be obtained according to a third offset of the assembly instruction recorded in the symbol table entry in the executable file, and the second offset and the third offset are obtained at least according to the total number of symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file respectively;
the first screening module is used for screening at least two assembly instructions with the similarity greater than a preset similarity in the corresponding relation;
the first searching module is used for searching function names of functions corresponding to at least two assembly instructions in the corresponding relation;
and the output module is used for outputting the function names of the functions corresponding to the at least two assembly instructions respectively.
In a sixth aspect, the present application shows a data processing apparatus comprising:
the fifth acquisition module is used for acquiring the corresponding relation between the function name of the function in the executable file of the application program and the assembly instruction of the function in the executable file;
for any function name in the corresponding relationship, the function name is obtained by disassembling a second offset of the function name recorded in one symbol table entry in the executable file, an assembly instruction of a function corresponding to the function name may be obtained according to a third offset of the assembly instruction recorded in the symbol table entry in the executable file, and the second offset and the third offset are obtained at least according to the total number of symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file respectively;
the second screening module is used for screening at least two assembly instructions with the similarity greater than the preset similarity in the corresponding relation;
the second searching module is used for searching function names of functions corresponding to the at least two assembly instructions in the corresponding relation;
a sixth obtaining module, configured to obtain a source code of the application program;
the deleting module is used for deleting codes of functions corresponding to other function names except one function name in the searched function names in the source codes;
and the replacing module is used for replacing the other function names in the source code with the one function name to obtain the modified source code of the application program.
In an optional implementation, the apparatus further comprises:
and the compiling module is used for compiling the modified source code of the application program to obtain a new executable file of the application program.
In a seventh aspect, the present application shows an electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the data processing method of the first aspect.
In an eighth aspect, the present application shows a non-transitory computer readable storage medium having instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method of the first aspect.
In a ninth aspect, the present application shows a computer program product, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method according to the first aspect.
In a tenth aspect, the present application shows an electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the data processing method according to the second aspect.
In an eleventh aspect, the present application shows a non-transitory computer-readable storage medium having instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method of the second aspect.
In a twelfth aspect, the present application shows a computer program product, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method according to the second aspect.
In a thirteenth aspect, the present application shows an electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the data processing method according to the third aspect.
In a fourteenth aspect, the present application shows a non-transitory computer-readable storage medium having instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method of the third aspect.
In a fifteenth aspect, the present application shows a computer program product, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method according to the third aspect.
The technical scheme provided by the application can comprise the following beneficial effects:
When the executable file includes a third-party library, the third-party library has been obtained by compiling its corresponding source code, and a developer can obtain only the compiled third-party library, not the source code corresponding to it. Because the developer cannot manually interpret the content of the compiled third-party library, the developer can only check whether at least two identical or similar functions exist in the developer-written source code corresponding to the executable file. The third-party library cannot be taken into account during detection, so duplicates are easily missed and the detection accuracy is low.
By the method and the device, whether at least two identical or similar functions exist in the executable file can be detected based on the assembly instruction of the function in the source code written by the developer corresponding to the executable file and the assembly instruction of the function in the third-party library in the executable file.
In addition, the whole detection process can be free from participation of developers, and labor cost can be saved.
Furthermore, when at least two identical or similar functions are found in the executable file, the code of the functions corresponding to all but one of the found function names is deleted from the source code of the application program, and the other function names in the source code are replaced with that one function name, yielding the modified source code of the application program. The source code of the application program can thus be simplified without manual modification, which increases the degree of automation and reduces labor cost.
If the at least two identical or similar functions are located in a third-party library in the executable file, a request may be sent to the vendor of the third-party library so that the vendor simplifies the third-party library according to the request, further reducing the space occupied by the executable file.
Drawings
FIG. 1 is a flow chart of the steps of a data processing method of the present application.
FIG. 2 is a flow chart of the steps of a data processing method of the present application.
FIG. 3 is a flow chart of the steps of a data processing method of the present application.
FIG. 4 is a block diagram of a data processing apparatus according to the present application.
FIG. 5 is a block diagram of a data processing apparatus according to the present application.
FIG. 6 is a block diagram of a data processing apparatus according to the present application.
FIG. 7 is a block diagram of an electronic device of the present application.
FIG. 8 is a block diagram of an electronic device of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, a flowchart of the steps of a data processing method according to the present application is shown. The method is applied to an electronic device and may specifically include the following steps:
In step S101, according to a symbol table in an executable file of an application program, the total number of symbol table entries respectively corresponding to the functions in the executable file and a first offset, in the executable file, of the symbol table entry corresponding to the first function are obtained.
The executable file of the application program can be obtained by compiling the source code of the application program.
The source code of the application program comprises several functions, so that the symbol table in the executable file of the application program can often comprise several symbol table entries, and the functions in the source code of the application program correspond to the symbol table entries in the executable file of the application program one to one.
The executable file of the application program comprises a symbol table.
In one example, where the executable file of the application program is a Mach-O file, the symbol table may be LC_SYMTAB or the like. A symbol table entry may be an entry with the nlist_64 memory structure, etc.
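For the Mach-O example, the fixed layouts involved can be written down as `struct` format strings. This is a sketch that assumes a little-endian 64-bit Mach-O file; the field lists follow the `symtab_command` and `nlist_64` definitions in the Mach-O headers, while the Python constant names are chosen here for illustration.

```python
import struct

# Load command type of the symbol table command in a Mach-O file.
LC_SYMTAB = 0x2

# symtab_command: cmd, cmdsize, symoff, nsyms, stroff, strsize (all uint32).
SYMTAB_COMMAND = struct.Struct("<6I")

# nlist_64: n_strx (uint32), n_type (uint8), n_sect (uint8),
# n_desc (uint16), n_value (uint64).
NLIST_64 = struct.Struct("<IBBHQ")

print(SYMTAB_COMMAND.size)  # size in bytes of the symbol table command
print(NLIST_64.size)        # size in bytes of one symbol table entry
```

In this layout, `nsyms` plays the role of the total number of symbol table entries and `symoff` the role of the first offset described in this application.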
The symbol table records attribute information of the executable file, including, for example, the size of the space occupied by the symbol table, the number of symbol table entries included in the executable file, and the first offset, in the executable file, of the symbol table entry corresponding to the first function.
In this application, when the executable file is started in the electronic device, the electronic device may allocate a memory address to the executable file. For example, the memory address of the executable file may be a memory address from one address to another address in the memory, and the like.
The first offset, in the executable file, of the symbol table entry corresponding to the first function can be understood as the distance between the starting address, in the memory of the electronic device, of the symbol table entry corresponding to the first function and the starting address of the memory address range of the executable file.
In this application, the position (for example, the offset) of the symbol table in the executable file is fixed; that is, the symbol table is stored at a specific position in the executable file. The electronic device can therefore obtain the symbol table at that position, obtain from it the total number of symbol table entries respectively corresponding to the functions in the executable file and the first offset, in the executable file, of the symbol table entry corresponding to the first function, and then execute step S102.
The method for acquiring the total number of symbol table entries corresponding to each function in the executable file according to the symbol table in the executable file of the application program includes:
1011. Acquiring a fourth offset, in the symbol table, of the total number of symbol table entries recorded in the symbol table, the symbol table entries respectively corresponding to the functions in the executable file.
This step can be realized by the following process, including:
11) Determining the memory structure of the symbol table.
In the present application, there are often a plurality of contents in the symbol table, and the positions of different contents in the symbol table are fixed, so the memory structure of the symbol table is fixed.
12) Determining, according to the memory structure of the symbol table, the fourth offset, in the symbol table, of the total number of symbol table entries respectively corresponding to the functions in the executable file.
In this application, the symbol table has a memory address in the memory, which the electronic device allocates to the symbol table when it allocates a memory address to the executable file. The fourth offset of the total number of symbol table entries can be understood as the distance between the starting address at which the total number is stored in the symbol table and the first address of the memory address range of the symbol table.
Once the memory structure of the symbol table is determined, the contents included in the symbol table, their order, and the size of the space occupied by each content are known.
Therefore, the fourth offset can be determined from the contents that precede the total number of symbol table entries in the symbol table and the size of the space occupied by each of those contents.
1012. Acquiring, according to the fourth offset, the total number of symbol table entries recorded in the symbol table that respectively correspond to the functions in the executable file.
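Steps 1011 and 1012 can be sketched as follows: the fourth offset is the sum of the sizes of the fields that precede the total number, and the total number is then read at that offset. The field list assumes the Mach-O `symtab_command` layout from the example above (little-endian, 4-byte fields); the helper names and the toy values packed into `symtab` are hypothetical.

```python
import struct

# Assumed memory structure of the symbol table command: (field name, size in bytes).
SYMTAB_FIELDS = [
    ("cmd", 4), ("cmdsize", 4), ("symoff", 4),
    ("nsyms", 4), ("stroff", 4), ("strsize", 4),
]


def field_offset(fields, target):
    """Sum the sizes of all fields that precede the target field."""
    offset = 0
    for name, size in fields:
        if name == target:
            return offset
        offset += size
    raise KeyError(target)


def read_u32(buf, base, offset):
    """Read a little-endian 32-bit unsigned integer at base + offset."""
    return struct.unpack_from("<I", buf, base + offset)[0]


# Toy symbol table command with nsyms = 3 (values are made up).
symtab = struct.pack("<6I", 0x2, 24, 4096, 3, 8192, 64)

fourth_offset = field_offset(SYMTAB_FIELDS, "nsyms")  # step 1011
total = read_u32(symtab, 0, fourth_offset)            # step 1012
print(fourth_offset, total)
```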
Correspondingly, with reference to the above manner, the first offset of the symbol table entry corresponding to the first function in the executable file may also be obtained according to the symbol table in the executable file of the application program.
For example, a fifth offset, in the symbol table, of the recorded "first offset of the symbol table entry corresponding to the first function in the executable file" may be obtained; for the specific manner, refer to the flow of 1011, which is not repeated here. Then, the first offset, in the executable file, of the symbol table entry corresponding to the first function may be read from the symbol table according to that fifth offset.
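Reading the first offset works the same way as reading the total number. The short sketch below again assumes the Mach-O `symtab_command` layout, in which `symoff` follows the two 4-byte fields `cmd` and `cmdsize`; the packed values are made up.

```python
import struct

# Toy LC_SYMTAB command: cmd, cmdsize, symoff, nsyms, stroff, strsize.
symtab = struct.pack("<6I", 0x2, 24, 4096, 3, 8192, 64)

# Assumption: cmd (4 bytes) and cmdsize (4 bytes) precede symoff.
FIFTH_OFFSET = 8

# The value stored at the fifth offset is the first offset, i.e. where the
# first symbol table entry starts in the executable file.
(first_offset,) = struct.unpack_from("<I", symtab, FIFTH_OFFSET)
print(first_offset)
```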
In step S102, a second offset of the function name of the function in the executable file recorded in each symbol table entry in the executable file and a third offset of the assembly instruction of the function in the executable file recorded in each symbol table entry in the executable file are obtained at least according to the total number of the symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file.
In an embodiment of the present application, the executable file includes a plurality of symbol table entries, each of which occupies the same amount of space. The size of the space occupied by a symbol table entry may be obtained by a specific method, for example by using "sizeof".
The plurality of symbol table entries in the symbol table are sequentially arranged in the executable file.
Therefore, the size of the space occupied by the symbol table entry corresponding to the first function can be obtained through a specific method, and the first offset of the symbol table entry corresponding to the second function in the executable file can be determined according to the first offset of the symbol table entry corresponding to the first function in the executable file and the size of the space occupied by the symbol table entry corresponding to the first function.
According to the first offset of the symbol table entry corresponding to the second function in the executable file and the space occupied by the symbol table entry corresponding to the second function, the first offset of the symbol table entry corresponding to the third function in the executable file can be determined, and so on, until the first offset of the symbol table entry corresponding to the Nth function in the executable file is determined, where N equals the total number of symbol table entries respectively corresponding to the functions in the executable file.
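Because the entries are contiguous and equally sized, the step-by-step derivation above reduces to simple arithmetic: the offset of the i-th entry (counting from zero) is the first offset plus i times the entry size. A minimal sketch, assuming a 16-byte entry as in the `nlist_64` example; the function name and the sample values are hypothetical.

```python
def entry_offsets(first_offset, total, entry_size=16):
    """Offsets in the executable file of all symbol table entries,
    assuming they are stored back to back with a fixed size."""
    return [first_offset + i * entry_size for i in range(total)]


# Hypothetical values: first entry at offset 4096, three entries in total.
offsets = entry_offsets(4096, 3)
print(offsets)
```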
For any symbol table entry in the executable file, after obtaining the first offset of the symbol table entry in the executable file, the memory address of the symbol table entry may be determined according to the first offset of the symbol table entry in the executable file and the size of the space occupied by the symbol table entry, and data located in the memory address of the symbol table entry may be obtained.
The data at that memory address records at least a second offset of the function name of the function corresponding to the symbol table entry in the executable file and a third offset of the assembly instruction of the function corresponding to the symbol table entry in the executable file.
It should be noted that the second offset is an offset of the function name, and is not an offset of the function implementation, and the function name of the function corresponding to the symbol table entry can be found based on the second offset. The assembly instruction of the function corresponding to the symbol table entry can be found based on the third offset.
For example, in the memory address of the symbol table entry, the second offset of the function name of the function corresponding to the symbol table entry and the third offset of the assembly instruction of the function corresponding to the symbol table entry are recorded in the form of key-value pairs.
In an example, the second offset of the function name of the function corresponding to the symbol table entry is the value, and the corresponding key may be a preset first specific character string.
Likewise, the third offset of the assembly instruction of the function corresponding to the symbol table entry is the value, and the corresponding key may be a preset second specific character string.
In this way, in the data located in the memory address of the symbol table entry, the second offset of the function name of the function corresponding to the symbol table entry in the executable file may be obtained according to the first specific character string, and the third offset of the assembly instruction of the function corresponding to the symbol table entry in the executable file may be obtained according to the second specific character string.
The same is true for every other symbol table entry in the executable file.
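As an illustration of reading the two offsets out of an entry, the sketch below decodes a fixed-size record and exposes its fields as key-value pairs. The entry layout (two little-endian unsigned 64-bit integers), the key names `name_offset` and `code_offset`, and the helper `read_entry` are assumptions made for this example; the actual layout depends on the executable file format.

```python
import struct

# Hypothetical entry layout (an assumption for this sketch): two little-endian
# unsigned 64-bit integers, the second offset (function name) followed by the
# third offset (assembly instructions).
ENTRY = struct.Struct("<QQ")


def read_entry(data: bytes, first_offset: int) -> dict:
    """Decode one symbol table entry located at `first_offset` in `data`."""
    name_offset, code_offset = ENTRY.unpack_from(data, first_offset)
    # Expose the two fields as key-value pairs, mirroring the description.
    return {"name_offset": name_offset, "code_offset": code_offset}


# Build a toy "executable file": 8 bytes of padding, then two entries.
image = bytes(8) + ENTRY.pack(100, 200) + ENTRY.pack(108, 260)
entries = [read_entry(image, 8 + i * ENTRY.size) for i in range(2)]
print(entries)
```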
In step S103, for any symbol table entry in the executable file, the second offset of the function name of the function corresponding to the symbol table entry in the executable file is disassembled to obtain the function name of the function corresponding to the symbol table entry; the assembly instruction of the function corresponding to the symbol table entry is acquired according to the third offset of that assembly instruction in the executable file; and the function name and the assembly instruction are stored in the correspondence between the function names of the functions in the executable file and the assembly instructions of the functions in the executable file.
The same is true for every other symbol table entry in the executable file.
In an embodiment, the second offset of the function name of the function corresponding to the symbol table entry in the executable file may be disassembled in any existing manner, which is not described in detail herein.
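Step S103 can be sketched as follows, under two loudly stated assumptions: function names are stored as NUL-terminated UTF-8 strings, and each entry also carries a code length so the sketch knows where a function ends. Neither assumption comes from the text above, and a real tool would pass the extracted bytes to a disassembler instead of storing them raw. The helpers `read_c_string` and `build_correspondence` are hypothetical names.

```python
from typing import Dict, List, Tuple


def read_c_string(data: bytes, offset: int) -> str:
    """Read a NUL-terminated name starting at `offset` (assumed encoding: UTF-8)."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8")


def build_correspondence(
    data: bytes, entries: List[Tuple[int, int, int]]
) -> Dict[str, bytes]:
    """Map each function name to the raw bytes of its instructions.

    Each entry is (second_offset, third_offset, code_length); the length field
    is a hypothetical addition so the sketch knows where a function ends.
    """
    table: Dict[str, bytes] = {}
    for name_offset, code_offset, code_length in entries:
        name = read_c_string(data, name_offset)
        table[name] = data[code_offset:code_offset + code_length]
    return table


# Toy file: two names followed by two 4-byte instruction blobs.
image = b"_foo\x00_bar\x00" + b"\x01\x02\x03\x04" + b"\x01\x02\x03\x05"
table = build_correspondence(image, [(0, 10, 4), (5, 14, 4)])
print(table)
```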
In the case that the executable file includes a third-party library, the third-party library is obtained by compiling its source code, and a developer can only obtain the compiled third-party library, not its source code. Because the developer cannot manually interpret the content of the compiled third-party library, the developer can only detect whether at least two identical or similar functions exist in the source code that the developer wrote for the executable file. The third-party library cannot be taken into account during detection, so duplicates are easily missed and the detection accuracy is low.
In the present application, the total number of symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file are obtained according to the symbol table in the executable file of the application program. The second offset of the function name recorded in each symbol table entry and the third offset of the assembly instruction recorded in each symbol table entry are then obtained at least according to the total number and the first offset. For any symbol table entry in the executable file, the second offset is disassembled to obtain the function name of the function corresponding to the symbol table entry; the assembly instruction of that function is acquired according to the third offset; and the function name and the assembly instruction are stored in the correspondence between the function names of the functions in the executable file and the assembly instructions of the functions in the executable file.
With the present application, both the assembly instructions of the functions in the developer-written source code corresponding to the executable file and the assembly instructions of the functions in the third-party library in the executable file can be obtained. Because the assembly instructions of a function reflect how the function is implemented, whether at least two identical or similar functions exist in the executable file can then be detected from the assembly instructions. This detection covers both the developer-written code and the third-party library, which avoids omissions and therefore improves the comprehensiveness and accuracy of detection.
In one embodiment, after the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file is obtained, whether at least two identical or similar functions exist in the executable file may be detected according to that correspondence. If so, the function names of the at least two identical or similar functions may be output for a developer to view. The developer may then modify the source code of the application program according to those function names, for example by retaining only one of the at least two identical or similar functions in the source code, so as to simplify the source code of the application program and thereby reduce the space occupied by a new executable file of the application program generated from the simplified source code.
Specifically, referring to fig. 2, a flowchart illustrating steps of a data processing method according to the present application is shown, where the method is applied to an electronic device, and the method may specifically include the following steps:
in step S201, a correspondence between a function name of a function in an executable file of an application program and an assembler instruction of the function in the executable file is acquired.
For any function name in the corresponding relationship between the function name of the function in the executable file and the assembly instruction of the function in the executable file, the function name may be obtained by disassembling "the second offset of the function name in the executable file recorded in one symbol table entry in the executable file", and the assembly instruction of the function corresponding to the function name may be obtained according to the third offset of the assembly instruction in the executable file recorded in the symbol table entry. The second offset and the third offset are respectively obtained at least according to the total number of the symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file.
The same is true for the function name of each of the other functions in the correspondence between the function name of the function in the executable file and the assembler instruction of the function in the executable file.
In an embodiment of the present application, a correspondence between a function name of a function in an executable file and an assembly instruction of the function in the executable file may be generated by an electronic device in real time according to the executable file of an application program, and a specific generation manner may refer to the embodiment shown in fig. 1, which is not described in detail herein.
In another embodiment of the present application, the correspondence between the function name of the function in the executable file and the assembler instruction of the function in the executable file may be generated in advance according to the embodiment shown in fig. 1.
For example, the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file may be generated in advance by the electronic device itself and stored in the electronic device, and the specific generation manner may be as shown in the embodiment shown in fig. 1 and will not be described in detail here. In this manner, in this step, the electronic device can acquire the correspondence between the function name of the function in the executable file stored locally and the assembler instruction of the function in the executable file.
For another example, the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file may be generated in advance by other devices and stored in the cloud, and the specific generation manner may refer to the embodiment shown in fig. 1, which is not described in detail herein. In this way, in this step, the electronic device downloads, from the cloud, the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file.
In step S202, at least two assembly instructions with similarity greater than a preset similarity are screened from the correspondence between the function names of the functions in the executable file and the assembly instructions of the functions in the executable file.
In the present application, when calculating the similarity between any two assembler instructions in the correspondence between the function name of the function in the executable file and the assembler instruction of the function in the executable file, reference may be made to a currently existing manner, and the present application does not limit a specific calculation manner.
The preset similarity may be determined according to actual conditions, and the preset similarity is not limited in the present application. For example, the preset similarity may be determined according to a selected calculation method for calculating the similarity.
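Since the calculation method is left open, the sketch below shows only one possible choice: it treats each function's assembly instructions as a sequence of instruction strings and compares sequences with the standard-library `difflib.SequenceMatcher`. The function name `screen_similar`, the sample instructions, and the threshold of 0.9 are assumptions for illustration, not values taken from the present application.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple

Asm = Tuple[str, ...]


def screen_similar(asm_list: List[Asm], threshold: float) -> List[Tuple[Asm, Asm]]:
    """Return every pair of instruction sequences whose similarity exceeds `threshold`."""
    result = []
    for asm_a, asm_b in combinations(asm_list, 2):
        # Compare the two sequences instruction by instruction.
        ratio = SequenceMatcher(None, asm_a, asm_b, autojunk=False).ratio()
        if ratio > threshold:
            result.append((asm_a, asm_b))
    return result


sum_a = ("push rbp", "mov rbp, rsp", "add eax, edi", "pop rbp", "ret")
sum_b = ("push rbp", "mov rbp, rsp", "add eax, edi", "pop rbp", "ret")
noop = ("ret",)
pairs = screen_similar([sum_a, sum_b, noop], threshold=0.9)
print(len(pairs))  # 1
```

A pairwise comparison like this grows quadratically with the number of functions, so a larger executable file would likely need some form of pre-grouping before comparison.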
In step S203, function names of functions respectively corresponding to at least two assembly instructions are searched for in a correspondence between function names of functions in the executable file and assembly instructions of the functions in the executable file.
For any one of the at least two assembly instructions, the function name corresponding to the assembly instruction may be looked up in the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file, and the same is true for each of the other at least two assembly instructions.
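The lookup in step S203 amounts to a reverse search of the correspondence: given the screened assembly instructions, find the function names stored against them. A minimal sketch follows, assuming the correspondence is held as a dictionary from function name to a tuple of instruction strings; `names_for` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Asm = Tuple[str, ...]


def names_for(table: Dict[str, Asm], screened: List[Asm]) -> List[str]:
    """Return the names of all functions whose instructions appear in `screened`."""
    wanted = set(screened)
    # Keep the table's order so the output is stable.
    return [name for name, asm in table.items() if asm in wanted]


table = {
    "_sum_a": ("add eax, edi", "ret"),
    "_sum_b": ("add eax, edi", "ret"),
    "_noop": ("ret",),
}
names = names_for(table, [("add eax, edi", "ret")])
print(names)  # ['_sum_a', '_sum_b']
```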
In step S204, function names of functions corresponding to at least two assembler instructions are output.
In an embodiment of the present application, a prompt message may be displayed on a screen of the electronic device, where the prompt message includes function names of functions corresponding to at least two assembly instructions, and the prompt message is used to indicate that a similarity between functions corresponding to at least two function names in an application program is greater than a preset similarity, that is, it may prompt a user that the functions corresponding to at least two function names may be the same function or similar (e.g., similar in function).
Or, in an embodiment of the present application, a prompt message may be played as audio by the electronic device, where the prompt message includes the function names of the functions corresponding to the at least two assembly instructions and indicates that the similarity between the functions corresponding to the at least two function names in the application program is greater than the preset similarity; that is, it prompts the user that these functions may be the same function or similar functions (for example, similar in function).
After the developer perceives the prompt information, it may be known that the similarity between the functions corresponding to the at least two function names in the application program is greater than the preset similarity, that is, the user may know that the functions corresponding to the at least two function names may be the same function or similar (for example, similar functions), and the developer may modify the source code of the application program, for example, one of the at least two same or similar functions is retained in the source code, so as to achieve the purpose of simplifying the source code of the application program, and further, may reduce a space occupied by an executable file of the application program generated according to the simplified source code.
Or, in an embodiment of the present application, function names of functions corresponding to at least two assembly instructions may also be recorded in a preset file in the electronic device, so that a developer may view the function names of the functions corresponding to the at least two assembly instructions in the preset file as needed.
Or, in an embodiment of the present application, the function names of the functions corresponding to the at least two assembly instructions may also be sent to a related developer in any available manner such as an email, a short message, or a PUSH message, so that the developer can view the function names of the functions corresponding to the at least two assembly instructions, respectively.
In the case that the executable file includes a third-party library, the third-party library is obtained by compiling its source code, and a developer can only obtain the compiled third-party library, not its source code. Because the developer cannot manually interpret the content of the compiled third-party library, the developer can only detect whether at least two identical or similar functions exist in the source code that the developer wrote for the executable file. The third-party library cannot be taken into account during detection, so duplicates are easily missed and the detection accuracy is low.
In the present application, a correspondence relationship between a function name of a function in an executable file of an application program and an assembler instruction of the function in the executable file is obtained. And screening at least two assembly instructions with the similarity greater than the preset similarity in the corresponding relation between the function name of the function in the executable file and the assembly instructions of the function in the executable file. And searching function names of functions corresponding to at least two assembly instructions in the corresponding relation between the function names of the functions in the executable file and the assembly instructions of the functions in the executable file. And outputting the function names of the functions corresponding to the at least two assembly instructions respectively.
By the method and the device, whether at least two identical or similar functions exist in the executable file can be detected based on the assembly instruction of the function in the source code written by the developer corresponding to the executable file and the assembly instruction of the function in the third-party library in the executable file.
In addition, the whole detection process can be free from participation of developers, and labor cost can be saved.
In one embodiment, after the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file is obtained, whether at least two identical or similar functions exist in the executable file may be detected according to that correspondence. If so, the source code of the application program may be modified according to the function names of the at least two identical or similar functions, for example by retaining only one of them in the source code, so as to simplify the source code of the application program and thereby reduce the space occupied by a new executable file of the application program generated from the simplified source code.
Specifically, referring to fig. 3, a flowchart illustrating steps of a data processing method according to the present application is shown, where the method is applied to an electronic device, and the method may specifically include the following steps:
in step S301, a correspondence between a function name of a function in an executable file of an application program and an assembler instruction of the function in the executable file is acquired.
For any function name in the corresponding relationship between the function name of the function in the executable file and the assembly instruction of the function in the executable file, the function name may be obtained by disassembling "the second offset of the function name in the executable file recorded in one symbol table entry in the executable file", and the assembly instruction of the function corresponding to the function name may be obtained according to the third offset of the assembly instruction in the executable file recorded in the symbol table entry. The second offset and the third offset are respectively obtained at least according to the total number of the symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file.
The same is true for the function name of each of the other functions in the correspondence between the function name of the function in the executable file and the assembler instruction of the function in the executable file.
In an embodiment of the present application, a correspondence between a function name of a function in an executable file and an assembly instruction of the function in the executable file may be generated by an electronic device in real time according to the executable file of an application program, and a specific generation manner may refer to the embodiment shown in fig. 1, which is not described in detail herein.
In another embodiment of the present application, the correspondence between the function name of the function in the executable file and the assembler instruction of the function in the executable file may be generated in advance according to the embodiment shown in fig. 1.
For example, the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file may be generated in advance by the electronic device itself and stored in the electronic device, and the specific generation manner may be as shown in the embodiment shown in fig. 1 and will not be described in detail here. In this manner, in this step, the electronic device can acquire the correspondence between the function name of the function in the executable file stored locally and the assembler instruction of the function in the executable file.
For another example, the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file may be generated in advance by other devices and stored in the cloud, and the specific generation manner may refer to the embodiment shown in fig. 1, which is not described in detail herein. In this way, in this step, the electronic device downloads, from the cloud, the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file.
In step S302, at least two assembly instructions with similarity greater than a preset similarity are screened from the correspondence between the function names of the functions in the executable file and the assembly instructions of the functions in the executable file.
In the present application, when calculating the similarity between any two assembler instructions in the correspondence between the function name of the function in the executable file and the assembler instruction of the function in the executable file, reference may be made to a currently existing manner, and the present application does not limit a specific calculation manner.
The preset similarity may be determined according to actual conditions, and the preset similarity is not limited in the present application. For example, the preset similarity may be determined according to a selected calculation method for calculating the similarity.
In step S303, function names of functions respectively corresponding to at least two assembly instructions are searched for in a correspondence relationship between the function names of the functions in the executable file and the assembly instructions of the functions in the executable file.
For any one of the at least two assembly instructions, the function name corresponding to the assembly instruction may be looked up in the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file, and the same is true for each of the other at least two assembly instructions.
In step S304, a source code of the application is acquired, and in the source code of the application, a code of a function corresponding to a function name other than one of the found function names is deleted.
In the present application, a similarity between the assembly instructions of at least two functions that is greater than the preset similarity usually indicates that the at least two functions do almost the same thing. In this case, the source code of the application program does not need to contain all of these functions at the same time; only one of them needs to be retained, so that the source code of the application program can be simplified.
The one function name may be any one of the function names of the functions corresponding to the at least two found assembly instructions, or may be a function name specified by a developer in the function names of the functions corresponding to the at least two found assembly instructions, or the like.
The codes of the functions corresponding to the other function names can be searched in the source code of the application program, and the searched codes can be deleted in the source code of the application program.
In step S305, the other function names in the source code are replaced with the one function name, so as to obtain the modified source code of the application program.
However, after the code of the functions corresponding to the other function names is deleted from the source code of the application program, those functions no longer exist in the source code. Components that still refer to the other function names can then no longer call the corresponding functions and therefore cannot use them.
Therefore, after the code of the functions corresponding to the other function names is deleted, the other function names in the source code may be replaced with the one function name to obtain the modified source code of the application program. The components can then call, through the one function name, the function corresponding to the one function name in the application program.
Because the assembly instruction of the function corresponding to the one function name is highly similar or even identical to the assembly instructions of the functions corresponding to the other function names, the functions themselves are highly similar or even identical. What a component achieves by using the function corresponding to the one function name is therefore almost the same as what it achieved by using the functions corresponding to the other function names, so the behavior of the component is effectively unchanged.
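Steps S304 and S305 can be illustrated with a toy source-to-source rewrite. The present application does not say which programming language the application source code uses; the sketch below assumes Python source purely for illustration and relies on the standard-library `ast` module (`ast.unparse` requires Python 3.9 or later). The names `merge_duplicates`, `add_a`, and `add_b` are hypothetical, and only top-level function definitions and plain name references are handled.

```python
import ast


class _Rename(ast.NodeTransformer):
    """Replace references to removed function names with the kept name."""

    def __init__(self, keep: str, remove: set):
        self.keep = keep
        self.remove = remove

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id in self.remove:
            node.id = self.keep
        return node


def merge_duplicates(source: str, keep: str, remove: set) -> str:
    """Delete the functions in `remove` and point their callers at `keep`."""
    tree = ast.parse(source)
    # Step S304: drop the top-level definitions of the redundant functions.
    tree.body = [
        node for node in tree.body
        if not (isinstance(node, ast.FunctionDef) and node.name in remove)
    ]
    # Step S305: replace the remaining uses of the removed names.
    tree = _Rename(keep, remove).visit(tree)
    return ast.unparse(ast.fix_missing_locations(tree))


source = """
def add_a(x, y):
    return x + y

def add_b(x, y):
    return x + y

print(add_b(1, 2))
"""
result = merge_duplicates(source, keep="add_a", remove={"add_b"})
print(result)
```

Deleting first and renaming afterwards keeps the two steps in the same order as the description; doing it the other way round would briefly leave two definitions with the same name.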
In the case that the executable file includes a third-party library, the third-party library is obtained by compiling its source code, and a developer can only obtain the compiled third-party library, not its source code. Because the developer cannot manually interpret the content of the compiled third-party library, the developer can only detect whether at least two identical or similar functions exist in the source code that the developer wrote for the executable file. The third-party library cannot be taken into account during detection, so duplicates are easily missed and the detection accuracy is low.
In the present application, the correspondence between the function name of the function in the executable file of the application program and the assembler instruction of the function in the executable file is obtained. And screening at least two assembly instructions with the similarity greater than the preset similarity in the corresponding relation between the function name of the function in the executable file and the assembly instructions of the function in the executable file. And searching function names of functions corresponding to at least two assembly instructions in the corresponding relation between the function names of the functions in the executable file and the assembly instructions of the functions in the executable file. And acquiring a source code of the application program, and deleting codes of functions corresponding to other function names except one function name in the searched function names in the source code of the application program. And replacing other function names in the source code with the function name to obtain the modified source code of the application program.
By the method and the device, whether at least two identical or similar functions exist in the executable file can be detected based on the assembly instruction of the function in the source code written by the developer corresponding to the executable file and the assembly instruction of the function in the third-party library in the executable file.
In addition, the whole detection process can be free from participation of developers, and labor cost can be saved.
Secondly, in the case where at least two identical or similar functions exist in the executable file, the code of the functions corresponding to the function names other than one of the found function names is deleted from the source code of the application program, and the other function names in the source code are replaced with the one function name to obtain the modified source code of the application program. The source code of the application program can thus be simplified without manual modification, which improves the degree of automation and reduces labor cost.
If the at least two identical or similar functions are located in the third-party library in the executable file, a request may be sent to the vendor of the third-party library so that the vendor simplifies the third-party library according to the request, which further reduces the space occupied by the executable file.
Further, after obtaining the modified source code of the application program, the modified source code of the application program may be automatically compiled to obtain a new executable file of the application program.
Therefore, the modified source code of the application program does not need to be compiled manually, the automation degree is improved, and the labor cost is reduced.
It should be noted that for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art will appreciate that the embodiments described in this specification are exemplary embodiments in nature, and that no attempt is made to show structural details of the embodiments in order to practice the invention.
Referring to fig. 4, a block diagram of a data processing apparatus according to the present application is shown, and the apparatus may specifically include the following modules:
a first obtaining module 11, configured to obtain, according to a symbol table in an executable file of an application program, a total number of symbol table entries respectively corresponding to each function in the executable file and a first offset of a symbol table entry corresponding to a first function in the executable file;
a second obtaining module 12, configured to obtain, according to at least the total number and the first offset, a second offset in the executable file of a function name of a function in the executable file recorded in each symbol table entry and a third offset in the executable file of an assembler instruction of a function in the executable file recorded in each symbol table entry;
a disassembling module 13, configured to disassemble, for any symbol table entry in the executable file, a second offset of the function name of the function corresponding to the symbol table entry in the executable file, so as to obtain the function name of the function corresponding to the symbol table entry;
a third obtaining module 14, configured to obtain, according to a third offset of the assembly instruction of the function corresponding to the symbol table entry in the executable file, the assembly instruction of the function corresponding to the symbol table entry;
a storage module 15, configured to store, in the correspondence between the function name of the function in the executable file and the assembly instruction of the function in the executable file, the function name of the function corresponding to the symbol table entry and the assembly instruction of the function corresponding to the symbol table entry.
In an optional implementation manner, the first obtaining module includes:
a first obtaining unit, configured to obtain a fourth offset, in the symbol table, of the total number of symbol table entries that are recorded in the symbol table and respectively correspond to each function in the executable file;
a second obtaining unit, configured to obtain, according to the fourth offset, the total number of symbol table entries that are recorded in the symbol table and respectively correspond to each function in the executable file.
In an optional implementation manner, the first obtaining unit includes:
a first determining subunit, configured to determine the memory structure of the symbol table;
a second determining subunit, configured to determine, according to the memory structure of the symbol table, the fourth offset, in the symbol table, of the total number of symbol table entries respectively corresponding to each function in the executable file.
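The two subunits above can be illustrated with a minimal sketch. The patent text here does not name a concrete file format, so the sketch assumes a layout modelled on the Mach-O `symtab_command` structure (six little-endian 32-bit fields); the helper names `field_offset`, `read_u32`, and `read_symbol_count` are hypothetical and are not part of the application.

```python
import struct

# Assumed memory structure, modelled on Mach-O's `symtab_command`:
# six consecutive little-endian uint32 fields.
SYMTAB_FIELDS = ("cmd", "cmdsize", "symoff", "nsyms", "stroff", "strsize")


def field_offset(name):
    """Byte offset of a field inside the assumed structure (4 bytes per field)."""
    return 4 * SYMTAB_FIELDS.index(name)


def read_u32(blob, offset):
    """Read one little-endian uint32 from `blob` at `offset`."""
    return struct.unpack_from("<I", blob, offset)[0]


def read_symbol_count(blob, symtab_start=0):
    """Locate the 'fourth offset' from the structure, then read the total there."""
    fourth_offset = symtab_start + field_offset("nsyms")
    return read_u32(blob, fourth_offset)


# Synthetic header for demonstration only: symoff=4096, nsyms=3.
header = struct.pack("<6I", 0x2, 24, 4096, 3, 8192, 64)
symbol_count = read_symbol_count(header)
first_entry_offset = read_u32(header, field_offset("symoff"))
```

In this sketch the memory structure determines where the count lives, and the count is then read from that position, mirroring the split between the first and second determining subunits.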
When the executable file includes a third-party library, the third-party library has been compiled from its own source code, and a developer can obtain only the compiled third-party library, not the corresponding source code. The developer also cannot manually interpret the contents of the compiled third-party library. As a result, the developer can only check whether at least two identical or similar functions exist in the developer-written source code corresponding to the executable file; the third-party library cannot be taken into account during detection, so duplicates are easily missed and detection accuracy is low.
In the present application, the total number of symbol table entries respectively corresponding to each function in the executable file, and a first offset of the symbol table entry corresponding to the first function in the executable file, are obtained according to a symbol table in the executable file of an application program. Then, according to at least the total number and the first offset, a second offset in the executable file of the function name recorded in each symbol table entry and a third offset in the executable file of the assembly instruction recorded in each symbol table entry are obtained. For any symbol table entry in the executable file, the second offset of the function name of the function corresponding to the symbol table entry is disassembled to obtain that function name; the assembly instruction of the function corresponding to the symbol table entry is obtained according to its third offset in the executable file; and the function name and the assembly instruction are stored in the correspondence between function names of functions in the executable file and assembly instructions of those functions.
With the present application, both the assembly instructions of functions in the developer-written source code corresponding to the executable file and the assembly instructions of functions in a third-party library in the executable file can be obtained. Because the assembly instructions of a function reflect how the function is implemented, they can be used to detect whether at least two identical or similar functions exist in the executable file. This detection covers the functions in the third-party library as well as those in the developer-written source code, which avoids omissions and thereby improves the comprehensiveness, and in turn the accuracy, of detection.
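As a rough illustration of how the correspondence could be assembled, the sketch below walks a synthetic symbol table. Everything about the layout is an assumption made for the example: each entry is taken to be three little-endian 32-bit values (name offset, code offset, code length), names are NUL-terminated ASCII strings, and the raw code bytes stand in for disassembled instructions, since a real disassembler is outside the scope of this sketch. The names `ENTRY`, `read_cstring`, `build_correspondence`, and `make_demo_blob` are hypothetical.

```python
import struct

# Hypothetical entry layout: name offset, code offset, code length (uint32 each).
ENTRY = struct.Struct("<III")


def read_cstring(blob, offset):
    """Read the NUL-terminated ASCII string that starts at `offset`."""
    end = blob.index(b"\x00", offset)
    return blob[offset:end].decode("ascii")


def build_correspondence(blob, first_offset, total):
    """Map each function name to the bytes of its code.

    Entry i is assumed to sit at first_offset + i * ENTRY.size, which is how
    the total number and the first offset yield every second/third offset.
    """
    mapping = {}
    for i in range(total):
        name_off, code_off, code_len = ENTRY.unpack_from(
            blob, first_offset + i * ENTRY.size
        )
        mapping[read_cstring(blob, name_off)] = blob[code_off:code_off + code_len]
    return mapping


def make_demo_blob():
    """Build a tiny synthetic 'executable': symbol table, names, then code."""
    names = b"_foo\x00_bar\x00"
    code = b"\x01\x02\x03\x04\x05\x06"
    names_off = 2 * ENTRY.size          # names follow the two table entries
    code_off = names_off + len(names)   # code follows the names
    table = ENTRY.pack(names_off, code_off, 4) + ENTRY.pack(names_off + 5, code_off + 4, 2)
    return table + names + code


correspondence = build_correspondence(make_demo_blob(), first_offset=0, total=2)
```

A real implementation would replace the raw byte slices with instructions decoded by a disassembler for the target architecture.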
Referring to fig. 5, a block diagram of a data processing apparatus according to the present application is shown, and the apparatus may specifically include the following modules:
a fourth obtaining module 21, configured to obtain a correspondence between a function name of a function in an executable file of the application program and an assembly instruction of the function in the executable file;
for any function name in the corresponding relationship, the function name is obtained by disassembling a second offset of the function name recorded in one symbol table entry in the executable file, an assembly instruction of a function corresponding to the function name may be obtained according to a third offset of the assembly instruction recorded in the symbol table entry in the executable file, and the second offset and the third offset are obtained at least according to the total number of symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file respectively;
the first screening module 22 is configured to screen at least two assembly instructions with similarity greater than a preset similarity in the corresponding relationship;
the first searching module 23 is configured to search function names of functions corresponding to at least two assembly instructions in the corresponding relationship;
and the output module 24 is configured to output function names of functions corresponding to the at least two assembly instructions respectively.
When the executable file includes a third-party library, the third-party library has been compiled from its own source code, and a developer can obtain only the compiled third-party library, not the corresponding source code. The developer also cannot manually interpret the contents of the compiled third-party library. As a result, the developer can only check whether at least two identical or similar functions exist in the developer-written source code corresponding to the executable file; the third-party library cannot be taken into account during detection, so duplicates are easily missed and detection accuracy is low.
In the present application, a correspondence between function names of functions in an executable file of an application program and assembly instructions of those functions is obtained. At least two assembly instructions whose similarity is greater than a preset similarity are screened out of the correspondence. The function names of the functions respectively corresponding to the at least two assembly instructions are looked up in the correspondence and then output.
With the present application, whether at least two identical or similar functions exist in the executable file can be detected based on the assembly instructions of functions in the developer-written source code corresponding to the executable file and the assembly instructions of functions in a third-party library in the executable file.
In addition, the whole detection process can be free from participation of developers, and labor cost can be saved.
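The screening and lookup steps can be sketched as follows. The patent does not specify how similarity between assembly instructions is measured, so this example assumes the ratio computed by Python's `difflib.SequenceMatcher` over lists of instruction strings; the function name `similar_function_pairs`, the threshold value 0.9, and the sample instructions are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_function_pairs(correspondence, preset_similarity=0.9):
    """Return (name_a, name_b, ratio) for every pair above the threshold.

    `correspondence` maps a function name to its list of instruction strings.
    The similarity metric (SequenceMatcher ratio) is an assumption of this sketch.
    """
    pairs = []
    for (name_a, asm_a), (name_b, asm_b) in combinations(sorted(correspondence.items()), 2):
        ratio = SequenceMatcher(None, asm_a, asm_b, autojunk=False).ratio()
        if ratio > preset_similarity:
            pairs.append((name_a, name_b, ratio))
    return pairs


# Illustrative data: two functions with identical bodies and one unrelated function.
demo = {
    "_sum_a": ["mov x8, x0", "add x0, x8, x1", "ret"],
    "_sum_b": ["mov x8, x0", "add x0, x8, x1", "ret"],
    "_mul": ["mul x0, x0, x1", "ret"],
}
found = similar_function_pairs(demo)
found_names = [(a, b) for a, b, _ in found]
```

Comparing every pair is quadratic in the number of functions; for large executables one might first bucket functions by a hash of their normalized instructions, but that refinement is not described in the source.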
Referring to fig. 6, a block diagram of a data processing apparatus according to the present application is shown, and the apparatus may specifically include the following modules:
a fifth obtaining module 31, configured to obtain a correspondence between a function name of a function in an executable file of the application program and an assembly instruction of the function in the executable file;
for any function name in the corresponding relationship, the function name is obtained by disassembling a second offset of the function name recorded in one symbol table entry in the executable file, an assembly instruction of a function corresponding to the function name may be obtained according to a third offset of the assembly instruction recorded in the symbol table entry in the executable file, and the second offset and the third offset are obtained at least according to the total number of symbol table entries respectively corresponding to each function in the executable file and the first offset of the symbol table entry corresponding to the first function in the executable file respectively;
the second screening module 32 is configured to screen at least two assembly instructions with similarity greater than a preset similarity in the corresponding relationship;
a second searching module 33, configured to search, in the correspondence, function names of functions corresponding to at least two assembly instructions respectively;
a sixth obtaining module 34, configured to obtain a source code of the application program;
a deleting module 35, configured to delete, in the source code, a code of a function corresponding to a function name other than one of the found function names;
a replacing module 36, configured to replace the other function names in the source code with the one function name, so as to obtain a modified source code of the application program.
In an optional implementation, the apparatus further comprises:
a compiling module, configured to compile the modified source code of the application program to obtain a new executable file of the application program.
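The deleting, replacing, and compiling modules can be illustrated on a toy example. The patent does not state the programming language of the application's source code; purely for demonstration, the sketch below operates on Python source using the standard `ast` module (`ast.unparse` requires Python 3.9 or later). The class `Deduplicate` and the function `simplify_source` are hypothetical names, and the sketch only handles plain function definitions and direct name references.

```python
import ast


class Deduplicate(ast.NodeTransformer):
    """Delete redundant function definitions and redirect references to the kept one."""

    def __init__(self, keep, drop):
        self.keep = keep
        self.drop = set(drop)

    def visit_FunctionDef(self, node):
        if node.name in self.drop:
            return None  # returning None removes the definition from the tree
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        if node.id in self.drop:
            node.id = self.keep  # replace the other function names with the kept name
        return node


def simplify_source(source, similar_names):
    """Keep the first name in `similar_names`; delete and replace the rest."""
    keep, *drop = similar_names
    tree = Deduplicate(keep, drop).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


SOURCE = """
def add_a(x, y):
    return x + y

def add_b(x, y):
    return x + y

def main():
    return add_a(1, 2) + add_b(3, 4)
"""

modified = simplify_source(SOURCE, ["add_a", "add_b"])
# Stand-in for the compiling module: compile and run the modified source.
namespace = {}
exec(compile(modified, "<modified>", "exec"), namespace)
result = namespace["main"]()
```

Working on a syntax tree rather than raw text avoids accidentally rewriting identifiers that merely contain one of the function names as a substring.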
When the executable file includes a third-party library, the third-party library has been compiled from its own source code, and a developer can obtain only the compiled third-party library, not the corresponding source code. The developer also cannot manually interpret the contents of the compiled third-party library. As a result, the developer can only check whether at least two identical or similar functions exist in the developer-written source code corresponding to the executable file; the third-party library cannot be taken into account during detection, so duplicates are easily missed and detection accuracy is low.
In the present application, a correspondence between function names of functions in an executable file of an application program and assembly instructions of those functions is obtained. At least two assembly instructions whose similarity is greater than a preset similarity are screened out of the correspondence, and the function names of the functions respectively corresponding to the at least two assembly instructions are looked up in the correspondence. The source code of the application program is then obtained, and the code of the functions corresponding to all but one of the found function names is deleted from it. The other function names in the source code are replaced with that one function name to obtain the modified source code of the application program.
With the present application, whether at least two identical or similar functions exist in the executable file can be detected based on the assembly instructions of functions in the developer-written source code corresponding to the executable file and the assembly instructions of functions in a third-party library in the executable file.
In addition, the whole detection process can be free from participation of developers, and labor cost can be saved.
Secondly, when it is detected that at least two identical or similar functions exist in the executable file, the code of the functions corresponding to all but one of the found function names is deleted from the source code of the application program, and the other function names in the source code are replaced with that one function name, yielding the modified source code of the application program. The source code of the application program can thus be simplified without manual modification, which increases the degree of automation and reduces labor cost.
In addition, if the at least two identical or similar functions are located in a third-party library in the executable file, a request may be sent to the vendor of the third-party library so that the vendor simplifies the third-party library according to the request, further reducing the space occupied by the executable file.
Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
Optionally, an embodiment of the present invention further provides an electronic device, including: a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the processes of the data processing method embodiments described above and can achieve the same technical effects; to avoid repetition, details are not described here again.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements each process of the data processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
Fig. 7 is a block diagram of an electronic device 800 shown in the present application. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 7, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, images, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast operation information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 8 is a block diagram of an electronic device 1900 shown in the present application. For example, the electronic device 1900 may be provided as a server.
Referring to fig. 8, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A method of data processing, the method comprising:
acquiring the total number of symbol table entries corresponding to each function in an executable file and a first offset of the symbol table entry corresponding to a first function in the executable file according to a symbol table in the executable file of an application program;
obtaining a second offset of the function name of the function in the executable file recorded in each symbol table entry in the executable file and a third offset of the assembly instruction of the function in the executable file recorded in each symbol table entry in the executable file according to at least the total number and the first offset;
for any symbol table entry in the executable file, disassembling the second offset of the function name of the function corresponding to the symbol table entry in the executable file to obtain the function name of the function corresponding to the symbol table entry; acquiring an assembly instruction of a function corresponding to the symbol table item according to a third offset of the assembly instruction of the function corresponding to the symbol table item in an executable file, and then storing the function name of the function corresponding to the symbol table item and the assembly instruction of the function corresponding to the symbol table item in a corresponding relation between the function name of the function in the executable file and the assembly instruction of the function in the executable file;
and detecting whether at least two identical or similar functions exist in the executable file based on the assembly instruction of the function in the source code written by the developer corresponding to the executable file and the assembly instruction of the function in the third-party library in the executable file.
2. The method according to claim 1, wherein the obtaining, according to a symbol table in an executable file of an application program, a total number of symbol table entries respectively corresponding to each function in the executable file comprises:
acquiring a fourth offset of the total number of symbol table entries in the symbol table, wherein the total number of the symbol table entries corresponds to each function in the executable file recorded in the symbol table;
and acquiring the total number of symbol table entries which are recorded in the symbol table and respectively correspond to each function in the executable file according to the fourth offset.
3. The method according to claim 2, wherein the obtaining a fourth offset of the total number of symbol table entries in the symbol table corresponding to each function in the executable file recorded in the symbol table includes:
determining the memory structure of the symbol table;
and determining, according to the memory structure of the symbol table, the fourth offset, in the symbol table, of the total number of symbol table entries respectively corresponding to each function in the executable file.
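As a rough illustration of claims 2 and 3, the sketch below declares a header layout and derives the position of the count field from that declaration. The field order is borrowed from the Mach-O `symtab_command` structure purely as an assumption, since the claims do not name a file format; `SymtabHeader` and `read_total_and_first_offset` are hypothetical names.

```python
import ctypes
import struct


class SymtabHeader(ctypes.LittleEndianStructure):
    """Assumed header layout, modeled on Mach-O's symtab_command."""
    _fields_ = [
        ("cmd", ctypes.c_uint32),
        ("cmdsize", ctypes.c_uint32),
        ("symoff", ctypes.c_uint32),   # offset of the first symbol table entry
        ("nsyms", ctypes.c_uint32),    # total number of symbol table entries
        ("stroff", ctypes.c_uint32),
        ("strsize", ctypes.c_uint32),
    ]


def read_total_and_first_offset(blob: bytes, header_offset: int):
    """Return (total number of entries, offset of the first entry)."""
    # The declared layout gives the position of each field inside the header.
    count_offset = header_offset + SymtabHeader.nsyms.offset    # the "fourth offset"
    first_field = header_offset + SymtabHeader.symoff.offset
    (total,) = struct.unpack_from("<I", blob, count_offset)
    (first_offset,) = struct.unpack_from("<I", blob, first_field)
    return total, first_offset
```

Deriving the field position from the structure declaration, rather than hard-coding a number, mirrors the claim wording: the memory structure is determined first, and the offset follows from it.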
4. The method of claim 3, further comprising:
and compiling the modified source code of the application program to obtain a new executable file of the application program.
5. A method of data processing, the method comprising:
acquiring a corresponding relation between a function name of a function in an executable file of an application program and an assembly instruction of the function in the executable file;
wherein, for any function name in the corresponding relationship, the function name is obtained by disassembling the executable file at a second offset of the function name recorded in a symbol table entry in the executable file, the assembly instruction of the function corresponding to the function name is obtained according to a third offset of the assembly instruction recorded in the symbol table entry, and the second offset and the third offset are obtained according to at least the total number of symbol table entries respectively corresponding to each function in the executable file and a first offset of the symbol table entry corresponding to the first function in the executable file;
detecting whether at least two same or similar functions exist in the executable file based on an assembly instruction of a function in a source code written by a developer corresponding to the executable file and an assembly instruction of a function in a third-party library in the executable file;
screening, in the corresponding relationship, at least two assembly instructions whose similarity is greater than a preset similarity;
searching the corresponding relationship for the function names of the functions corresponding to the at least two assembly instructions;
and outputting the function names of the functions corresponding to the at least two assembly instructions respectively.
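The screening and lookup steps of claim 5 can be sketched as a pairwise comparison over the name-to-instruction mapping. The claims do not fix a similarity measure, so this sketch substitutes `difflib.SequenceMatcher` from the Python standard library; the function name `find_similar_functions`, the 0.9 threshold and the sample instruction lists are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_functions(name_to_asm, threshold=0.9):
    """Return (name_a, name_b, ratio) for pairs whose similarity exceeds threshold."""
    hits = []
    # Compare every unordered pair of functions exactly once.
    for (a, asm_a), (b, asm_b) in combinations(sorted(name_to_asm.items()), 2):
        ratio = SequenceMatcher(None, asm_a, asm_b, autojunk=False).ratio()
        if ratio > threshold:
            hits.append((a, b, ratio))
    return hits


# Hypothetical sample data: one developer function, one third-party function,
# and one unrelated function, each as a list of instruction strings.
SAMPLE = {
    "_app_trim": ["push rbp", "mov rbp, rsp", "call _strlen", "pop rbp", "ret"],
    "_lib_trim": ["push rbp", "mov rbp, rsp", "call _strlen", "pop rbp", "ret"],
    "_other": ["xor eax, eax", "ret"],
}
```

Running `find_similar_functions(SAMPLE)` reports only the `_app_trim` and `_lib_trim` pair. The pairwise loop is quadratic in the number of functions, so a large executable would likely need hashing or bucketing before comparison.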
6. A method of data processing, the method comprising:
acquiring a corresponding relation between a function name of a function in an executable file of an application program and an assembly instruction of the function in the executable file;
wherein, for any function name in the corresponding relationship, the function name is obtained by disassembling the executable file at a second offset of the function name recorded in a symbol table entry in the executable file, the assembly instruction of the function corresponding to the function name is obtained according to a third offset of the assembly instruction recorded in the symbol table entry, and the second offset and the third offset are obtained according to at least the total number of symbol table entries respectively corresponding to each function in the executable file and a first offset of the symbol table entry corresponding to the first function in the executable file;
detecting whether at least two identical or similar functions exist in the executable file based on an assembly instruction of a function in a source code written by a developer corresponding to the executable file and an assembly instruction of a function in a third-party library in the executable file;
screening, in the corresponding relationship, at least two assembly instructions whose similarity is greater than a preset similarity;
searching the corresponding relationship for the function names of the functions corresponding to the at least two assembly instructions;
acquiring a source code of the application program, and deleting, from the source code, the code of the functions corresponding to every found function name other than one function name;
and replacing the other function names in the source code with the one function name to obtain the modified source code of the application program.
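The deletion and replacement steps of claim 6 can be sketched on a toy C-like source. This sketch assumes every function definition starts at column 0 and ends at the first line consisting only of `}`; a real tool would need a proper parser. The name `merge_duplicates` and the sample source are hypothetical.

```python
import re


def merge_duplicates(source: str, keep: str, duplicates: list) -> str:
    """Delete the definitions of duplicate functions and redirect their callers to keep."""
    for name in duplicates:
        # Toy assumption: "<return type> name(args) {" ... up to a line holding only "}".
        definition = re.compile(
            r"^[\w \t\*]+?\b" + re.escape(name) + r"\s*\([^)]*\)\s*\{.*?^\}\n?",
            re.MULTILINE | re.DOTALL,
        )
        source = definition.sub("", source)
        # Every remaining mention of the deleted name now refers to the kept function.
        source = re.sub(r"\b" + re.escape(name) + r"\b", keep, source)
    return source


SAMPLE_SOURCE = """int add_a(int x, int y) {
    return x + y;
}

int add_b(int x, int y) {
    return x + y;
}

int main(void) {
    return add_a(1, 2) + add_b(3, 4);
}
"""
```

Calling `merge_duplicates(SAMPLE_SOURCE, "add_a", ["add_b"])` removes the `add_b` definition and rewrites the call in `main` to use `add_a`. The definition is deleted before the rename so that the rename does not produce a second copy of `add_a`.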
7. A data processing apparatus, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain, according to a symbol table in an executable file of an application program, a total number of symbol table entries corresponding to each function in the executable file and a first offset of a symbol table entry corresponding to a first function in the executable file;
a second obtaining module, configured to obtain, according to at least the total number and the first offset, a second offset, in the executable file, of a function name of a function in the executable file recorded in each symbol table entry, and a third offset, in the executable file, of an assembly instruction of a function in the executable file recorded in each symbol table entry;
a disassembling module, configured to, for any symbol table entry in the executable file, disassemble the executable file at the second offset of the function name of the function corresponding to the symbol table entry to obtain the function name of the function corresponding to the symbol table entry;
a third obtaining module, configured to obtain, according to the third offset of the assembly instruction of the function corresponding to the symbol table entry in the executable file, the assembly instruction of the function corresponding to the symbol table entry;
a storage module, configured to store the function name of the function corresponding to the symbol table entry and the assembly instruction of the function corresponding to the symbol table entry in a corresponding relationship between the function name of the function in the executable file and the assembly instruction of the function in the executable file;
and the detection module is used for detecting whether at least two same or similar functions exist in the executable file based on the assembly instruction of the functions in the source code written by the developer corresponding to the executable file and the assembly instruction of the functions in the third-party library in the executable file.
8. The apparatus of claim 7, wherein the first obtaining module comprises:
a first obtaining unit, configured to obtain a fourth offset, in the symbol table, of the total number of symbol table entries corresponding to each function in the executable file recorded in the symbol table;
and a second obtaining unit, configured to obtain, according to the fourth offset, a total number of symbol table entries that are recorded in the symbol table and respectively correspond to each function in the executable file.
9. The apparatus of claim 8, wherein the first obtaining unit comprises:
the first determining subunit is used for determining the memory structure of the symbol table;
and a second determining subunit, configured to determine, according to the memory structure of the symbol table, the fourth offset, in the symbol table, of the total number of symbol table entries respectively corresponding to each function in the executable file.
10. A data processing apparatus, characterized in that the apparatus comprises:
the fourth acquisition module is used for acquiring the corresponding relation between the function name of the function in the executable file of the application program and the assembly instruction of the function in the executable file;
wherein, for any function name in the corresponding relationship, the function name is obtained by disassembling the executable file at a second offset of the function name recorded in a symbol table entry in the executable file, the assembly instruction of the function corresponding to the function name is obtained according to a third offset of the assembly instruction recorded in the symbol table entry, and the second offset and the third offset are obtained according to at least the total number of symbol table entries respectively corresponding to each function in the executable file and a first offset of the symbol table entry corresponding to the first function in the executable file;
the detection module is used for detecting whether at least two same or similar functions exist in the executable file based on the assembly instruction of the functions in the source code written by the developer corresponding to the executable file and the assembly instruction of the functions in the third-party library in the executable file;
the first screening module is used for screening, in the corresponding relationship, at least two assembly instructions whose similarity is greater than a preset similarity;
the first searching module is used for searching the corresponding relationship for the function names of the functions corresponding to the at least two assembly instructions;
and the output module is used for outputting the function names of the functions corresponding to the at least two assembly instructions respectively.
11. A data processing apparatus, characterized in that the apparatus comprises:
the fifth acquisition module is used for acquiring the corresponding relation between the function name of the function in the executable file of the application program and the assembly instruction of the function in the executable file;
wherein, for any function name in the corresponding relationship, the function name is obtained by disassembling the executable file at a second offset of the function name recorded in a symbol table entry in the executable file, the assembly instruction of the function corresponding to the function name is obtained according to a third offset of the assembly instruction recorded in the symbol table entry, and the second offset and the third offset are obtained according to at least the total number of symbol table entries respectively corresponding to each function in the executable file and a first offset of the symbol table entry corresponding to the first function in the executable file;
the detection module is used for detecting whether at least two same or similar functions exist in the executable file based on the assembly instruction of the functions in the source code written by the developer corresponding to the executable file and the assembly instruction of the functions in the third-party library in the executable file;
the second screening module is used for screening, in the corresponding relationship, at least two assembly instructions whose similarity is greater than a preset similarity;
the second searching module is used for searching the corresponding relationship for the function names of the functions corresponding to the at least two assembly instructions;
a sixth obtaining module, configured to obtain a source code of the application program;
the deleting module is used for deleting, from the source code, the code of the functions corresponding to every found function name other than one function name;
and the replacing module is used for replacing the other function names in the source code with the one function name to obtain the modified source code of the application program.
12. The apparatus of claim 11, further comprising:
and the compiling module is used for compiling the modified source code of the application program to obtain a new executable file of the application program.
13. An electronic device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, carries out the steps of the data processing method according to any one of claims 1 to 6.
14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, carries out the steps of the data processing method according to any one of claims 1 to 6.
CN202111166915.6A 2021-09-30 2021-09-30 Data processing method and device, electronic equipment and storage medium Active CN113946346B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111166915.6A CN113946346B (en) 2021-09-30 2021-09-30 Data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113946346A (en) 2022-01-18
CN113946346B (en) 2022-08-09

Family

ID=79329952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111166915.6A Active CN113946346B (en) 2021-09-30 2021-09-30 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113946346B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016082240A1 (en) * 2014-11-25 2016-06-02 武汉安天信息技术有限责任公司 Method and device for detecting malicious code in elf file

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7370318B1 (en) * 2004-09-02 2008-05-06 Borland Software Corporation System and methodology for asynchronous code refactoring with symbol injection
GB2505564B (en) * 2013-08-02 2015-01-28 Somnium Technologies Ltd Software development tool
CN105528365A (en) * 2014-09-30 2016-04-27 国际商业机器公司 Method and device for managing executable files
US9772925B2 (en) * 2015-10-22 2017-09-26 Microsoft Technology Licensing, Llc Storage access debugging with disassembly and symbol entries
CN105868108B (en) * 2016-03-28 2018-09-07 中国科学院信息工程研究所 The unrelated binary code similarity detection method of instruction set based on neural network
CN110879709A (en) * 2019-11-29 2020-03-13 五八有限公司 Detection method and device of useless codes, terminal equipment and storage medium
CN111881455B (en) * 2020-07-27 2023-12-01 绿盟科技集团股份有限公司 Firmware security analysis method and device
CN112596739B (en) * 2020-12-17 2022-03-04 北京五八信息技术有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN113946346A (en) 2022-01-18

Similar Documents

Publication Publication Date Title
KR101851082B1 (en) Method and device for information push
EP3282371A1 (en) Data clearing method and apparatus, computer program and recording medium
US20160314164A1 (en) Methods and devices for sharing cloud-based business card
KR101753800B1 (en) Method, and device for displaying task
CN113946346B (en) Data processing method and device, electronic equipment and storage medium
CN106354595B (en) Mobile terminal, hardware component state detection method and device
CN111290882B (en) Data file backup method, data file backup device and electronic equipment
CN113590091A (en) Data processing method and device, electronic equipment and storage medium
CN110457084B (en) Loading method and device
CN108427568B (en) User interface updating method and device
CN107257384B (en) Service state monitoring method and device
CN113946228A (en) Statement recommendation method and device, electronic equipment and readable storage medium
CN107526683B (en) Method and device for detecting functional redundancy of application program and storage medium
CN113934452B (en) Data processing method and device, electronic equipment and storage medium
CN114020505B (en) Data processing method and device, electronic equipment and storage medium
CN114020504B (en) Data processing method and device, electronic equipment and storage medium
CN110673850A (en) Method and device for obtaining size of static library
CN114489641B (en) Data processing method and device, electronic equipment and storage medium
CN110659081B (en) File processing method and device for program object and electronic equipment
CN114416085A (en) Data processing method and device, electronic equipment and storage medium
CN111767249B (en) Method and device for determining self-running time of function
CN110119471B (en) Method and device for checking consistency of search results
CN110347394B (en) Software code analysis method and device
CN107463414B (en) Application installation method and device
CN114416033A (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant