CN110879709A - Detection method and device of useless codes, terminal equipment and storage medium - Google Patents


Info

Publication number
CN110879709A
Authority
CN
China
Prior art keywords
class
symbol
list
instruction
address
Legal status
Pending
Application number
CN201911205950.7A
Other languages
Chinese (zh)
Inventor
邓竹立
彭飞
Current Assignee
Wuba Co Ltd
Original Assignee
Wuba Co Ltd
Application filed by Wuba Co Ltd
Priority to CN201911205950.7A
Publication of CN110879709A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/447 Target code generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention provides a method and device for detecting useless code, a terminal device, and a storage medium. The method comprises: acquiring an executable program file of an application program to be detected; acquiring a startup class list, a referenced class list, and a full class list from the executable program file; acquiring a reference class set of the application program according to the startup class list and the referenced class list; and acquiring the useless code of the application program according to the reference class set and the full class list. This solves the technical problems of existing useless-code detection approaches, which are inconvenient to operate, poorly adaptable, prone to miscounting, and inaccurate, and improves both the convenience of detecting useless code and the accuracy of the detection result.

Description

Detection method and device of useless codes, terminal equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting useless code, a terminal device, and a storage medium.
Background
Currently, applications inevitably contain some useless code, such as useless classes and useless functions. A useless class is a class that is never used, and a useless function is a function that is never called. Moreover, because of the language's dynamic features, the compiler compiles every source file, so useless code not only occupies storage space but also adds to the compilation workload. It is therefore necessary to identify the useless code in an application's source code accurately, to avoid these problems.
Existing approaches either scan all source files with text analysis and decide whether each class is called by matching and searching for it, or use a compiler plug-in such as a Clang plug-in to collect class-call information at compile time. The first approach struggles to match files when there are many of them, handles whitespace and line breaks awkwardly, and cannot recognize commented-out code, so its statistics are error-prone and its accuracy is low. For the second approach, the Xcode compiler offers poor support for Clang plug-ins: a plug-in currently cannot be loaded directly during an Xcode build, and Clang must instead be invoked explicitly from the command line, which is inconvenient and also not very accurate.
Disclosure of Invention
The embodiments of the present invention provide a method and device for detecting useless code, a terminal device, and a storage medium, to solve the problems that existing approaches to detecting useless code are inconvenient to use and have low accuracy.
To solve the above technical problem, the present invention is implemented as follows:
in a first aspect, an embodiment of the present invention provides a method for detecting useless code, including:
acquiring an executable program file of an application program to be detected;
acquiring a startup class list, a referenced class list, and a full class list from the executable program file;
acquiring a reference class set of the application program according to the startup class list and the referenced class list;
and acquiring the useless code of the application program according to the reference class set and the full class list.
Optionally, the step of acquiring the reference class set of the application program according to the startup class list and the referenced class list includes:
disassembling the code segment of the executable program file to obtain a symbolized instruction set;
acquiring a symbol table of the executable program file;
verifying each class in the referenced class list according to the instruction set and the symbol table;
and acquiring the reference class set of the application program according to the verified referenced class list and the startup class list.
Optionally, the step of verifying each class in the referenced class list according to the instruction set and the symbol table includes:
for each class in the referenced class list, traversing each symbol in the symbol table and, if the symbol is of function type, acquiring the starting instruction index of the symbol in the instruction set;
searching the instruction set for the address of the class, starting from the instruction at the starting instruction index and stopping at the end of the function (its return instruction);
if the address of the class is found in the instruction set, confirming that the class is a useful class.
Optionally, the step of acquiring the starting instruction index of the symbol in the instruction set includes:
acquiring the address of the symbol as the start address of the function corresponding to the symbol;
and acquiring the starting instruction index of the symbol in the instruction set according to this start address and the start address of the code segment of the executable program file.
Optionally, the step of confirming that the class is a useful class if the address of the class is found in the instruction set includes:
if the address of the class is found in the instruction set, detecting whether the symbol is consistent with the symbol corresponding to the class;
and confirming that the class is a useful class if the symbol is not consistent with the symbol corresponding to the class, that is, if the class is referenced from a function that does not belong to the class itself.
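The text does not say how a symbol is judged "consistent" with a class. One plausible reading, sketched here purely as an assumption, is to compare the class name embedded in an Objective-C method symbol (for example `-[MyClass viewDidLoad]`) with the referenced class. The helpers `owning_class` and `is_external_reference` are hypothetical names, not part of the described method.

```python
import re
from typing import Optional

# Assumed symbol shape: Objective-C method symbols look like "-[Class selector]"
# or "+[Class(Category) selector]"; group 1 captures the class name.
_OBJC_METHOD = re.compile(r"^[-+]\[([^\s(\]]+)(?:\([^)]*\))? [^\]]+\]$")


def owning_class(symbol_name: str) -> Optional[str]:
    """Class that defines the method named by symbol_name, or None for plain functions."""
    match = _OBJC_METHOD.match(symbol_name)
    return match.group(1) if match else None


def is_external_reference(symbol_name: str, class_name: str) -> bool:
    """True when the referencing function is not one of class_name's own methods."""
    return owning_class(symbol_name) != class_name
```

Under this reading, a reference from `-[MyClass foo]` to `MyClass` is a self-reference and does not make `MyClass` useful, while a reference from a C function such as `_main` (no owning class) always counts.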
Optionally, the step of disassembling the code segment of the executable program file to obtain a symbolized instruction set includes:
parsing the executable program file to obtain the address and length of its code segment;
and disassembling the instructions within the range given by that address and length to obtain a symbolized instruction set.
Optionally, the step of acquiring the useless code of the application program according to the reference class set and the full class list includes:
traversing each class in the full class list and putting it into a class set;
putting the parent class of each class, and the classes of each class's member variables, into the reference class set;
and computing the difference between the class set and the reference class set to obtain the useless classes, and taking the code of the useless classes as the useless code.
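The three steps above reduce to ordinary set operations. The following Python sketch assumes each entry of the full class list has already been decoded into a class name, a parent-class name, and the class names of its member variables; `ClassInfo` and `useless_classes` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class ClassInfo:
    name: str
    superclass: Optional[str] = None                       # parent class name, if any
    ivar_classes: List[str] = field(default_factory=list)  # class types of member variables


def useless_classes(all_classes: Iterable[ClassInfo], reference_set: Set[str]) -> Set[str]:
    class_set: Set[str] = set()
    refs = set(reference_set)  # copy so the caller's set is not modified
    for info in all_classes:
        class_set.add(info.name)
        # As the step states, the parent and member-variable classes of every
        # class in the full list are treated as referenced.
        if info.superclass is not None:
            refs.add(info.superclass)
        refs.update(info.ivar_classes)
    # Classes never reached by any of these routes are the useless classes.
    return class_set - refs
```

Because the rule is applied to every class in the full list, a parent class stays "useful" even if its only subclass is itself unused; this is conservative, favouring missed useless code over deleting code that is still needed.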
Optionally, the executable program file is an executable whose symbol table has not been stripped, and is a file in Mach-O format.
In a second aspect, an embodiment of the present invention provides an apparatus for detecting useless code, including:
an executable program file acquisition module, configured to acquire an executable program file of the application program to be detected;
a class list acquisition module, configured to acquire a startup class list, a referenced class list, and a full class list from the executable program file;
a reference class set acquisition module, configured to acquire a reference class set of the application program according to the startup class list and the referenced class list;
and a useless code acquisition module, configured to acquire the useless code of the application program according to the reference class set and the full class list.
Optionally, the reference class set acquisition module includes:
an instruction set acquisition submodule, configured to disassemble the code segment of the executable program file to obtain a symbolized instruction set;
a symbol table acquisition submodule, configured to acquire a symbol table of the executable program file;
a referenced class list verification submodule, configured to verify each class in the referenced class list according to the instruction set and the symbol table;
and a reference class set acquisition submodule, configured to acquire the reference class set of the application program according to the verified referenced class list and the startup class list.
Optionally, the referenced class list verification submodule includes:
a starting instruction index acquisition unit, configured to traverse, for each class in the referenced class list, each symbol in the symbol table and, if the symbol is of function type, acquire the starting instruction index of the symbol in the instruction set;
a class address search unit, configured to search the instruction set for the address of the class, starting from the instruction at the starting instruction index and stopping at the end of the function;
a useful class confirmation unit, configured to confirm that the class is a useful class if the address of the class is found in the instruction set.
Optionally, the starting instruction index acquisition unit includes:
an address acquisition subunit, configured to acquire the address of the symbol as the start address of the function corresponding to the symbol;
and a starting instruction index acquisition subunit, configured to acquire the starting instruction index of the symbol in the instruction set according to this start address and the start address of the code segment of the executable program file.
Optionally, the useful class confirmation unit includes:
a symbol detection subunit, configured to detect, if the address of the class is found in the instruction set, whether the symbol is consistent with the symbol corresponding to the class;
and a useful class confirmation subunit, configured to confirm that the class is a useful class if the symbol is not consistent with the symbol corresponding to the class.
Optionally, the instruction set acquisition submodule includes:
a code segment identification unit, configured to parse the executable program file and acquire the address and length of its code segment;
and an instruction disassembly unit, configured to disassemble the instructions within the range given by that address and length to obtain a symbolized instruction set.
Optionally, the useless code acquisition module includes:
a class set construction submodule, configured to traverse each class in the full class list and put it into a class set;
a reference class set update submodule, configured to put the parent class of each class, and the classes of each class's member variables, into the reference class set;
and a useless code acquisition submodule, configured to compute the difference between the class set and the reference class set to obtain the useless classes, and take the code of the useless classes as the useless code.
Optionally, the executable program file is an executable whose symbol table has not been stripped, and is a file in Mach-O format.
In a third aspect, an embodiment of the present invention further provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the method for detecting useless code described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method for detecting useless code described above.
In the embodiments of the present invention, an executable program file of the application program to be detected is acquired; a startup class list, a referenced class list, and a full class list are acquired from the executable program file; a reference class set of the application program is acquired according to the startup class list and the referenced class list; and the useless code of the application program is acquired according to the reference class set and the full class list. This solves the technical problems of existing useless-code detection approaches, which are inconvenient to operate, poorly adaptable, prone to miscounting, and inaccurate, and improves both the convenience of detecting useless code and the accuracy of the detection result.
The foregoing is only an overview of the technical solutions of the present invention. To make the technical means of the present invention clearer, and to make the above and other objects, features, and advantages easier to understand, specific embodiments of the present invention are described below.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the steps of a method for detecting useless code according to an embodiment of the present invention;
FIG. 2 is a flowchart of the steps of another method for detecting useless code according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for detecting useless code according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another apparatus for detecting useless code according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the hardware structure of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, a flowchart of the steps of a method for detecting useless code according to an embodiment of the present invention is shown.
Step 110, obtaining an executable program file of the application program to be detected.
In this embodiment, to improve the accuracy of useless-code detection, the useful code of the application program can be found by scanning its executable program file, and the useless code is then obtained by removing the useful code from all of the code.
Specifically, a class set and a reference class set of the application program are acquired. The class set contains all classes of the application program being detected, and the reference class set contains all useful classes; the difference between the class set and the reference class set is the set of useless classes, and the code of those useless classes is the useless code.
Therefore, in this embodiment, the executable program file of the application program to be detected must first be acquired. The executable program file may be obtained in any available manner, which is not limited in the embodiments of the present invention.
For example, the source code of the application program may be built locally with Xcode to produce an executable whose symbol table has not been stripped. The executable program file may likewise be in any available format, which is not limited in the embodiments of the present invention. For example, on iOS the executable program file may be a Mach-O file, in which case the developer can parse it according to the Mach-O file format to obtain its binary contents.
Step 120, acquiring a startup class list, a referenced class list, and a full class list from the executable program file.
After the executable program file of the application program is obtained, the startup class list, the referenced class list, and the full class list are acquired from it in order to find the useless code.
These lists may be acquired in any available manner, which is not limited in the embodiments of the present invention.
For example, if the executable program file is in Mach-O format, any information in it can be read by parsing it according to the Mach-O format. The startup class list is obtained by parsing the __objc_nlclslist section of the __DATA segment.
In practice, the __objc_nlclslist section stores the set of all classes in the application that implement a +load method. Many classes in an application register themselves with other components in +load, so all of these classes are useful, and the classes in __objc_nlclslist can be taken as part of the reference class set. __objc_nlclslist can thus be understood as the startup class list above: it contains every class that implements +load.
Similarly, the __objc_classrefs section of the __DATA segment holds the set of all classes referenced in the code; for example, a class used in an expression such as [MyClass new] is recorded in __objc_classrefs. The referenced class list can be obtained from __objc_classrefs as another part of the reference class set.
The __objc_classlist section of the __DATA segment stores the set of all classes; from it, each class, its parent class, and the types of its member variables can be obtained. The full class list, containing every class in the executable program file, can therefore be obtained by reading __objc_classlist.
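Each of these three sections is an array of pointers, one per class, so reading a list mostly means decoding 8-byte values. A minimal sketch, assuming a little-endian 64-bit image, a section whose file offset and size are already known (for example from its section header), and plain pointer values; binaries that use chained fixups encode extra bits in these entries, and some toolchains may place these sections in __DATA_CONST rather than __DATA. `read_pointer_list` is a hypothetical helper.

```python
import struct
from typing import List


def read_pointer_list(data: bytes, fileoff: int, size: int) -> List[int]:
    """Decode one section holding an array of little-endian 64-bit pointers."""
    if size % 8:
        raise ValueError("section size is not a multiple of 8 bytes")
    chunk = data[fileoff:fileoff + size]
    if len(chunk) != size:
        raise ValueError("section extends past the end of the file")
    # Each entry is assumed to be the address of a class structure; mapping it
    # to a class name (by reading the class's metadata) is a separate step.
    return [ptr for (ptr,) in struct.iter_unpack("<Q", chunk)]
```

Applied to __objc_nlclslist, __objc_classrefs, and __objc_classlist in turn, `set(read_pointer_list(...))` yields the startup, referenced, and full class lists, keyed by address until names are resolved.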
Step 130, acquiring the reference class set of the application program according to the startup class list and the referenced class list.
As described above, to detect useless code, the useful code may be detected first and removed from all code. To find the useful code, the useful classes can be found first; removing the useful classes from all classes leaves the useless classes, whose code is the useless code.
Since the classes in both the startup class list and the referenced class list obtained in the previous step are useful, the reference class set of the application program can be obtained from these two lists, and it contains all useful classes identified so far.
Specifically, the startup class list and the referenced class list may contain duplicate classes, so their classes can be merged and then deduplicated. Deduplication is optional, however: each class corresponds to a fixed body of code, so duplicate entries in the two lists do not affect the useless-code detection result.
In practice, however, a class may be referenced only by itself and not by any other class. For example, if the MyClass class contains code such as [MyClass new] but no other class calls MyClass, MyClass is still recorded in __objc_classrefs even though it is not a useful class. Therefore, to improve the accuracy of the reference class set, the classes in the referenced class list may be filtered to remove classes that are referenced only by themselves. Any available filtering method may be used, which is not limited in the embodiments of the present invention. The classes in the startup class list and the filtered referenced class list can then be merged to obtain the reference class set of the application program.
Alternatively, if the effect of self-referencing classes on the reference class set is ignored, the classes in the startup class list and the referenced class list can be merged directly to obtain the reference class set; or the reference class set may be defined as required, and the embodiments of the present invention do not limit the relationship between the reference class set and the startup class list.
Step 140, acquiring the useless code of the application program according to the reference class set and the full class list.
As described above, __objc_classlist stores the set of all classes, so the full class list contains every class, useful or useless, of the application program being detected. Once the reference class set is obtained, the useless classes can be found by removing every class in the reference class set from the full class list; the remaining classes are the useless classes, and their code is the useless code of the application program.
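Steps 130 and 140 amount to a union followed by a difference. A minimal sketch, assuming classes are identified by hashable keys (names or addresses); the function name `find_useless_classes` and the optional `is_useful` predicate, which stands in for whichever self-reference filter is chosen, are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Optional, Set


def find_useless_classes(startup: Iterable[Hashable],
                         referenced: Iterable[Hashable],
                         all_classes: Iterable[Hashable],
                         is_useful: Optional[Callable[[Hashable], bool]] = None) -> Set[Hashable]:
    # Optional filtering of the referenced list (e.g. dropping classes that
    # are referenced only by themselves); skipped when no predicate is given.
    if is_useful is not None:
        referenced = [c for c in referenced if is_useful(c)]
    # Step 130: merge the two lists; building a set also removes duplicates.
    reference_set = set(startup) | set(referenced)
    # Step 140: classes in the full list that are not referenced are useless.
    return set(all_classes) - reference_set
```

For instance, with startup `{"A"}`, referenced `{"B", "C"}`, and full list `{"A", "B", "C", "D"}`, only `"D"` is reported; filtering out `"C"` as self-referenced reports `{"C", "D"}`.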
The code of the useless classes may be obtained in any available manner, which is not limited in the embodiments of the present invention. For example, the code of each useless class may be located in the application's source code by the class's identifier and then marked as useless code, copied into a useless-code text file, and so on.
In this embodiment, an executable program file of the application program to be detected is acquired; a startup class list, a referenced class list, and a full class list are acquired from it; a reference class set is acquired according to the startup class list and the referenced class list; and the useless code of the application program is acquired according to the reference class set and the full class list. This improves both the convenience of detecting useless code and the accuracy of the detection result.
Referring to FIG. 2, in an embodiment of the present invention, step 130 may further include:
step 131, disassembling the code segment of the executable program file to obtain a symbolized instruction set;
step 132, acquiring a symbol table of the executable program file;
step 133, verifying each class in the referenced class list according to the instruction set and the symbol table;
and step 134, acquiring the reference class set of the application program according to the verified referenced class list and the startup class list.
As described above, the referenced class list may include useless classes that are referenced only by themselves, which reduces the accuracy of useless-code detection. To improve that accuracy, each class in the referenced class list can be verified to determine whether it is useful, and therefore whether it belongs in the reference class set.
To verify a class in the referenced class list, one must check whether it is actually called, which requires first obtaining the code that could call it. Because the executable program file is compiled, its contents are not directly readable, so in this embodiment the code segment of the executable program file is disassembled into a symbolized instruction set, in which each element corresponds to one assembly instruction.
For example, if the executable program file is in Mach-O format, it can be parsed according to the Mach-O format to obtain the address and size of its __text section, and the instructions in that range can then be disassembled with any available tool, such as Capstone. The result is the symbolized instruction set.
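Finding the __text range only requires walking the Mach-O load commands. The stdlib-only sketch below assumes a thin (non-fat), little-endian 64-bit Mach-O file and uses the structure sizes from Apple's `<mach-o/loader.h>` (`mach_header_64`, `segment_command_64`, `section_64`); `find_section` is a hypothetical helper, and the Capstone disassembly that would consume its result is not shown.

```python
import struct
from typing import Optional, Tuple

MH_MAGIC_64 = 0xFEEDFACF   # magic number of a 64-bit Mach-O header
LC_SEGMENT_64 = 0x19       # load command describing a 64-bit segment
HEADER_SIZE = 32           # sizeof(struct mach_header_64)
SEGMENT_CMD_SIZE = 72      # sizeof(struct segment_command_64)
SECTION_SIZE = 80          # sizeof(struct section_64)


def find_section(data: bytes, segname: str, sectname: str) -> Optional[Tuple[int, int, int]]:
    """Return (vm address, size, file offset) of segname,sectname, or None."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O file")
    # ncmds is the fifth 32-bit field of mach_header_64.
    (ncmds,) = struct.unpack_from("<I", data, 16)
    offset = HEADER_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_SEGMENT_64:
            # nsects follows cmd, cmdsize, segname[16], four uint64 fields and two ints.
            (nsects,) = struct.unpack_from("<I", data, offset + 64)
            sect = offset + SEGMENT_CMD_SIZE
            for _ in range(nsects):
                sname, seg, addr, size, fileoff = struct.unpack_from("<16s16sQQI", data, sect)
                if (seg.rstrip(b"\0").decode() == segname
                        and sname.rstrip(b"\0").decode() == sectname):
                    return addr, size, fileoff
                sect += SECTION_SIZE
        offset += cmdsize
    return None
```

Calling `find_section(data, "__TEXT", "__text")` on the file's bytes would give the range to disassemble; the same helper can locate the `__objc_*` sections read in step 120.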
In addition, a symbol table of the executable program file may also be acquired. In computer science, a symbol table is a data structure used in language translators (e.g., compilers and interpreters). In the symbol table, each identifier in the program source code is bound with its declaration or usage information, such as its data type, scope, and memory address. That is, information such as the address and type of each identifier in the program source code can be acquired through the symbol table.
Each class in the referenced class list is then verified against the instruction set and the symbol table. Specifically, for each identifier in the symbol table, the code corresponding to it in the instruction set is located by its address, and that code is checked for calls to classes in the referenced class list; any class it calls is determined to be useful. Classes in the referenced class list that are not called by the code of any identifier are treated as useless.
Verifying each class in the referenced class list removes the useless classes it contains and makes the verified list more accurate. The reference class set of the application program is then acquired according to the verified referenced class list and the startup class list; for example, the classes in the startup class list and the useful classes in the verified referenced class list may be merged to construct the reference class set.
Optionally, in an embodiment of the present invention, step 133 further includes:
step 1331, for each class in the referenced class list, traversing each symbol in the symbol table and, if the symbol is of function type, acquiring the starting instruction index of the symbol in the instruction set;
step 1332, searching the instruction set for the address of the class, starting from the instruction at the starting instruction index and stopping at the function's return instruction;
step 1333, if the address of the class is found in the instruction set, confirming that the class is a useful class.
In practice, class references can occur in the code of function-type symbols, whereas other kinds of symbols generally do not involve class calls. To make verification more efficient, class-call checks may therefore be performed only for function-type symbols.
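Which symbols count as "function type" can be read from the `nlist_64` entries that the LC_SYMTAB load command points to. A hedged sketch, assuming the raw symbol table and string table have already been sliced out of the file and that the 1-based section ordinal of __TEXT,__text is known (it is often 1, but that is not guaranteed). Treating "defined in __text" as "function type" follows the description of step 1331 that comes next; the constants come from Apple's `<mach-o/nlist.h>`, and `function_symbols` is a hypothetical helper.

```python
import struct
from typing import Iterator, Tuple

N_STAB = 0xE0   # mask for debugger (stab) symbol entries
N_TYPE = 0x0E   # mask for the symbol type bits
N_SECT = 0x0E   # type value: symbol is defined in section n_sect


def function_symbols(symtab: bytes, strtab: bytes, text_sect_index: int) -> Iterator[Tuple[str, int]]:
    """Yield (name, address) for nlist_64 entries defined in the __text section."""
    # nlist_64 is n_strx (uint32), n_type (uint8), n_sect (uint8), n_desc (uint16), n_value (uint64).
    for n_strx, n_type, n_sect, _n_desc, n_value in struct.iter_unpack("<IBBHQ", symtab):
        if n_type & N_STAB:
            continue  # skip debugging entries
        if (n_type & N_TYPE) == N_SECT and n_sect == text_sect_index:
            end = strtab.index(b"\0", n_strx)
            yield strtab[n_strx:end].decode(), n_value
```

Data symbols (defined in other sections) and debugging entries are skipped, so only code addresses reach the instruction scan.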
Specifically, for each class in the referenced class list, each symbol in the symbol table is traversed. If the symbol is of function type, that is, it lies in the __text section, its starting instruction index in the instruction set is obtained, and the instruction set is searched for the address of the class from the instruction at that index until the function's return instruction. If the address of a class in the referenced class list is found, that class is confirmed to be useful; if the address of a class is not found in the code of any function symbol, that class is confirmed to be useless.
In this embodiment, the starting instruction index of the symbol in the instruction set may be obtained in any available manner, which is not limited in the embodiments of the present invention. For example, the address of the symbol may be read from the symbol table, and the starting instruction index of the symbol can then be computed from that address and the start address of the code segment of the executable program file.
For example, for a class A in the reference class list, assume that the function-type symbols in the symbol table are B1 and B2. The starting instruction subscript of B1 in the instruction set is obtained, and the instruction set is searched for the address of class A starting from the instruction at that subscript until a return (ret) instruction is encountered. Likewise, the starting instruction subscript of B2 is obtained, and the instruction set is searched for the address of class A starting from that instruction until a ret instruction is encountered. If the address of class A is found in at least one of these searches, class A is confirmed to be a useful class; if none of the searches starting from the function-type symbols finds the address of class A, class A is determined to be a useless class.
In addition, in the embodiment of the present invention, once a class has been confirmed to be a useful class based on the currently traversed symbol, the remaining symbols in the symbol table need not be traversed for that class.
For example, for a class A in the reference class list, where the function-type symbols in the symbol table are B1 and B2: if the address of class A is found when searching the instruction set from the instruction at the starting instruction subscript of symbol B1, class A can be determined to be a useful class. In that case there is no need to obtain the starting instruction subscript of symbol B2 or to search from it; the process jumps directly to the next class in the reference class list and performs steps 1331 to 1333 to determine whether that class is a useful class.
Here, the starting instruction subscript may be understood as the subscript of the instruction element in the instruction set at which the search starts.
For example, assume that the instruction set includes n instruction elements with subscripts 1, 2, ..., n. If the starting instruction subscript of a symbol B1 in the instruction set is i, the instruction element with subscript i and the instruction elements that follow it are searched in order for the address of class A until a ret instruction is encountered.
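The early exit can be sketched as a nested loop that breaks out of the symbol loop as soon as a class is confirmed. The sketch assumes each instruction is a hypothetical `(mnemonic, resolved_operand_addresses)` tuple, that class addresses are already known, and that `function_starts` holds the 0-based starting subscripts of the function-type symbols; all names are illustrative.

```python
from typing import Dict, Sequence, Set, Tuple

# (mnemonic, resolved operand addresses): a hypothetical instruction representation
Instruction = Tuple[str, Tuple[int, ...]]


def scan_until_ret(instructions: Sequence[Instruction], start: int, address: int) -> bool:
    """Search from start for address, stopping after the first ret."""
    for mnemonic, operands in instructions[start:]:
        if address in operands:
            return True
        if mnemonic == "ret":
            break
    return False


def verify_reference_list(class_addresses: Dict[str, int],
                          function_starts: Sequence[int],
                          instructions: Sequence[Instruction]) -> Set[str]:
    """Return the reference-list classes confirmed as useful (steps 1331 to 1333)."""
    useful: Set[str] = set()
    for name, address in class_addresses.items():
        for start in function_starts:
            if scan_until_ret(instructions, start, address):
                useful.add(name)
                break  # class confirmed: skip remaining symbols, go to the next class
    return useful
```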
Optionally, in an embodiment of the present invention, step 1331 may further include:
step S1: obtain the address of the symbol as the starting address of the function corresponding to the symbol;
step S2: obtain the starting instruction subscript of the symbol in the instruction set according to this starting address and the starting address of the code segment of the executable program file.
In the embodiment of the present invention, to obtain the starting instruction subscript of each symbol accurately, the address of the symbol may be taken as the starting address of the corresponding function, and the starting instruction subscript of the symbol in the instruction set may then be derived from that starting address and the starting address of the code segment of the executable program file.
For example, (starting address of the function corresponding to the symbol - starting address of the code segment) / 4 may be used as the starting instruction subscript of that function in the disassembled instruction set, since on ARM64 every instruction occupies 4 bytes.
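The formula above can be written as a small helper. This is a sketch assuming fixed-width 4-byte A64 instructions; it returns a 0-based subscript (add 1 to match the 1-based numbering used in the example with n instruction elements) and rejects addresses that are not instruction boundaries inside the code segment. The function name is hypothetical.

```python
AARCH64_INSTRUCTION_SIZE = 4  # every A64 instruction is 32 bits wide


def start_instruction_index(function_start: int, text_start: int) -> int:
    """(function start address - code segment start address) / 4, as a 0-based subscript."""
    offset = function_start - text_start
    if offset < 0 or offset % AARCH64_INSTRUCTION_SIZE:
        raise ValueError("function start is not an instruction boundary inside the code segment")
    return offset // AARCH64_INSTRUCTION_SIZE
```

For instance, a function starting 0x10 bytes into a code segment at 0x100004000 gets subscript 4.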
Optionally, in an embodiment of the present invention, step 1333 may further include:
step T1: if the address of the class is found in the instruction set, detect whether the symbol is consistent with the symbol corresponding to the class;
step T2: if the symbol is not consistent with the symbol corresponding to the class, confirm that the class is a useful class.
In addition, a class that is referenced only from its own code must not be counted as useful, otherwise the accuracy of the finally confirmed useful classes would suffer. Therefore, in the embodiment of the present invention, for any function-type symbol, if the address of a class in the reference class list is found in the instruction set starting from the instruction at the symbol's starting instruction subscript, it may further be detected whether the symbol is consistent with the symbol corresponding to the found class. If it is, the class is considered to be calling itself; if it is not, the class is considered to be called from another class, and it may be placed in the reference class set as a useful class.
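The text does not define when a symbol is "consistent with the symbol corresponding to the class". One plausible reading, sketched below, assumes Objective-C method symbols of the form `-[ClassName selector]` or `+[ClassName(Category) selector]` and treats a reference as a self-reference when the method's owning class equals the referenced class. Symbols that do not match this pattern (for example C functions) are counted as external references. All names are hypothetical.

```python
import re
from typing import Optional

# Matches "-[Class sel]", "+[Class sel]" and "-[Class(Category) sel]"
_OBJC_METHOD = re.compile(r"^[-+]\[([A-Za-z_][A-Za-z0-9_]*)(?:\([^)]*\))? ")


def owning_class(symbol_name: str) -> Optional[str]:
    """Return the class that defines an Objective-C method symbol, if any."""
    match = _OBJC_METHOD.match(symbol_name)
    return match.group(1) if match else None


def counts_as_external_reference(symbol_name: str, referenced_class: str) -> bool:
    """True unless the referencing function belongs to the referenced class itself."""
    return owning_class(symbol_name) != referenced_class
```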
Optionally, in an embodiment of the present invention, step 131 may further include:
step 1311: parse the executable program file to obtain the address and length of its code segment;
step 1312: disassemble the instructions within the address and length range of the code segment to obtain a symbolized instruction set.
In addition, in the embodiment of the present invention, to obtain a symbolized instruction set, the executable program file may be parsed to obtain the address and length of its code segment, and the instructions within that address and length range may then be disassembled.
In the embodiment of the present invention, the executable program file may be parsed in any available manner to obtain the address and length of its code segment, which is not limited in this embodiment of the present invention. For example, if the executable program file is in Mach-O format, the address and size (that is, the data length) of the __text section can be obtained by parsing the file according to that format. The instructions of the code segment can then be located from this address and length and disassembled to obtain a symbolized instruction set, in which each element corresponds to one assembly instruction.
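A minimal sketch of locating the `__TEXT,__text` section is shown below, using only the standard `struct` module and the structure layouts from `<mach-o/loader.h>`. It assumes a thin (non-universal), little-endian, 64-bit Mach-O file already read into memory; fat binaries and 32-bit files are not handled. The function name is hypothetical, and the disassembly step itself is left to whatever disassembler is used.

```python
import struct
from typing import Optional, Tuple

MH_MAGIC_64 = 0xFEEDFACF
LC_SEGMENT_64 = 0x19
MACH_HEADER_64 = struct.Struct("<IiiIIIII")         # 32 bytes
LOAD_COMMAND = struct.Struct("<II")                 # cmd, cmdsize
SEGMENT_COMMAND_64 = struct.Struct("<II16sQQQQiiII")  # 72 bytes
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")     # 80 bytes


def find_text_section(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (addr, size, file_offset) of __TEXT,__text, or None if absent."""
    magic, _cpu, _sub, _ftype, ncmds, _sizeofcmds, _flags, _res = MACH_HEADER_64.unpack_from(data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O file")
    offset = MACH_HEADER_64.size
    for _ in range(ncmds):
        cmd, cmdsize = LOAD_COMMAND.unpack_from(data, offset)
        if cmd == LC_SEGMENT_64:
            nsects = SEGMENT_COMMAND_64.unpack_from(data, offset)[9]
            sect_off = offset + SEGMENT_COMMAND_64.size  # section_64 entries follow the segment
            for _ in range(nsects):
                sectname, segname, addr, size, file_off = SECTION_64.unpack_from(data, sect_off)[:5]
                if sectname.rstrip(b"\x00") == b"__text" and segname.rstrip(b"\x00") == b"__TEXT":
                    return addr, size, file_off
                sect_off += SECTION_64.size
        offset += cmdsize  # advance to the next load command
    return None
```

The bytes to disassemble are then `data[file_off:file_off + size]`, whose first instruction is at virtual address `addr`.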
Moreover, in the embodiments of the present invention, the disassembly may be performed by any available means or tools, and the embodiments of the present invention are not limited thereto.
Referring to fig. 2, in an embodiment of the present invention, step 140 may further include:
step 141: traverse each class in the all-class list and put each class into a class set;
step 142: put the parent class of each class and the classes corresponding to the member variables of each class into the reference class set;
step 143: obtain the difference between the class set and the reference class set to obtain the useless classes, and take the code of the useless classes as the useless code.
In practice, if a class has a parent class, the parent class may be regarded as a useful class; likewise, if the member variables of a class are of one or more class types, those classes may also be regarded as useful classes.
Therefore, in the embodiment of the present invention, to improve the accuracy of the useful classes, each class in the all-class list may be traversed and placed in the class set, while its parent class and the classes corresponding to its member variables are placed in the reference class set. This improves the completeness and accuracy of the useful classes contained in the reference class set.
As described above, the __objc_classlist section of the __DATA segment stores the list of all classes. From this list, each class, its parent class, and the types of its member variables can be retrieved. The __objc_classlist is traversed: every class is put into the class set, and the parent class of each class and the classes corresponding to its member variables are put into the reference class set.
Finally, the difference between the class set and the reference class set is computed to obtain the useless classes, and the code of these classes is taken as the useless code of the application program under detection.
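Steps 141 to 143 can be sketched with plain sets. The sketch assumes the class metadata read from __objc_classlist has already been decoded into a hypothetical `ClassInfo` record holding the class name, its parent class, and the classes of its member variables; decoding the Objective-C runtime structures is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class ClassInfo:
    """Decoded class metadata (hypothetical representation)."""
    name: str
    superclass: Optional[str] = None
    member_variable_classes: Tuple[str, ...] = ()


def find_useless_classes(all_classes: Iterable[ClassInfo], reference_set: Set[str]) -> Set[str]:
    class_set: Set[str] = set()
    referenced = set(reference_set)  # copy so the caller's set is not mutated
    for info in all_classes:
        class_set.add(info.name)                        # step 141
        if info.superclass:
            referenced.add(info.superclass)             # step 142: parent class
        referenced.update(info.member_variable_classes)  # step 142: member variable classes
    return class_set - referenced                        # step 143: set difference
```

As in the text, the parent class of every class is added, including the parent of a class that is itself unreferenced, so this step errs on the side of keeping classes.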
Optionally, in the embodiment of the present invention, the executable program file is one whose symbol table has not been stripped, and it is a file in Mach-O format.
In the embodiment of the present invention, to make the symbol table easy to obtain, the executable program file of the application program may be obtained together with its symbol table; that is, the obtained executable program file is one whose symbol table has not been stripped.
Furthermore, in the embodiment of the present invention, an executable program file whose symbol table has not been stripped may be obtained in any available manner, and the embodiment of the present invention is not limited thereto. For example, the source code of an application may be built locally with Xcode to generate an executable file whose symbol table is not stripped.
In addition, in the embodiment of the present invention, after the executable program file of the application program is obtained, it may be loaded into a corresponding execution carrier in order to perform the above steps and obtain the useless code. The specific execution carrier may be chosen by the user as required, which is not limited in the embodiment of the present invention. For example, the executable program file may be read into memory.
In addition, application executables in the iOS operating system use the Mach-O format, so a developer can parse an executable file according to the Mach-O file format to obtain the contents of the binary. Therefore, to make it convenient to perform the above steps for obtaining the useless code, the executable program file may be a file in Mach-O format. Of course, in other operating systems, the format of the executable program file may be set according to the file format supported by that operating system, and the embodiment of the present invention is not limited thereto.
In the embodiment of the invention, a symbolized instruction set is obtained by disassembling the code segments of the executable program file; obtaining a symbol table of the executable program file; checking each class in the reference class list according to the instruction set and the symbol table; and acquiring the reference class set of the application program according to the verified reference class list and the starting class list. The accuracy of the obtained reference class set can be improved, and the accuracy of the detection result of the useless code is further improved.
Moreover, in the embodiment of the present invention, for each class in the reference class list, each symbol in the symbol table may be traversed, and if the type of the symbol is a function type, the starting instruction subscript of the symbol in the instruction set is obtained; the instruction set is searched for the address of the class starting from the instruction at that subscript until an end instruction is reached; and if the address of the class is found, the class is confirmed to be a useful class. The address of the symbol is obtained as the starting address of the function corresponding to the symbol, and the starting instruction subscript of the symbol in the instruction set is obtained from this starting address and the starting address of the code segment of the executable program file. If the address of the class is found in the instruction set, it is detected whether the symbol is consistent with the symbol corresponding to the class, and the class is confirmed to be a useful class only if it is not. This further improves the accuracy of the useful classes and thus the accuracy of the useless code detection result.
In addition, in the embodiment of the present invention, the executable program file may be parsed to obtain the address and length of its code segment, and the instructions within that address and length range may be disassembled to obtain a symbolized instruction set. This makes the instruction set more accurate and easier to obtain, which improves the feasibility of the useless code detection process and the accuracy of its result.
Further, in the embodiment of the present invention, each class in the all-class list may also be traversed, and each class in the all-class list is placed in a class set; putting the parent class of each class and the class corresponding to the member variable of each class into the reference class set; and acquiring a difference set of the class set and the reference class set to obtain a useless class, and taking the code of the useless class as the useless code. Therefore, the comprehensiveness and the accuracy of useful classes contained in the reference class set are improved, and the accuracy of a useless code detection result is further improved.
EXAMPLE III
The embodiment of the invention provides a device for detecting a useless code.
Referring to fig. 3, a schematic structural diagram of a detection apparatus for useless code according to an embodiment of the present invention is shown.
The detection apparatus for useless code of the embodiment of the present invention comprises: an executable program file acquisition module 210, a class list acquisition module 220, a reference class set acquisition module 230, and a useless code acquisition module 240.
The functions of the modules and the interaction relationship between the modules are described in detail below.
An executable program file obtaining module 210, configured to obtain an executable program file of the application to be detected.
A class list obtaining module 220, configured to obtain a start class list, a reference class list, and a whole class list in the executable program file.
A reference class set obtaining module 230, configured to obtain a reference class set of the application according to the startup class list and the reference class list.
A useless code obtaining module 240, configured to obtain the useless code of the application program according to the reference class set and the all-class list.
In the embodiment of the invention, the executable program file of the application program to be detected is acquired; acquiring a starting class list, a reference class list and a whole class list in the executable program file; acquiring a reference class set of the application program according to the starting class list and the reference class list; and acquiring the useless codes of the application program according to the reference class set and the all class list. Therefore, the convenience of detecting the useless codes and the accuracy of the detection result can be improved.
Referring to fig. 4, in the embodiment of the present invention, the reference class set obtaining module 230 further includes:
an instruction set obtaining sub-module 231, configured to disassemble code segments of the executable program file to obtain a symbolized instruction set;
a symbol table obtaining submodule 232, configured to obtain a symbol table of the executable program file;
a reference class list checking submodule 233, configured to check each class in the reference class list according to the instruction set and the symbol table;
and the reference class set obtaining sub-module 234 is configured to obtain the reference class set of the application program according to the verified reference class list and the startup class list.
Optionally, in this embodiment of the present invention, the reference class list checking submodule 233 may further include:
a starting instruction subscript acquiring unit, configured to traverse each symbol in the symbol table for each class in the reference class list, and if the type of the symbol is a function type, acquire a starting instruction subscript of the symbol in the instruction set;
a class address searching unit, configured to search the instruction set for the address of the class, starting from the instruction corresponding to the starting instruction subscript, until an end instruction is reached;
a useful class confirmation unit, configured to confirm that the class is a useful class if the address of the class is found in the instruction set.
Optionally, in this embodiment of the present invention, the starting instruction subscript obtaining unit further includes:
the address acquisition subunit is configured to acquire an address of the symbol, where the address is used as an initial address of a function corresponding to the symbol;
and the starting instruction subscript acquiring subunit is used for acquiring the starting instruction subscript of the symbol in the instruction set according to the starting address and the starting address of the code segment of the executable program file.
Optionally, in an embodiment of the present invention, the useful class confirming unit further may include:
a symbol detection subunit, configured to detect whether the symbol is consistent with a symbol corresponding to the class if the address of the class is found in the instruction set;
a useful class confirming subunit, configured to confirm that the class belongs to the useful class if the symbol is not consistent with the symbol corresponding to the class.
Optionally, in this embodiment of the present invention, the instruction set obtaining sub-module 231 further includes:
the code segment identification unit is used for analyzing the executable program file and acquiring the address and the length of a code segment of the executable program file;
and the instruction disassembling unit is used for disassembling the assembly instructions in the address and length ranges of the code segments to obtain a symbolized instruction set.
Referring to fig. 4, in an embodiment of the present invention, the useless code obtaining module 240 may further include:
the class set constructing submodule 241 is configured to traverse each class in the all class lists, and place each class in the all class lists into a class set;
a reference class set updating submodule 242, configured to put a parent class of each class and a class corresponding to a member variable of each class into the reference class set;
the useless code obtaining submodule 243 is configured to obtain the difference between the class set and the reference class set to obtain the useless classes, and to take the code of the useless classes as the useless code.
Optionally, in the embodiment of the present invention, the executable program file includes an executable program file whose symbol table is not stripped, and the executable program file is a file in a Mach-O format.
The terminal device provided in the embodiment of the present invention can implement each process implemented in the method embodiments of fig. 1 to fig. 2, and is not described herein again to avoid repetition.
EXAMPLE V
Fig. 5 is a schematic diagram of a hardware structure of a terminal device for implementing various embodiments of the present invention.
The terminal device 500 includes but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and a power supply 511. Those skilled in the art will appreciate that the terminal device configuration shown in fig. 5 does not constitute a limitation of the terminal device, and that the terminal device may include more or fewer components than shown, or combine certain components, or a different arrangement of components. In the embodiment of the present invention, the terminal device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 501 may be used to receive and send signals while sending and receiving messages or during a call; specifically, it receives downlink data from a base station and passes it to the processor 510 for processing, and it transmits uplink data to the base station. In general, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 501 can also communicate with a network and other devices through a wireless communication system.
The terminal device provides the user with wireless broadband internet access through the network module 502, such as helping the user send and receive e-mails, browse webpages, access streaming media, and the like.
The audio output unit 503 may convert audio data received by the radio frequency unit 501 or the network module 502 or stored in the memory 509 into an audio signal and output as sound. Also, the audio output unit 503 may also provide audio output related to a specific function performed by the terminal apparatus 500 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 503 includes a speaker, a buzzer, a receiver, and the like.
The input unit 504 is used to receive an audio or video signal. The input unit 504 may include a Graphics Processing Unit (GPU) 5041 and a microphone 5042. The graphics processor 5041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 506. The image frames processed by the graphics processor 5041 may be stored in the memory 509 (or another storage medium) or transmitted via the radio frequency unit 501 or the network module 502. The microphone 5042 can receive sound and process it into audio data. In phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 501 and then output.
The terminal device 500 further comprises at least one sensor 505, such as light sensors, motion sensors and other sensors. Specifically, the light sensor includes an ambient light sensor that adjusts the brightness of the display panel 5061 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 5061 and/or a backlight when the terminal device 500 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the terminal device posture (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration identification related functions (such as pedometer, tapping), and the like; the sensors 505 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 506 is used to display information input by the user or information provided to the user. The Display unit 506 may include a Display panel 5061, and the Display panel 5061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 507 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the terminal device. Specifically, the user input unit 507 includes a touch panel 5071 and other input devices 5072. Touch panel 5071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations by a user on or near touch panel 5071 using a finger, stylus, or any suitable object or attachment). The touch panel 5071 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 510, and receives and executes commands sent by the processor 510. In addition, the touch panel 5071 may be implemented in various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 5071, the user input unit 507 may include other input devices 5072. In particular, other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
Further, the touch panel 5071 may be overlaid on the display panel 5061. When the touch panel 5071 detects a touch operation on or near it, the touch operation is transmitted to the processor 510 to determine the type of the touch event, and the processor 510 then provides a corresponding visual output on the display panel 5061 according to the type of the touch event. Although in fig. 5 the touch panel 5071 and the display panel 5061 are two independent components implementing the input and output functions of the terminal device, in some embodiments the touch panel 5071 and the display panel 5061 may be integrated to implement these functions, which is not limited herein.
The interface unit 508 is an interface for connecting an external device to the terminal apparatus 500. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 508 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the terminal apparatus 500 or may be used to transmit data between the terminal apparatus 500 and the external device.
The memory 509 may be used to store software programs and various data. The memory 509 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the mobile phone (such as audio data or a phonebook), and the like. Further, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 510 is a control center of the terminal device, connects various parts of the entire terminal device by using various interfaces and lines, and performs various functions of the terminal device and processes data by running or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509, thereby performing overall monitoring of the terminal device. Processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 510.
The terminal device 500 may further include a power supply 511 (e.g., a battery) for supplying power to various components, and preferably, the power supply 511 may be logically connected to the processor 510 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
In addition, the terminal device 500 includes some functional modules that are not shown, and are not described in detail herein.
Preferably, an embodiment of the present invention further provides a terminal device, including the processor 510, the memory 509, and a computer program stored in the memory 509 and executable on the processor 510. When executed by the processor 510, the computer program implements each process of the above embodiment of the method for detecting useless code and achieves the same technical effects; to avoid repetition, details are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the above embodiment of the method for detecting useless code and achieves the same technical effects; to avoid repetition, details are not described herein again. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present invention may be embodied in the form of a software product that is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
Although the embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to these specific embodiments, which are illustrative rather than restrictive. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention as defined in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative: for example, the division into units is only a logical functional division, and other divisions may be used in actual implementation, such as combining a plurality of units or components or integrating them into another system, or omitting or not executing some features. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The above descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (18)

1. A method for detecting useless code, comprising:
acquiring an executable program file of an application program to be detected;
acquiring a startup class list, a reference class list and a full class list from the executable program file;
acquiring a reference class set of the application program according to the startup class list and the reference class list;
and acquiring the useless code of the application program according to the reference class set and the full class list.
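Read at its coarsest, claim 1 reduces to set arithmetic: a class is useless if it appears in the full class list but is neither a startup class nor a referenced class. The Python sketch below assumes the three lists have already been extracted as class names. In an Objective-C Mach-O binary they plausibly correspond to sections such as `__objc_nlclslist`, `__objc_classrefs` and `__objc_classlist`, but the claims do not state this mapping. The function name `find_useless_classes` and the example class names are hypothetical, and the sketch skips the verification of claims 2 to 5 by trusting every entry in the reference class list.

```python
from typing import Iterable, List


def find_useless_classes(startup: Iterable[str],
                         referenced: Iterable[str],
                         all_classes: Iterable[str]) -> List[str]:
    """Return classes in the full class list that are neither startup nor referenced classes."""
    # Reference class set: union of the startup class list and the reference class list.
    reference_set = set(startup) | set(referenced)
    # Useless classes: set difference, sorted for stable output.
    return sorted(set(all_classes) - reference_set)


if __name__ == "__main__":
    print(find_useless_classes(["AppDelegate"], ["HomeVC"],
                               ["AppDelegate", "HomeVC", "LegacyVC"]))
    # prints ['LegacyVC']
```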
2. The method according to claim 1, wherein the step of acquiring the reference class set of the application program according to the startup class list and the reference class list comprises:
disassembling a code segment of the executable program file to obtain a symbolized instruction set;
acquiring a symbol table of the executable program file;
checking each class in the reference class list according to the instruction set and the symbol table;
and acquiring the reference class set of the application program according to the checked reference class list and the startup class list.
3. The method according to claim 2, wherein the step of checking each class in the reference class list according to the instruction set and the symbol table comprises:
for each class in the reference class list, traversing each symbol in the symbol table, and, if the type of the symbol is a function type, acquiring a starting instruction index of the symbol in the instruction set;
searching the instruction set for the address of the class, starting from the instruction at the starting instruction index and continuing until an ending instruction is reached;
and if the address of the class is found in the instruction set, confirming that the class is a useful class.
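The scan in claim 3 can be sketched as follows, assuming the disassembler has already resolved each instruction's address operand (the `target` field) and that every function ends at a `ret` instruction. Both are simplifying assumptions: real arm64 code typically loads a class reference through an `adrp`/`ldr` pair, and a function may contain several `ret` instructions or end in a tail call. The `Instruction` and `Symbol` types, the 4-byte instruction width, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

INSTRUCTION_SIZE = 4  # assumption: fixed-width arm64 instructions


@dataclass(frozen=True)
class Instruction:
    mnemonic: str                 # e.g. "adrp", "ldr", "bl", "ret"
    target: Optional[int] = None  # address operand resolved by symbolization (assumed)


@dataclass(frozen=True)
class Symbol:
    name: str
    address: int
    is_function: bool


def scan_function(instructions: Sequence[Instruction], start: int, class_address: int) -> bool:
    """Search one function body, from its first instruction up to the assumed end marker."""
    for ins in instructions[start:]:
        if ins.target == class_address:
            return True
        if ins.mnemonic == "ret":  # assumption: "ret" ends the function
            return False
    return False


def is_useful_class(class_address: int, symbols: Sequence[Symbol],
                    instructions: Sequence[Instruction], text_start: int) -> bool:
    """The class is useful if any function's instructions contain its address."""
    for sym in symbols:
        if not sym.is_function:
            continue  # only function-type symbols have instructions to scan
        start = (sym.address - text_start) // INSTRUCTION_SIZE
        if 0 <= start < len(instructions) and scan_function(instructions, start, class_address):
            return True
    return False
```

Because the scan stops at the first `ret`, a function with an early return would hide later references; a more robust tool might bound each function by the next function symbol's address instead.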
4. The method according to claim 3, wherein the step of acquiring the starting instruction index of the symbol in the instruction set comprises:
acquiring the address of the symbol as the starting address of the function corresponding to the symbol;
and acquiring the starting instruction index of the symbol in the instruction set according to the starting address of the function and the starting address of the code segment of the executable program file.
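Claim 4 does not give a formula, but with fixed-width instructions (as on arm64, where every instruction occupies 4 bytes; this width is an assumption for the illustration), the starting instruction index follows directly from the two addresses. The example addresses below are hypothetical:

```latex
i_{\text{start}} = \frac{A_{\text{sym}} - A_{\text{text}}}{w},
\qquad \text{e.g.} \quad
\frac{\texttt{0x100004010} - \texttt{0x100004000}}{4} = \frac{16}{4} = 4
```

Here $A_{\text{sym}}$ is the address of the symbol (the starting address of its function), $A_{\text{text}}$ is the starting address of the code segment, and $w$ is the instruction width in bytes. A variable-length instruction set such as x86-64 would instead need a lookup from address to instruction position.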
5. The method according to claim 3, wherein the step of confirming that the class is a useful class if the address of the class is found in the instruction set comprises:
if the address of the class is found in the instruction set, detecting whether the symbol is consistent with the symbol corresponding to the class;
and if the symbol is not consistent with the symbol corresponding to the class, confirming that the class is a useful class.
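One plausible reading of claim 5 is that a reference found inside one of the class's own methods (for example `+[Foo shared]` calling `[[Foo alloc] init]`) should not by itself keep the class alive. The sketch below assumes Objective-C method symbols of the form `-[Class selector]` or `+[Class(Category) selector]`, and treats any other symbol, such as a C function, as an external reference. The helper names and the regular expression are assumptions, not taken from the patent.

```python
import re
from typing import Iterable, Optional

# Assumed symbol shape: "-[Class selector]" or "+[Class(Category) selector:]".
_OBJC_METHOD = re.compile(r"^[-+]\[([A-Za-z_]\w*)")


def owning_class(symbol_name: str) -> Optional[str]:
    """Return the class that defines an Objective-C method symbol, or None for other symbols."""
    match = _OBJC_METHOD.match(symbol_name)
    return match.group(1) if match else None


def is_useful_excluding_self(class_name: str, referencing_symbols: Iterable[str]) -> bool:
    """Useful only if some function outside the class's own methods references it."""
    return any(owning_class(sym) != class_name for sym in referencing_symbols)
```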
6. The method according to any one of claims 2 to 5, wherein the step of disassembling the code segment of the executable program file to obtain a symbolized instruction set comprises:
parsing the executable program file to obtain the address and the length of the code segment of the executable program file;
and disassembling the assembly instructions within the range defined by the address and the length of the code segment to obtain the symbolized instruction set.
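A minimal sketch of the first step of claim 6: locating the `__TEXT,__text` section (address, size and file offset) in a thin, little-endian 64-bit Mach-O file using only Python's `struct` module. Universal (fat), 32-bit and big-endian files are not handled. The actual disassembly of the returned byte range would need a disassembler such as Capstone, which is not shown. Treating `__TEXT,__text` as "the code segment" is an assumption, since the claim does not name a section, and `find_text_section` is a hypothetical name.

```python
import struct
from typing import Optional, Tuple

MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O, read in little-endian byte order
LC_SEGMENT_64 = 0x19
HEADER_64 = struct.Struct("<IiiIIIII")           # mach_header_64, 32 bytes
LOAD_CMD = struct.Struct("<II")                  # cmd, cmdsize
SEGMENT_64 = struct.Struct("<II16sQQQQiiII")     # segment_command_64, 72 bytes
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")  # section_64, 80 bytes


def find_text_section(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (vm address, size, file offset) of __TEXT,__text, or None if absent."""
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _res = HEADER_64.unpack_from(data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O file")
    offset = HEADER_64.size
    for _ in range(ncmds):
        cmd, cmdsize = LOAD_CMD.unpack_from(data, offset)
        if cmd == LC_SEGMENT_64:
            fields = SEGMENT_64.unpack_from(data, offset)
            segname, nsects = fields[2], fields[9]
            if segname.rstrip(b"\0") == b"__TEXT":
                # Section headers follow the segment command directly.
                sect_offset = offset + SEGMENT_64.size
                for _ in range(nsects):
                    sectname, _seg, addr, size, fileoff, *_rest = SECTION_64.unpack_from(data, sect_offset)
                    if sectname.rstrip(b"\0") == b"__text":
                        return addr, size, fileoff
                    sect_offset += SECTION_64.size
        offset += cmdsize
    return None
```

For example, `addr, size, off = find_text_section(data)` followed by `data[off:off + size]` yields the raw instruction bytes to hand to a disassembler.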
7. The method according to claim 1, wherein the step of acquiring the useless code of the application program according to the reference class set and the full class list comprises:
traversing each class in the full class list, and putting each class in the full class list into a class set;
putting the parent class of each class and the classes corresponding to the member variables of each class into the reference class set;
and obtaining the difference set of the class set and the reference class set to obtain the useless classes, and taking the code of the useless classes as the useless code.
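Claim 7 can be sketched in Python as below. `ClassInfo` is a hypothetical record holding what a parser would extract from each class's metadata (its superclass and the classes of its member variables). The sketch follows the claim literally and marks the parent and member-variable classes of every class in the full list as referenced.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class ClassInfo:
    name: str
    superclass: Optional[str] = None
    ivar_types: List[str] = field(default_factory=list)  # classes of member variables


def useless_classes(all_classes: Iterable[ClassInfo], reference_set: Set[str]) -> Set[str]:
    """Return the difference between the class set and the extended reference class set."""
    class_set: Set[str] = set()
    refs = set(reference_set)  # copy so the caller's set is left untouched
    for info in all_classes:
        class_set.add(info.name)
        if info.superclass is not None:
            refs.add(info.superclass)  # parent class counts as referenced
        refs.update(info.ivar_types)   # member-variable classes count as referenced
    return class_set - refs
```

Taken literally, this also keeps the superclass of a class that is itself unused; a stricter variant could add parents and member-variable classes only for classes already in the reference set and repeat until nothing changes.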
8. The method according to claim 1, wherein the executable program file is an executable program file whose symbol table has not been stripped, and the executable program file is a file in the Mach-O format.
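Claim 8 requires an unstripped symbol table because the function symbols scanned in claim 3 come from it. The sketch below reads `nlist_64` entries, given the `symoff`, `nsyms` and `stroff` values of an `LC_SYMTAB` load command, and keeps non-debugging symbols defined in one section. Using "defined in section ordinal 1" as the test for a function-type symbol is an assumption (ordinal 1 is usually `__TEXT,__text`, but this is not guaranteed), and `text_symbols` is a hypothetical name.

```python
import struct
from typing import List, Tuple

N_STAB = 0xE0  # mask for debugging (stab) entries
N_TYPE = 0x0E  # mask for the symbol-type bits
N_SECT = 0x0E  # type value: symbol defined in section number n_sect
NLIST_64 = struct.Struct("<IBBHQ")  # n_strx, n_type, n_sect, n_desc, n_value (16 bytes)


def text_symbols(data: bytes, symoff: int, nsyms: int, stroff: int,
                 text_sect_index: int = 1) -> List[Tuple[str, int]]:
    """Return (name, address) for non-debug symbols defined in the given section ordinal."""
    result = []
    for i in range(nsyms):
        strx, ntype, nsect, _desc, value = NLIST_64.unpack_from(data, symoff + i * NLIST_64.size)
        if ntype & N_STAB:
            continue  # skip debugging entries
        if (ntype & N_TYPE) == N_SECT and nsect == text_sect_index:
            start = stroff + strx
            end = data.index(b"\0", start)  # names are NUL-terminated in the string table
            result.append((data[start:end].decode("utf-8", "replace"), value))
    return result
```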
9. An apparatus for detecting useless code, comprising:
an executable program file acquisition module, configured to acquire an executable program file of an application program to be detected;
a class list acquisition module, configured to acquire a startup class list, a reference class list and a full class list from the executable program file;
a reference class set acquisition module, configured to acquire a reference class set of the application program according to the startup class list and the reference class list;
and a useless code acquisition module, configured to acquire the useless code of the application program according to the reference class set and the full class list.
10. The apparatus according to claim 9, wherein the reference class set acquisition module comprises:
an instruction set acquisition submodule, configured to disassemble a code segment of the executable program file to obtain a symbolized instruction set;
a symbol table acquisition submodule, configured to acquire a symbol table of the executable program file;
a reference class list checking submodule, configured to check each class in the reference class list according to the instruction set and the symbol table;
and a reference class set acquisition submodule, configured to acquire the reference class set of the application program according to the checked reference class list and the startup class list.
11. The apparatus according to claim 10, wherein the reference class list checking submodule comprises:
a starting instruction index acquisition unit, configured to, for each class in the reference class list, traverse each symbol in the symbol table and, if the type of the symbol is a function type, acquire a starting instruction index of the symbol in the instruction set;
a class address search unit, configured to search the instruction set for the address of the class, starting from the instruction at the starting instruction index and continuing until an ending instruction is reached;
and a useful class confirmation unit, configured to confirm that the class is a useful class if the address of the class is found in the instruction set.
12. The apparatus according to claim 11, wherein the starting instruction index acquisition unit comprises:
an address acquisition subunit, configured to acquire the address of the symbol as the starting address of the function corresponding to the symbol;
and a starting instruction index acquisition subunit, configured to acquire the starting instruction index of the symbol in the instruction set according to the starting address of the function and the starting address of the code segment of the executable program file.
13. The apparatus according to claim 11, wherein the useful class confirmation unit comprises:
a symbol detection subunit, configured to detect, if the address of the class is found in the instruction set, whether the symbol is consistent with the symbol corresponding to the class;
and a useful class confirmation subunit, configured to confirm that the class is a useful class if the symbol is not consistent with the symbol corresponding to the class.
14. The apparatus according to any one of claims 10 to 13, wherein the instruction set acquisition submodule comprises:
a code segment identification unit, configured to parse the executable program file and acquire the address and the length of the code segment of the executable program file;
and an instruction disassembly unit, configured to disassemble the assembly instructions within the range defined by the address and the length of the code segment to obtain the symbolized instruction set.
15. The apparatus according to claim 9, wherein the useless code acquisition module comprises:
a class set construction submodule, configured to traverse each class in the full class list and put each class in the full class list into a class set;
a reference class set updating submodule, configured to put the parent class of each class and the classes corresponding to the member variables of each class into the reference class set;
and a useless code acquisition submodule, configured to obtain the difference set of the class set and the reference class set to obtain the useless classes, and take the code of the useless classes as the useless code.
16. The apparatus according to claim 9, wherein the executable program file is an executable program file whose symbol table has not been stripped, and the executable program file is a file in the Mach-O format.
17. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method for detecting useless code according to any one of claims 1 to 8.
18. A computer-readable storage medium, wherein a computer program is stored thereon, and the computer program, when executed by a processor, implements the steps of the method for detecting useless code according to any one of claims 1 to 8.
CN201911205950.7A 2019-11-29 2019-11-29 Detection method and device of useless codes, terminal equipment and storage medium Pending CN110879709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911205950.7A CN110879709A (en) 2019-11-29 2019-11-29 Detection method and device of useless codes, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110879709A true CN110879709A (en) 2020-03-13

Family

ID=69729814

Country Status (1)

Country Link
CN (1) CN110879709A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881274A (en) * 2014-02-28 2015-09-02 上海斐讯数据通信技术有限公司 Method for identifying useless codes
US20160321036A1 (en) * 2015-04-28 2016-11-03 Box, Inc. Dynamically monitoring code execution activity to identify and manage inactive code
CN107193732A (en) * 2017-05-12 2017-09-22 北京理工大学 A kind of verification function locating method compared based on path
CN108132790A (en) * 2017-12-22 2018-06-08 广州酷狗计算机科技有限公司 Detect the method, apparatus and computer storage media of dead code
CN108549538A (en) * 2018-04-11 2018-09-18 深圳市腾讯网络信息技术有限公司 A kind of code detection method, device, storage medium and test terminal

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112596739A (en) * 2020-12-17 2021-04-02 北京五八信息技术有限公司 Data processing method and device
CN112596739B (en) * 2020-12-17 2022-03-04 北京五八信息技术有限公司 Data processing method and device
CN113946346A (en) * 2021-09-30 2022-01-18 北京五八信息技术有限公司 Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200313