CN114168957A

CN114168957A - Method, apparatus, device, medium, and program product for resolving malicious application

Info

Publication number: CN114168957A
Application number: CN202111502673.3A
Authority: CN
Inventors: 钱维正; 牟天宇; 金驰; 叶红
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-12-09
Filing date: 2021-12-09
Publication date: 2022-03-11

Abstract

The disclosure provides a malicious software analysis method which can be applied to the field of information security. The method for analyzing the malicious application program comprises the following steps: obtaining a plurality of unknown instructions of a malicious application program and a plurality of known instructions of a white sample application program; obtaining a first calling frequency of a plurality of unknown instructions called by the malicious application program in the running process and a second calling frequency of a plurality of known instructions called by the white sample application program in the running process by utilizing statistics; determining at least one target instruction matched with the plurality of known instructions in the plurality of unknown instructions according to the first calling frequency and the second calling frequency; and analyzing the at least one target instruction to obtain operation related data of the malicious application program. The present disclosure also provides a malware resolving device, apparatus, storage medium, and program product.

Description

Method, apparatus, device, medium, and program product for resolving malicious application

Technical Field

The present disclosure relates to the field of information security, and in particular, to the field of information security for mobile-side applications, and more particularly, to a method, an apparatus, a device, a medium, and a program product for parsing a malicious application.

Background

In the process of security analysis of mobile-side applications, developers often encounter malicious applications reinforced by VMP (Virtual Machine Protect). The dex file of a reinforced malicious application is unreadable, so the malicious behavior of the application cannot be analyzed.

Disclosure of Invention

In view of the foregoing, the present disclosure provides a method, apparatus, device, medium, and program product for resolving malicious applications.

According to a first aspect of the present disclosure, there is provided a method of resolving a malicious application, comprising: obtaining a plurality of unknown instructions of a malicious application program and a plurality of known instructions of a white sample application program; obtaining a first calling frequency of the unknown instructions called by the malicious application program in the running process and a second calling frequency of the known instructions called by the white sample application program in the running process by utilizing statistics; determining at least one target instruction matched with the plurality of known instructions in the plurality of unknown instructions according to the first calling frequency and the second calling frequency; and analyzing the at least one target instruction to obtain operation related data of the malicious application program.

According to an embodiment of the present disclosure, the determining at least one target instruction of the plurality of unknown instructions that matches the plurality of known instructions according to the first call frequency and the second call frequency includes: determining a first ordering in which the plurality of unknown instructions are called according to the first calling frequency, and determining a second ordering in which the plurality of known instructions are called according to the second calling frequency; and determining at least one target instruction in the unknown instructions that matches the plurality of known instructions according to the first ordering and the second ordering; wherein the white sample application and the malicious application are the same type of application.

According to an embodiment of the present disclosure, the obtaining, by statistics, a first call frequency at which the unknown instructions are called in the running process of the malicious application includes: acquiring respective instruction addresses of the unknown instructions to obtain a plurality of instruction addresses; respectively setting breakpoints at positions corresponding to the instruction addresses in the malicious application program; obtaining the call data of the unknown instructions through the breakpoints; and counting, according to the call data, a first call frequency at which the plurality of unknown instructions are called.

According to an embodiment of the present disclosure, the obtaining the respective instruction addresses of the plurality of unknown instructions to obtain a plurality of instruction addresses includes: obtaining an interpreter of the malicious application program; analyzing the executable file of the malicious application program through the interpreter to obtain code data corresponding to the unknown instructions in the executable file; and analyzing the code data to obtain the plurality of instruction addresses.

According to an embodiment of the present disclosure, the obtaining the interpreter of the malicious application includes: calling the unknown instructions to obtain calling addresses of the unknown instructions; and determining the interpreter according to the calling address.

According to an embodiment of the present disclosure, the obtaining a plurality of unknown instructions of a malicious application includes: in the case that the malicious application program includes a reinforcement policy, analyzing the reinforcement policy to obtain the unknown instructions.

According to an embodiment of the present disclosure, the obtaining a plurality of unknown instructions of a malicious application further includes: acquiring an installation package of the malicious application program; in the case that the installation package is determined to include a code extraction shell, performing shell removal (unpacking) on the installation package; and analyzing the unpacked installation package to obtain the plurality of unknown instructions.

A second aspect of the present disclosure provides an apparatus for resolving a malicious application, including: the acquisition module is used for acquiring a plurality of unknown instructions of the malicious application program and a plurality of known instructions of the white sample application program; the statistical module is used for obtaining a first calling frequency of the unknown instructions called by the malicious application program in the running process and a second calling frequency of the known instructions called by the white sample application program in the running process by utilizing statistics; the determining module is used for determining at least one target instruction which is matched with the plurality of known instructions in the plurality of unknown instructions according to the first calling frequency and the second calling frequency; and the analysis module is used for analyzing the at least one target instruction to obtain the running related data of the malicious application program.

A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described method of resolving malicious applications.

A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method of resolving malicious applications.

A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above method of resolving malicious applications.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario diagram of a method, apparatus, device, medium and program product for resolving malicious applications according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a schematic diagram of a method of resolving malicious applications, according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow chart of a method of resolving malicious applications according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow diagram for obtaining a plurality of unknown instructions for a malicious application, according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a flow chart for counting call frequencies at which a plurality of unknown instructions are invoked, according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a flow diagram for fetching multiple instruction addresses according to an embodiment of the disclosure;

FIG. 7 schematically illustrates a flow diagram for determining a target instruction according to a call frequency according to an embodiment of the disclosure;

FIG. 8 schematically illustrates a block diagram of an apparatus for resolving a malicious application according to an embodiment of the present disclosure; and

FIG. 9 schematically illustrates a block diagram of an electronic device adapted to implement a method of resolving malicious applications according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

It should be noted that the method and apparatus for parsing malicious application provided by the present disclosure may be used in the field of information security, may also be used in the field of mobile terminal application security in the financial field, and may also be used in any field other than the financial field.

In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, application and other handling of the personal information of the users involved all comply with the relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated. In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is acquired or collected.

An embodiment of the present disclosure provides a method for resolving a malicious application, including: obtaining a plurality of unknown instructions of a malicious application program and a plurality of known instructions of a white sample application program; obtaining a first calling frequency of a plurality of unknown instructions called by the malicious application program in the running process and a second calling frequency of a plurality of known instructions called by the white sample application program in the running process by utilizing statistics; determining at least one target instruction matched with the plurality of known instructions in the plurality of unknown instructions according to the first calling frequency and the second calling frequency; and analyzing the at least one target instruction to obtain operation related data of the malicious application program.

Fig. 1 schematically illustrates an application scenario diagram of a method, apparatus, device, medium, and program product for resolving malicious applications according to embodiments of the present disclosure.

As shown in fig. 1, the application scenario 100 according to this embodiment may comprise mobile terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the mobile terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user may use the mobile terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The mobile terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The mobile terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the mobile terminal devices 101, 102, 103. The background management server may analyze and otherwise process received data such as user requests, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that the method for resolving a malicious application provided by the embodiment of the present disclosure may be generally performed by the server 105. Accordingly, the apparatus for resolving malicious applications provided by the embodiments of the present disclosure may be generally disposed in the server 105. The method for resolving malicious applications provided by the embodiments of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the mobile terminal devices 101, 102, 103 and/or the server 105. Correspondingly, the apparatus for resolving malicious applications provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the mobile terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The method for resolving a malicious application according to the disclosed embodiment will be described in detail with reference to figs. 2 to 7, based on the scenario described in fig. 1.

Fig. 2 schematically illustrates a schematic diagram of a method of resolving a malicious application according to an embodiment of the present disclosure.

Since the operation process of the application program can be understood as a process in which various instructions are called and executed, the operation-related data of the application program can be analyzed according to the called situations of various instructions.

For a malicious application program that has undergone VMP reinforcement, the installation package of the malicious application program first needs to be unpacked (shell removal), and the reinforcement policy is then analyzed to obtain the unknown instructions. Since the instructions of the reinforced malicious application are generally unreadable, they may be regarded as unknown instructions.

The unknown instructions need to be analyzed through the interpreter of the malicious application program in order to determine the instruction address of each unknown instruction. After the instruction addresses are determined, how often each unknown instruction is called is counted.

In general, the same type of application calls the same instructions during runtime. The call statistics of a plurality of known instructions of a white sample application of the same type are taken as a reference sample and compared with the call statistics of the plurality of unknown instructions of the malicious application program, and the unknown instructions are matched with the known instructions one by one, so as to determine the instruction meanings of the unknown instructions. An unknown instruction whose instruction meaning has been successfully determined is recorded as a target instruction, and the operation-related data of the malicious application program is then analyzed according to the target instructions.

Compared with traditional analysis methods that trace code and analyze function mapping relations, the method for resolving a malicious application program of the present disclosure can analyze the running process of the malicious application program using only instruction call frequencies, which effectively improves the efficiency of information security analysis. In addition, increasing the number of white samples can effectively improve the matching precision of the instructions and thereby the efficiency of analyzing malicious application behavior.

Fig. 3 schematically shows a flowchart of a method of resolving a malicious application according to an embodiment of the present disclosure.

As shown in fig. 3, the method of resolving a malicious application of this embodiment includes operations S310 to S340.

In operation S310, a plurality of unknown instructions of a malicious application and a plurality of known instructions of a white sample application are obtained.

In the disclosed embodiments, malicious applications include illegal applications and unidentifiable applications. Since malicious applications are usually written in custom bytecode, instructions called by the malicious applications during the running process are usually unreadable unknown instructions for conventional security analysis methods.

The white sample application is an application known to be secure, and the instructions called by the white sample application during execution are readable, known instructions. There may be one white sample or a plurality of white samples.

The malicious application used to fetch the instructions and the white sample application are the same type of application. For example, in the case where the malicious application is a mail-class application, the white sample application for obtaining known instructions is also a mail-class application. In the case where the malicious application is a music-like application, the white sample application for obtaining known instructions is also a music-like application.

In operation S320, a first calling frequency at which the plurality of unknown instructions of the malicious application are called in the running process and a second calling frequency at which the plurality of known instructions of the white sample application are called in the running process are obtained by statistics.

In the embodiment of the disclosure, a plurality of instructions are called during the running process of the application program. When an application runs a certain function, the related instructions are called a corresponding number of times within a certain period of time.

The first call frequency includes a frequency at which each of the plurality of unknown instructions is called. The second call frequency comprises a frequency at which each of the plurality of known instructions is called. The called frequency includes the number of times called.

In operation S330, at least one target instruction, which matches the plurality of known instructions, of the plurality of unknown instructions is determined according to the first call frequency and the second call frequency.

In the embodiment of the disclosure, a target instruction is an unknown instruction that matches any one of the plurality of known instructions. Since the instructions called are similar when applications of the same type perform the same function, the instruction meaning of an unknown instruction can be determined from the known instructions.

To determine the target instructions, the plurality of known instructions and the plurality of unknown instructions are matched according to the call frequency of each unknown instruction in the first call frequency and the call frequency of each known instruction in the second call frequency, and the instructions that match known instructions are determined from the plurality of unknown instructions.

For example, the matching may be performed according to the relative ranking of the call frequencies of the plurality of instructions, or according to the call frequencies of the plurality of instructions when different functions are executed.
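The ranking-based matching described above can be sketched as follows. This is a minimal illustration rather than the method of the disclosure: the function names, the opcode values, and the call counts are all hypothetical, and a one-to-one pairing by rank is only one possible matching rule.

```python
from collections import Counter

def rank_by_frequency(counts):
    """Return instruction identifiers sorted from most to least frequently called."""
    # Ties are broken by identifier so that the ordering is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [op for op, _ in ordered]

def match_by_rank(unknown_counts, known_counts):
    """Pair the i-th most frequent unknown instruction with the i-th most frequent known one."""
    first_ordering = rank_by_frequency(unknown_counts)
    second_ordering = rank_by_frequency(known_counts)
    return dict(zip(first_ordering, second_ordering))

# Hypothetical call counts: custom opcodes from the malicious application,
# readable instruction names from a white sample of the same type.
unknown_counts = Counter({0xA3: 950, 0x17: 410, 0x5C: 120})
known_counts = Counter({"invoke-virtual": 1000, "move-result": 430, "const-string": 100})

mapping = match_by_rank(unknown_counts, known_counts)
```

In this sketch each unknown opcode in `mapping` is a candidate target instruction; in practice, instructions with nearly equal frequencies would need further evidence before being treated as matched.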

In operation S340, at least one target instruction is parsed to obtain operation-related data of the malicious application.

In the embodiment of the present disclosure, the running process of the application program may be understood as a process in which various instructions are called and executed, and after the instruction meaning of an unknown instruction of a malicious application program is clarified, the execution content of the running of the malicious application program may be analyzed according to the instruction meaning.

Because the executable file of the reinforced malicious application program is not readable, the execution content of the malicious application program in operation can not be directly obtained. The method determines the called instruction when the malicious application program runs by referring to the white sample through the frequency matching method, and further analyzes the execution content of the malicious application program according to the called instruction. In addition, when the white samples are multiple, the reference samples are expanded, so that the matching accuracy of the instructions can be improved, and the analysis efficiency of the malicious application program is further improved.
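The disclosure does not state how several white samples are combined into one reference. One possible approach, shown here purely as an assumption, is to average the relative call frequency of each known instruction across the samples so that no single sample dominates. The function names and the counts are hypothetical.

```python
from collections import defaultdict

def relative_frequencies(counts):
    """Convert raw call counts into fractions of all calls in one sample."""
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}

def combine_white_samples(samples):
    """Average the relative call frequency of each known instruction across samples."""
    combined = defaultdict(float)
    for counts in samples:
        for op, freq in relative_frequencies(counts).items():
            combined[op] += freq / len(samples)
    return dict(combined)

# Two hypothetical white samples of the same application type.
samples = [
    {"invoke-virtual": 60, "move-result": 30, "const-string": 10},
    {"invoke-virtual": 50, "move-result": 40, "const-string": 10},
]
reference = combine_white_samples(samples)
```

The resulting `reference` can then serve as the second call frequency against which the unknown instructions are ranked.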

FIG. 4 schematically illustrates a flow diagram for obtaining a plurality of unknown instructions for a malicious application, according to an embodiment of the disclosure.

As shown in fig. 4, operation S310 of this embodiment obtains a plurality of unknown instructions of a malicious application, including operation S410.

In operation S410, in the case that the malicious application includes a reinforcement policy, the reinforcement policy is analyzed, resulting in a plurality of unknown instructions.

In the embodiment of the disclosure, the malicious application program protects the running function of the application program through a VMP reinforcement technology, and the running of the function is realized by calling a corresponding instruction. For example, for a mail application program, the function includes a function for sending a mail, and when an instruction corresponding to the function is called, the application program performs an operation for sending the mail.

VMP reinforcement policies generally include: applying VMP protection to a class of functions with the same registered address, for example, reinforcing all onCreate() functions so that the function attribute is converted from the original Java to Native; or abstracting a function to be reinforced, for example, a key function implementing a core service, into a shell function, and applying VMP protection to the shell function.

In the case that the malicious application program includes a reinforcement policy, the reinforcement policy is analyzed to obtain the reinforced functions, and the unknown instructions corresponding to the reinforced functions are thereby obtained.
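As an illustration of the first policy, functions that have been converted from Java to Native can be spotted by their access flags. The sketch below assumes that method records have already been extracted from the dex file into dictionaries; the record layout, the class names, and the helper name are hypothetical. The value 0x0100 is the native access flag defined by the DEX format.

```python
ACC_NATIVE = 0x0100  # access flag marking a native method in the DEX format

def find_hardened_candidates(methods, names_of_interest=("onCreate",)):
    """Return methods flagged native although they are normally implemented in Java."""
    return [
        m for m in methods
        if m["access_flags"] & ACC_NATIVE and m["name"] in names_of_interest
    ]

# Hypothetical method records parsed from a dex file.
methods = [
    {"class": "Lcom/example/MainActivity;", "name": "onCreate", "access_flags": 0x0104},
    {"class": "Lcom/example/MainActivity;", "name": "onResume", "access_flags": 0x0004},
    {"class": "Lcom/example/Util;", "name": "hash", "access_flags": 0x0109},
]
candidates = find_hardened_candidates(methods)
```

A native onCreate() is only a hint that the function may have been reinforced; a legitimate application could declare such a method as well.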

In the embodiment of the present disclosure, in the case that the malicious application has a code extraction shell, operations S420 to S440 are further included before operation S410.

In operation S420, an installation package of a malicious application is acquired.

In operation S430, in the case that the installation package is determined to include a code extraction shell, shell removal (unpacking) is performed on the installation package.

In operation S440, the unpacked installation package is analyzed to obtain a plurality of unknown instructions.

In the case where the installation package is determined to include a code extraction shell, the complete executable file of the malicious application cannot be obtained until the installation package has been unpacked. The executable file comprises a dex file, and the VMP reinforcement policy of the malicious application program can be obtained by analyzing the dex file. Analyzing the unpacked installation package includes analyzing the reinforcement policy of the malicious application.

Illustratively, the code extraction shell may be removed using the FART automated unpacking tool.

Under the condition that the installation package of the malicious application program does not comprise a code extraction shell, a complete dex file can be directly obtained from the installation package, and then a VMP reinforcement strategy in the dex file is analyzed to obtain a plurality of unknown instructions.
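The branching between the two cases can be summarized in a short sketch. The detection, unpacking, and policy-analysis steps are passed in as callables because the disclosure does not specify their implementation; every name below, and the dictionary used as a stand-in for an installation package, is hypothetical.

```python
def obtain_unknown_instructions(package, has_extraction_shell, unpack, analyze_policy):
    """Unpack only when a code extraction shell is present, then analyze the policy."""
    if has_extraction_shell(package):
        package = unpack(package)
    return analyze_policy(package)

# Toy stand-ins so that the control flow can be exercised without a real package.
def demo_has_shell(package):
    return package["shelled"]

def demo_unpack(package):
    return {"shelled": False, "dex": package["hidden_dex"], "hidden_dex": None}

def demo_analyze(package):
    return list(package["dex"])

demo_package = {"shelled": True, "dex": None, "hidden_dex": ["op_a3", "op_17"]}
instructions = obtain_unknown_instructions(
    demo_package, demo_has_shell, demo_unpack, demo_analyze
)
```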

Fig. 5 schematically illustrates a flowchart of counting call frequencies at which a plurality of unknown instructions are called according to an embodiment of the present disclosure.

As shown in fig. 5, obtaining by statistics, in operation S320 of this embodiment, the first call frequency at which the plurality of unknown instructions are called during the running process of the malicious application includes operations S510 to S540.

In operation S510, the respective instruction addresses of the plurality of unknown instructions are acquired to obtain a plurality of instruction addresses.

In operation S520, breakpoints are respectively set at positions corresponding to the plurality of instruction addresses in the malicious application.

In operation S530, call data of the plurality of unknown instructions is obtained through the breakpoints.

In operation S540, a first call frequency at which a plurality of unknown instructions are called is obtained through statistics according to the call data.

In the embodiment of the disclosure, a breakpoint is set at a given position in the malicious application program, and when execution reaches that position, the running of the program is interrupted. Once a breakpoint has been set at the in-memory instruction address of each unknown instruction, an interruption of the program indicates that the corresponding unknown instruction has been called.
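The breakpoint mechanism can be imitated with a toy model in which the running program is reduced to the sequence of addresses it executes. This is only a stand-in for a real debugger: the class, the addresses, and the trace are hypothetical, and a real tool would pause the process rather than merely record the hit.

```python
class ToyDebugger:
    """Stand-in for a debugger: records a hit whenever execution reaches a breakpoint."""

    def __init__(self):
        self.breakpoints = set()
        self.call_log = []

    def set_breakpoint(self, address):
        self.breakpoints.add(address)

    def run(self, executed_addresses):
        for address in executed_addresses:
            if address in self.breakpoints:
                # A real debugger would interrupt the program here;
                # the sketch only logs which instruction address was reached.
                self.call_log.append(address)

debugger = ToyDebugger()
for address in (0x7000, 0x7040, 0x7080):  # hypothetical instruction addresses
    debugger.set_breakpoint(address)

# Hypothetical execution trace; 0x9000 has no breakpoint and is ignored.
debugger.run([0x7000, 0x7040, 0x7000, 0x9000, 0x7080, 0x7000])
```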

After the breakpoints are set, each time an instruction is called the program is interrupted and outputs call data. The call data may be the bytecode associated with the instruction, and this bytecode is recorded in a log of the application. For example, the log may be a custom opcode statistics log. Each time a given instruction is called and the program is interrupted, the bytecode output by the application for that instruction is the same. Therefore, the number of times each bytecode is recorded in the log can be counted to obtain the frequency at which each instruction is called.

Before the call frequency is counted, invalid information in the statistics log needs to be filtered out. For example, the custom opcode statistics log is read entry by entry, system error messages recorded in the log are screened out, and the call information of successful instruction calls is retained. Screening out invalid information improves the efficiency and accuracy of the subsequent matching between unknown instructions and known instructions.
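The filtering and counting steps can be sketched as follows. The log format is an assumption made for illustration: successful calls are taken to be lines beginning with `OPCODE ` followed by the bytecode, and anything else (system errors, blank lines) is treated as invalid. The disclosure does not define the actual format of the custom opcode statistics log.

```python
from collections import Counter

def count_opcodes(log_lines, prefix="OPCODE "):
    """Count how often each bytecode appears, ignoring lines that are not call records."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue  # screen out system errors and other invalid records
        counts[line[len(prefix):]] += 1
    return counts

# Hypothetical excerpt of a custom opcode statistics log.
log_lines = [
    "OPCODE a3",
    "ERROR failed to read /proc/self/maps",
    "OPCODE 17",
    "OPCODE a3",
    "",
]
first_call_frequency = count_opcodes(log_lines)
```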

According to the embodiments of the present disclosure, after the breakpoints are set in the malicious application program, the calls to the plurality of unknown instructions can be recorded while the malicious application program runs. With the breakpoints in place, the malicious application program can be made to perform a specific operation and the calls to the unknown instructions during that operation can be recorded, thereby achieving dynamic debugging of the malicious application program.

FIG. 6 schematically shows a flow diagram for fetching multiple instruction addresses according to an embodiment of the disclosure.

As shown in fig. 6, acquiring the respective instruction addresses of the plurality of unknown instructions in operation S510 includes operations S5101 to S5103.

In operation S5101, an interpreter of a malicious application is acquired.

The interpreter is a component for realizing the processes of addressing, decoding and executing of the smali instruction set. Because the core code of the VMP-reinforced malicious application program is written by the custom byte code, the unknown instructions obtained by analysis are all unreadable messy codes. The unknown instruction needs to be decoded by the interpreter of the malicious application.

Obtaining an interpreter of a malicious application program, wherein the interpreter comprises a plurality of unknown instructions and calling addresses of the unknown instructions; and determining the interpreter according to the calling address.

Since the function reinforced by the VMP is bound in the address where the interpreter is located, the location of the interpreter can be realized by calling an unknown instruction.

Illustratively, a statement that prints the function name and the calling address is added by modifying the RegisterNative method in the ART source code. When a native function is called, the name and the address of the called native function are printed. When the unknown instruction corresponding to the VMP-reinforced run function is called, the printed calling address is the address of the interpreter, and the interpreter is thereby located.
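As a rough sketch of how the printed output could be used, the snippet below scans such a log for the entry belonging to the VMP-reinforced function. The log line format, the method names, and the helper find_interpreter_address are all hypothetical; the text only states that the function name and the calling address are printed.

```python
import re

# Assumed (hypothetical) line format emitted by the modified ART build:
#   "RegisterNative name=<method name> addr=0x<hex address>"
LINE_RE = re.compile(r"RegisterNative name=(\S+) addr=(0x[0-9a-fA-F]+)")

def find_interpreter_address(log_lines, vmp_method_name):
    """Return the address logged for the VMP-reinforced method, or None."""
    for line in log_lines:
        match = LINE_RE.search(line)
        if match and match.group(1) == vmp_method_name:
            return int(match.group(2), 16)
    return None

log = [
    "RegisterNative name=com.example.Helper.init addr=0x7f001000",
    "RegisterNative name=com.example.Vmp.run addr=0x7f00a2c4",
]
interpreter_addr = find_interpreter_address(log, "com.example.Vmp.run")
```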

Typically, applications reinforced by the VMP also include an anti-debugging policy, which is used to prevent dynamic debugging of the interpreter. Therefore, before the interpreter of the malicious application is acquired in operation S5101, the anti-debugging policy of the malicious application needs to be bypassed by using a custom ART file. For example, the portion of the code in the art/runtime/art_method.cc file that calls JNI functions is modified so that the malicious application is put into a sleep state before the JNI function is called. While it is in the sleep state, a debugging tool is attached, which enables the subsequent dynamic debugging of the malicious application.

In operation S5102, the interpreter parses the executable file of the malicious application to obtain code data corresponding to a plurality of unknown instructions in the executable file.

In operation S5103, the code data is analyzed to obtain a plurality of instruction addresses.

In the embodiment of the disclosure, the code related to the unknown instructions in the executable file can be dumped by parsing the executable file through the interpreter. The dumped code is then analyzed to obtain the plurality of instruction addresses. For example, the address of an instruction in memory is determined according to the CodeItem information in the first 8 bytes of the code corresponding to that instruction.
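For orientation, in the standard DEX format a code_item begins with four little-endian 16-bit fields (registers_size, ins_size, outs_size, tries_size), which together occupy its first 8 bytes. The sketch below decodes only those fields. Whether a VMP-reinforced sample keeps this layout, and how the memory address is derived from it, is not stated in the text, so this is an assumption made for illustration; parse_code_item_prefix is a hypothetical helper name.

```python
import struct

def parse_code_item_prefix(data, offset=0):
    """Decode the first 8 bytes of a standard DEX code_item header."""
    # "<4H" = four unsigned 16-bit integers, little-endian.
    registers, ins, outs, tries = struct.unpack_from("<4H", data, offset)
    return {
        "registers_size": registers,
        "ins_size": ins,
        "outs_size": outs,
        "tries_size": tries,
    }

# Made-up sample bytes: 3 registers, 1 incoming argument, 2 outgoing, 0 tries.
raw = bytes([0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x00, 0x00])
prefix = parse_code_item_prefix(raw)
```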

Because of the VMP reinforcement processing, the instruction information of the malicious application cannot be acquired directly. A plurality of unknown instructions can nevertheless be obtained by analyzing the reinforcement policy. The unknown instructions are written in the custom bytecode of the malicious application, and for a security analysis program this custom bytecode is unreadable garbled code until the interpreter is determined. After the interpreter is determined, the custom bytecode of the malicious application is parsed through the interpreter to obtain the plurality of instruction addresses.

FIG. 7 schematically illustrates a flow diagram for determining a target instruction according to a call frequency according to an embodiment of the disclosure.

As shown in fig. 7, operation S330 of determining, according to the first call frequency and the second call frequency, at least one target instruction of the plurality of unknown instructions that matches the plurality of known instructions includes operations S710 to S720.

In operation S710, a first ordering in which the plurality of unknown instructions are called is determined according to a first call frequency, and a second ordering in which the plurality of known instructions are called is determined according to a second call frequency.

In embodiments of the present disclosure, the first ordering and the second ordering may be instruction ordering tables. The first ordering may be an ordering table obtained by ordering according to the call frequency of the unknown instructions in the first call frequency. The second ordering may be an ordering table ordered according to the call frequency of a plurality of known instructions in the second call frequency. The sorting mode can be from high to low or from low to high.

For example, in the case that the first call frequency includes the call frequencies of 10 unknown instructions, the 10 unknown instructions are ordered from the highest call frequency to the lowest, resulting in the first ordering. In the case that the second call frequency includes the call frequencies of 10 known instructions, the 10 known instructions are ordered from the highest call frequency to the lowest, resulting in the second ordering. The present disclosure does not limit the specific sorting manner, nor the number of instructions included in the first ordering and the second ordering. The number of instructions included in the first ordering may be the same as or different from the number of instructions included in the second ordering.
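A minimal sketch of building such an ordering table from per-instruction call counts is given below. The instruction names and counts are made up, and breaking ties by name is an assumption added only to keep the result deterministic; the text does not say how ties are handled.

```python
def rank_by_frequency(call_counts, descending=True):
    """Return instruction names ordered by call count (ties broken by name)."""
    sign = -1 if descending else 1
    ordered = sorted(call_counts.items(), key=lambda item: (sign * item[1], item[0]))
    return [name for name, _ in ordered]

# Hypothetical call counts for three unknown instructions.
unknown_counts = {"op_a": 120, "op_b": 45, "op_c": 300}
first_ordering = rank_by_frequency(unknown_counts)
```

Passing descending=False yields the low-to-high variant, which the text also allows.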

In operation S720, at least one target instruction of the unknown instructions that matches the plurality of known instructions is determined according to the first ordering and the second ordering.

Continuing the above example, the first ordering is an ordering table of 10 unknown instructions arranged from the highest call frequency to the lowest. The second ordering is an ordering table of 10 known instructions arranged from the highest call frequency to the lowest. The 10 unknown instructions and the 10 known instructions are the instructions called by the malicious application and the white sample application, respectively, when the two applications perform the same operation.

The white sample application is an application of the same type as the malicious application. When applications of the same type perform the same operation, the types and numbers of instructions they call are approximately the same. However, while a malicious application performs the specified operation normally, it may also perform other illegal operations without permission. For example, a malicious mailbox application may record user information and send it to an external party while sending and receiving mail, even though the operation shown on the client is the same as that of legitimate software.

Illustratively, the matching may treat instructions that hold the same call-frequency rank as matching each other. For example, the unknown instruction ranked first by call frequency in the first ordering is the target instruction that matches the known instruction ranked first by call frequency in the second ordering, and this unknown instruction (the target instruction) has the same meaning as that known instruction.
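Rank-based matching of this kind can be sketched as pairing the two ordering tables position by position. The unknown instruction names and the particular order are made up; truncating to the shorter table when the two orderings differ in length is an assumption, since the text does not say how unequal lengths are handled.

```python
def match_by_rank(first_ordering, second_ordering):
    """Pair unknown and known instructions that hold the same rank."""
    # zip stops at the shorter list, so unmatched tail entries are ignored.
    return dict(zip(first_ordering, second_ordering))

# Hypothetical orderings, both sorted from most to least frequently called.
first_ordering = ["op_c", "op_a", "op_b"]
second_ordering = ["invoke-virtual", "move-result", "const-string"]
mapping = match_by_rank(first_ordering, second_ordering)
```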

Illustratively, the embodiment of the disclosure also discloses another method for determining the target instruction according to the calling frequency.

Through dynamic debugging, the malicious application and the white sample application perform the same first operation, and the call frequency of each instruction is counted to obtain a first unknown instruction call ordering table and a first known instruction call ordering table. The malicious application and the white sample application then perform the same second operation, and the call frequency of each instruction is counted to obtain a second unknown instruction call ordering table and a second known instruction call ordering table.

The first known instruction call ordering table is compared with the first unknown instruction call ordering table to generate a first mapping table containing mapping relations between unknown instructions and known instructions. The second known instruction call ordering table is compared with the second unknown instruction call ordering table to generate a second mapping table containing mapping relations between unknown instructions and known instructions. The first mapping table is then compared with the second mapping table: where both tables map an unknown instruction to the same known instruction, that unknown instruction is a target instruction matching the known instruction.
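The comparison of the two mapping tables amounts to keeping only the pairs on which both tables agree, which can be sketched as follows. The table contents and the helper name confirm_targets are hypothetical.

```python
def confirm_targets(first_mapping, second_mapping):
    """Keep the unknown-to-known pairs that appear in both mapping tables."""
    return {
        unknown: known
        for unknown, known in first_mapping.items()
        if second_mapping.get(unknown) == known
    }

# Hypothetical mapping tables obtained from the first and second operations.
first_mapping = {"op_c": "invoke-virtual", "op_a": "move-result", "op_b": "const-string"}
second_mapping = {"op_c": "invoke-virtual", "op_a": "const-string", "op_b": "move-result"}
targets = confirm_targets(first_mapping, second_mapping)
```

In this made-up example only op_c is confirmed, because the two operations disagree about op_a and op_b.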

Illustratively, the embodiment of the disclosure also discloses yet another method for determining the target instruction according to the call frequency.

Through dynamic debugging, the malicious application and a first white sample application perform the same first operation, and the call frequency of each instruction is counted to obtain a first unknown instruction call ordering table and a first known instruction call ordering table. The malicious application and a second white sample application then perform the same first operation, and a second unknown instruction call ordering table and a second known instruction call ordering table are obtained according to the call frequency of each instruction.

The present disclosure provides a method for resolving a malicious application, which parses a malicious application that has undergone VMP reinforcement processing on the basis of instruction call frequency. Compared with traditional analysis methods, the meaning of an instruction is determined from its call frequency, which simplifies the analysis process, saves the considerable time and effort otherwise spent tracing code and analyzing function mapping relations, and effectively improves the efficiency of analyzing malicious applications. Furthermore, comparing against white sample applications in different ways increases the matching accuracy, and an accurate matching result in turn helps improve the efficiency of analyzing malicious behavior.

Based on the method for analyzing the malicious application program, the disclosure also provides a device for analyzing the malicious application program. The apparatus will be described in detail below with reference to fig. 8.

Fig. 8 schematically shows a block diagram of an apparatus for resolving a malicious application according to an embodiment of the present disclosure.

As shown in fig. 8, the apparatus 800 for resolving a malicious application according to this embodiment includes an obtaining module 810, a statistics module 820, a determining module 830, and a parsing module 840.

The obtaining module 810 is configured to obtain a plurality of unknown instructions of a malicious application and a plurality of known instructions of a white sample application. In an embodiment, the obtaining module 810 may be configured to perform the operation S310 described above, which is not described herein again.

The statistics module 820 is configured to obtain, by using statistics, a first call frequency at which a plurality of unknown instructions of the malicious application are called in the running process and a second call frequency at which a plurality of known instructions of the white sample application are called in the running process. In an embodiment, the statistics module 820 may be configured to perform the operation S320 described above, which is not described herein again.

The determining module 830 is configured to determine at least one target instruction of the plurality of unknown instructions that matches the plurality of known instructions according to the first call frequency and the second call frequency. In an embodiment, the determining module 830 may be configured to perform the operation S330 described above, and is not described herein again.

The parsing module 840 is configured to parse at least one target instruction to obtain operation-related data of the malicious application. In an embodiment, the parsing module 840 may be configured to perform the operation S340 described above, which is not described herein again.

According to an embodiment of the present disclosure, the obtaining module 810 includes a first analyzing unit, configured to analyze the reinforcement policy to obtain a plurality of unknown instructions in a case that the malicious application includes the reinforcement policy.

According to an embodiment of the disclosure, the obtaining module 810 further includes: a first acquisition unit, configured to acquire an installation package of the malicious application; a shelling unit, configured to perform shell removal processing on the installation package in a case where it is determined that the installation package includes a code extraction shell; and a second analysis unit, configured to analyze the installation package after shell removal to obtain a plurality of unknown instructions.

According to an embodiment of the present disclosure, the statistics module 820 includes: a second acquisition unit, configured to acquire the respective instruction addresses of the unknown instructions to obtain a plurality of instruction addresses; a setting unit, configured to set breakpoints at the positions corresponding to the plurality of instruction addresses in the malicious application, respectively; an obtaining unit, configured to obtain the calling data of the plurality of unknown instructions through the breakpoints; and a statistics unit, configured to obtain, by statistics according to the calling data, the first call frequency of the plurality of unknown instructions.

According to an embodiment of the present disclosure, the second acquisition unit includes: a third acquisition unit configured to acquire an interpreter of a malicious application; the analysis unit is used for analyzing the executable file of the malicious application program through the interpreter to obtain code data corresponding to a plurality of unknown instructions in the executable file; and a third analysis unit for analyzing the code data to obtain a plurality of instruction addresses.

The third acquisition unit includes: the calling unit is used for calling a plurality of unknown instructions to obtain calling addresses of the unknown instructions; and a first determination unit for determining the interpreter according to the call address.

According to an embodiment of the present disclosure, the determining module 830 includes: a second determining unit, configured to determine a first ordering in which the plurality of unknown instructions are called according to the first call frequency and to determine a second ordering in which the plurality of known instructions are called according to the second call frequency; and a third determining unit, configured to determine, according to the first ordering and the second ordering, at least one target instruction among the unknown instructions that matches the plurality of known instructions; wherein the white sample application and the malicious application are the same type of application.

According to an embodiment of the present disclosure, any two or more of the obtaining module 810, the statistics module 820, the determining module 830, and the parsing module 840 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the disclosure, at least one of the obtaining module 810, the statistics module 820, the determining module 830, and the parsing module 840 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC); may be implemented in hardware or firmware by any other reasonable manner of integrating or packaging a circuit; or may be implemented in any one of software, hardware, and firmware, or in any suitable combination of the three. Alternatively, at least one of the obtaining module 810, the statistics module 820, the determining module 830, and the parsing module 840 may be implemented at least partially as a computer program module which, when executed, performs the corresponding function.

As shown in fig. 9, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 910 as necessary, so that a computer program read from it can be installed into the storage section 908 as necessary.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code causes the computer system to implement the method for resolving a malicious application provided by the embodiments of the disclosure.

The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In accordance with embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations are within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. A method of resolving a malicious application, comprising:

obtaining a plurality of unknown instructions of a malicious application program and a plurality of known instructions of a white sample application program;

obtaining a first calling frequency of the unknown instructions called by the malicious application program in the running process and a second calling frequency of the known instructions called by the white sample application program in the running process by utilizing statistics;

determining at least one target instruction matched with the plurality of known instructions in the plurality of unknown instructions according to the first calling frequency and the second calling frequency; and

analyzing the at least one target instruction to obtain operation related data of the malicious application program.

2. The method of claim 1, wherein said determining at least one target instruction of the plurality of unknown instructions that matches the plurality of known instructions based on the first call frequency and the second call frequency comprises:

determining a first ordering in which the plurality of unknown instructions are called according to the first calling frequency, and determining a second ordering in which the plurality of known instructions are called according to the second calling frequency; and

determining at least one target instruction in the unknown instructions that matches the plurality of known instructions according to the first ordering and the second ordering;

wherein the white sample application and the malicious application are the same type of application.

3. The method of claim 1, wherein the obtaining, by using statistics, of a first calling frequency at which the unknown instructions are called by the malicious application program in the running process comprises:

acquiring respective instruction addresses of the unknown instructions to obtain a plurality of instruction addresses;

respectively setting breakpoints at positions corresponding to the instruction addresses in the malicious application program;

obtaining the calling data of the unknown instructions through the breakpoints; and

obtaining, by statistics according to the calling data, the first calling frequency at which the plurality of unknown instructions are called.

4. The method of claim 3, wherein the acquiring of the respective instruction addresses of the plurality of unknown instructions to obtain a plurality of instruction addresses comprises:

obtaining an interpreter of the malicious application program;

analyzing the executable file of the malicious application program through the interpreter to obtain code data corresponding to the unknown instructions in the executable file; and

analyzing the code data to obtain the plurality of instruction addresses.

5. The method of claim 4, wherein the obtaining the interpreter of the malicious application comprises:

calling the unknown instructions to obtain calling addresses of the unknown instructions; and

determining the interpreter according to the calling address.

6. The method of claim 1, wherein the obtaining a plurality of unknown instructions for a malicious application comprises:

in the case that the malicious application program comprises a reinforcement policy, analyzing the reinforcement policy to obtain the unknown instructions.

7. The method of claim 6, wherein the obtaining a plurality of unknown instructions for a malicious application further comprises:

acquiring an installation package of the malicious application program;

in the case that the installation package is determined to comprise a code extraction shell, performing shell removal processing on the installation package; and

analyzing the unshelled installation package to obtain the plurality of unknown instructions.

8. An apparatus for resolving malicious applications, comprising:

the acquisition module is used for acquiring a plurality of unknown instructions of the malicious application program and a plurality of known instructions of the white sample application program;

the statistical module is used for obtaining a first calling frequency of the unknown instructions called by the malicious application program in the running process and a second calling frequency of the known instructions called by the white sample application program in the running process by utilizing statistics;

the determining module is used for determining at least one target instruction which is matched with the plurality of known instructions in the plurality of unknown instructions according to the first calling frequency and the second calling frequency; and

the analysis module is used for analyzing the at least one target instruction to obtain the operation related data of the malicious application program.

9. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.

10. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.

11. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 7.