CN106951780B

CN106951780B - Static detection method and device for repackaged malicious applications

Info

Publication number: CN106951780B
Application number: CN201710069633.1A
Authority: CN
Inventors: 刘超; 喻民; 谭民; 朱大立; 姜建国
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2017-02-08
Filing date: 2017-02-08
Publication date: 2019-09-10
Anticipated expiration: 2037-02-08
Also published as: CN106951780A

Abstract

The present invention relates to a static detection method and device for repackaged malicious applications. The method comprises: obtaining the API call sequences of the installation package of an application to be detected and the association relationships between the classes to which the API call sequences belong; constructing a class-based function call relationship graph; clustering the classes to obtain multiple clusters, and removing a preset number of clusters in which the associations between classes are strongest, to obtain a malicious code cluster; extracting the sensitive API call sequences from the API call sequences of each class in the malicious code cluster, and performing similarity matching between the extracted sensitive API call sequence of each class and the feature sequence samples of malicious applications in a pre-established sample library; and determining whether the application to be detected is a repackaged malicious application. Because the invention does not depend on the official Android applications and extracts malicious code in units of classes, it can maintain high accuracy even for variant malicious code.

Description

Static detection method and device for repackaging malicious applications

Technical Field

The invention relates to the technical field of malicious code detection, in particular to a static detection method and a static detection device for repackaging malicious applications.

Background

With the rapid development of the mobile internet, sales of smart terminals (e.g., smartphones and tablets) are growing rapidly thanks to their portability, excellent performance, and rich functions (e.g., instant messaging, office work, and online games). At present, China has more than 800 million mobile internet users, Google Play exceeded 1.4 million applications in 2015, and the various third-party application markets in China also host a large number of mobile applications. These applications bring great convenience to people, but also bring serious information security hazards and risks. A study analyzing malicious applications on the Android system shows that, of 1260 malicious application samples analyzed, 1083 (86%) were generated by repackaging legitimate applications with malicious payloads.

Faced with the flood of maliciously repackaged applications on the Android platform, researchers at home and abroad have proposed different detection methods. DroidMOSS is a typical representative. The method first assumes that the Android applications in the official Android application market are original, not repackaged, and non-malicious, and on that basis detects whether Android applications from other sources, such as third-party application markets, are repackaged malicious applications. The detection process uses a fuzzy hash algorithm to generate a unique signature of each Android application based on its instruction sequence, and then compares signatures pairwise to determine whether an application is malicious.

This detection method assumes that applications in the official Android application market are original, non-malicious, and not repackaged. This assumption is too optimistic in some respects, and it means that repackaged applications inside the official Android application market cannot be detected. Moreover, its detection capability for variant malicious code is quite limited, and the malicious sample library must be updated in a timely manner. Both points make the detection accuracy of DroidMOSS low.

Disclosure of Invention

In view of the above defects, the invention provides a static detection method and a static detection device for repackaging malicious applications, which can improve the detection accuracy.

In a first aspect, the static detection method for repackaging malicious applications provided by the present invention includes:

acquiring API calling sequences of an installation package of an application program to be detected and an association relation between classes to which each API calling sequence belongs;

constructing a function call relation graph based on classes according to the strength of the association relation between the classes; the nodes in the function call relation graph are classes;

according to the strength degree of the association relationship between the classes, clustering and dividing each class to obtain a plurality of clusters, and removing a preset number of clusters with the strongest association relationship between the classes in each cluster to obtain a malicious code cluster;

extracting sensitive API calling sequences from the API calling sequences of all classes in the malicious code cluster, and respectively carrying out similarity matching on the extracted sensitive API calling sequences of all classes and a characteristic sequence sample of a malicious application program in a pre-established sample library;

and determining whether the application program to be detected is a repackaged malicious application program or not according to the similarity matching result.

Optionally, the obtaining of the API call sequence of the installation package of the application to be detected includes: preprocessing the installation package to obtain a classes.dex file; decompiling the classes.dex file to obtain a smali file; and extracting an API calling sequence from the smali file.

Optionally, the preprocessing of the installation package to obtain the classes.dex file includes: decompressing the installation package and extracting the classes.dex file.

Optionally, the extracting an API call sequence from the smali file includes: and searching and backtracking from the corresponding position of each entry point of the application program to be detected in the smali file to extract the API calling sequence.

Optionally, the performing similarity matching between the extracted sensitive API call sequence of each class and a pre-established sample of a feature sequence of a malicious application program in a sample library includes: carrying out similarity matching on the extracted sensitive API calling sequence of each class and family characteristics of the malicious application program families of the same class in the sample library; wherein the malicious application family comprises a plurality of malicious applications of the same category; the family features are a sequence of features of the malicious application family and include a sequence sample of sensitive API calls for each malicious application in the malicious code family.

Optionally, the method further includes: and if the application program to be detected is determined to be the repackaged malicious application program according to the similarity matching result, adding the sensitive API calling sequence of the application program to be detected into the family characteristics of the same category malicious application program family.

In a second aspect, the present invention provides a static detection apparatus for repackaging malicious applications, including:

the acquisition module is used for acquiring API calling sequences of the installation package of the application program to be detected and the association relation between the classes to which the API calling sequences belong;

the building module is used for building a function call relation graph based on classes according to the strength of the incidence relation between the classes; the nodes in the function call relation graph are classes;

the cluster module is used for clustering and dividing each class according to the strength degree of the association relationship between the classes to obtain a plurality of clusters, and removing the clusters with the strongest preset number of association relationships between the classes in each cluster to obtain a malicious code cluster;

the matching module is used for extracting sensitive API calling sequences from the API calling sequences of all classes in the malicious code cluster and respectively carrying out similarity matching on the extracted sensitive API calling sequences of all classes and a characteristic sequence sample of a malicious application program in a pre-established sample library;

and the determining module is used for determining whether the application program to be detected is a repackaged malicious application program or not according to the similarity matching result.

Optionally, the obtaining module includes:

the preprocessing unit is used for preprocessing the installation package to obtain a classes.dex file; the decompiling unit is used for decompiling the classes.dex file to obtain a smali file; and the extraction unit is used for extracting the API calling sequence from the smali file.

Optionally, the matching module is specifically configured to: carrying out similarity matching on the extracted sensitive API calling sequence of each class and family characteristics of the malicious application program families of the same class in the sample library; wherein the malicious application family comprises a plurality of malicious applications of the same category; the family features are a sequence of features of the malicious application family and include a sequence sample of sensitive API calls for each malicious application in the malicious code family.

Optionally, the apparatus further comprises:

and the updating module is used for adding the sensitive API calling sequence of the application program to be detected to the family characteristics of the same category of malicious application program families when the application program to be detected is determined to be the repackaged malicious application program according to the similarity matching result.

The static detection method and device for repackaging malicious applications provided by the invention do not assume, in any step, that the applications in the official Android application market are original, non-malicious, and not repackaged. In other words, they do not depend on the official Android applications, so the detection accuracy can be further improved and applications from the official Android market can themselves be detected. In addition, in S3, the classes are clustered according to the strength of their calling relationships in order to extract the malicious code portion. Because this process is performed in units of classes, even if the developer of the malicious program modifies the injected portion, the detection result can still be determined from the similarity, so a relatively high accuracy can be ensured for variant malicious code. Furthermore, because the malicious code portion is extracted by clustering during detection, the detection result is not affected by modification, deletion, or similar operations on the normal code portion of the application, which further improves the detection accuracy.

Drawings

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow chart illustrating a static detection method for repackaging malicious applications according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating the API call sequence obtained in S1 according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating the structure of an application program according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating the function call relationship diagram constructed in S2 according to an embodiment of the present invention;

FIG. 5 is a schematic diagram illustrating the clusters obtained after the cluster analysis in S3 according to an embodiment of the present invention;

FIG. 6 shows a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

In a first aspect, the invention provides a static detection method for repackaging malicious applications, which can be used for detecting repackaged malicious applications and is suitable for Android application programs in an Android official application market, a third-party application market or other sources. As shown in fig. 1, the method includes:

S1, obtaining API calling sequences of the installation package of the application program to be detected and the association relation between the classes to which the API calling sequences belong;

It is understood that an API (Application Programming Interface) is a calling interface that the operating system exposes to applications; by calling the operating system's API, an application has the operating system execute its commands or actions. The API call sequences acquired in this step are shown in fig. 2.

It is understood that the so-called class-to-class association may include: inheritance relationships, references, function call relationships, and the like.

S2, constructing a function call relation graph based on classes according to the strength of the incidence relation between the classes; the nodes in the function call relation graph are classes;

It can be understood that an application is mainly organized in the form of classes and packages during development. As shown in fig. 3, an APK (Android Package) contains n classes, each class contains multiple functions, and each function contains multiple API call sequences. During development, a developer puts related code into a class and then organizes classes into packages, so classes and packages carry specific semantic information. Accordingly, the API call sequences are aggregated by class to construct the function call relation graph, which is shown in fig. 4.
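The aggregation described for S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the strength of the association between two classes is simply the number of calls between them, and the name `build_class_graph` and the toy class names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def build_class_graph(calls: Iterable[Tuple[str, str]]) -> Dict[Edge, int]:
    """Aggregate method-level calls into weighted, undirected class-level edges.

    Assumption: the edge weight (association strength) is the call count.
    """
    weights: Counter = Counter()
    for caller_cls, callee_cls in calls:
        if caller_cls != callee_cls:  # calls inside one class add no edge
            edge = tuple(sorted((caller_cls, callee_cls)))
            weights[edge] += 1
    return dict(weights)


# Toy (caller class, callee class) pairs, one per observed call.
CALLS = [
    ("Main", "Ui"), ("Ui", "Main"), ("Main", "Net"),
    ("Boot", "Leak"), ("Main", "Main"),
]
graph = build_class_graph(CALLS)
print(graph)
```

Inheritance and reference relationships could be added to the same weights with their own increments; the patent leaves the exact weighting open.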

S3, according to the strength degree of the association relationship between the classes, clustering and dividing each class to obtain a plurality of clusters, and removing the clusters with the strongest preset number of association relationships between the classes in each cluster to obtain a malicious code cluster;

It can be understood that the clustering is performed according to the strength of the calling relationships between classes: classes with strong calling relationships are grouped together, and classes with weak calling relationships are grouped together. The specific strength threshold can be set according to actual needs: a calling relationship stronger than the threshold is considered strong, and otherwise it is considered weak. According to research on the hiding techniques of malicious applications on the Android platform, after the developer of a malicious application embeds it into a normal application by repackaging, the repackaged malicious application contains all components and instructions of the malicious application. During repackaging, in order to keep the application's original functions working, most repackaged malicious applications inject an independent component to carry out the malicious behavior, for example an independent broadcast receiver that listens for the boot event of the phone. After the phone restarts, the broadcast receiver is triggered and performs the malicious behavior. Because the components that perform malicious behavior are independent, the malicious code portion is only weakly connected to the normal code portion in the function call relationship. Therefore, the malicious code portion can be extracted from the APK file of the application by clustering according to the strength of the calling relationships between classes. The clusters obtained after the cluster analysis in this step are shown in fig. 5.
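A minimal sketch of S3 follows, under stated assumptions: the patent does not name a clustering algorithm, so this example joins classes whose edge weight reaches a threshold (connected components via union-find) and ranks clusters by the sum of their internal edge weights. The names `cluster_classes`, `internal_strength`, and `malicious_candidates`, the threshold, and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def cluster_classes(classes: Iterable[str], weights: Dict[Edge, int],
                    min_weight: int) -> List[Set[str]]:
    """Group classes connected by edges whose weight is at least min_weight."""
    classes = list(classes)
    parent = {c: c for c in classes}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), w in weights.items():
        if w >= min_weight:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for c in classes:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())


def internal_strength(cluster: Set[str], weights: Dict[Edge, int]) -> int:
    """Sum of the weights of edges whose two endpoints lie in the cluster."""
    return sum(w for (a, b), w in weights.items() if a in cluster and b in cluster)


def malicious_candidates(clusters: List[Set[str]], weights: Dict[Edge, int],
                         drop_strongest: int) -> List[Set[str]]:
    """Remove the preset number of most strongly associated clusters."""
    ranked = sorted(clusters, key=lambda c: internal_strength(c, weights), reverse=True)
    return ranked[drop_strongest:]


CLASSES = ["Main", "Ui", "Net", "Boot", "Leak"]
WEIGHTS = {("Main", "Ui"): 5, ("Net", "Ui"): 4, ("Main", "Net"): 3,
           ("Boot", "Leak"): 2, ("Boot", "Main"): 1}
clusters = cluster_classes(CLASSES, WEIGHTS, min_weight=2)
suspects = malicious_candidates(clusters, WEIGHTS, drop_strongest=1)
print(suspects)
```

In the toy data the injected component (Boot, Leak) touches the host application through a single weak edge, so it ends up in its own cluster and survives the removal of the strongest cluster.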

It should be noted that the meaning of the repackaged malicious application and the malicious application mentioned in the present invention is different, the malicious application only refers to all components and instructions for executing malicious behavior, and the repackaged malicious application includes the malicious application and normal code part, and is formed by repackaging after injecting the malicious application into the normal application.

S4, extracting sensitive API calling sequences from the API calling sequences of all classes in the malicious code cluster, and respectively carrying out similarity matching on the extracted sensitive API calling sequences of all classes and a characteristic sequence sample of a malicious application program in a pre-established sample library;

it is understood that the malicious code clusters are composed of different classes together, and the signature of the malicious code clusters is composed of sensitive API sequence signatures of each class together.

It is understood that the sensitive API call sequence refers to a call sequence of a sensitive API, as shown in table 1, the sensitive API has a short message type, a device information type, a geographic information type, a broadcast type, a database type, a network type, a voice recording type, and other types. In order to realize the malicious behavior of the repackaged malicious application program, the sensitive API is necessarily called, so that the sensitive API calling sequence is used as the characteristic sequence of the application program to be detected. And extracting the sensitive API calling sequence, and performing similarity matching on the sensitive API calling sequence and the characteristic sequence sample of the malicious application program in the sample library, so that whether the application program to be detected is a repackaged malicious application program or not is conveniently determined.

Table 1. Categories of sensitive APIs
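The extraction of a sensitive API call sequence in S4 can be illustrated with a simple filter. This is a sketch under assumptions: the mapping below is an illustrative stand-in and does not reproduce Table 1, and the function name `sensitive_sequence` is hypothetical. The listed Android framework classes exist, but their assignment to categories here is an example only.

```python
from typing import Dict, List

# Illustrative mapping from framework class to sensitive-API category (assumed).
SENSITIVE_CLASSES: Dict[str, str] = {
    "Landroid/telephony/SmsManager;": "short message",
    "Landroid/telephony/TelephonyManager;": "device information",
    "Landroid/location/LocationManager;": "geographic information",
    "Landroid/media/MediaRecorder;": "voice recording",
}


def sensitive_sequence(api_calls: List[str]) -> List[str]:
    """Keep only calls whose owning class is sensitive, preserving call order."""
    return [call for call in api_calls if call.split("->")[0] in SENSITIVE_CLASSES]


calls = [
    "Landroid/widget/TextView;->setText",
    "Landroid/telephony/TelephonyManager;->getDeviceId",
    "Landroid/util/Log;->d",
    "Landroid/telephony/SmsManager;->sendTextMessage",
]
filtered = sensitive_sequence(calls)
print(filtered)
```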

S5, determining whether the application program to be detected is a repackaged malicious application program or not according to the similarity matching result.

It can be understood that if the similarity matching degree is higher, the application program to be detected can be determined as a repackaged malicious application program. If there are multiple classes of sensitive API call sequences and the similarity matching degree between one of the classes of sensitive API call sequences and the sample library is high, the application to be detected is also considered to be a repackaged malicious application.
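The decision rule of S5 can be sketched as follows. The patent does not specify a similarity measure or a threshold, so this example assumes the ratio computed by Python's standard `difflib.SequenceMatcher` and an arbitrary threshold of 0.75. The names `similarity` and `is_repackaged_malicious` and the sample data are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(seq: List[str], sample: List[str]) -> float:
    """Similarity in [0, 1] between two API call sequences (assumed metric)."""
    return SequenceMatcher(None, seq, sample).ratio()


def is_repackaged_malicious(class_sequences: Dict[str, List[str]],
                            samples: List[List[str]],
                            threshold: float = 0.75) -> bool:
    """Flag the application if any single class matches any sample closely enough."""
    return any(
        similarity(seq, sample) >= threshold
        for seq in class_sequences.values() if seq  # skip classes with no sensitive calls
        for sample in samples
    )


SAMPLES = [["getDeviceId", "getSubscriberId", "sendTextMessage"]]
suspect = {"Boot": ["getDeviceId", "sendTextMessage"], "Ui": []}
benign = {"Map": ["requestLocationUpdates"]}

print(is_repackaged_malicious(suspect, SAMPLES))
print(is_repackaged_malicious(benign, SAMPLES))
```

Using `any` mirrors the statement that one class with a high matching degree is enough to treat the whole application as a repackaged malicious application.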

The detection method provided by the invention does not assume, in any step, that the applications in the official Android application market are original, non-malicious, and not repackaged. In other words, it does not depend on the official Android applications, so the detection accuracy can be further improved and applications from the official Android market can themselves be detected. In addition, in S3, the classes are clustered according to the strength of their calling relationships in order to extract the malicious code portion. Because this process is performed in units of classes, even if the developer of the malicious program modifies the injected portion, the detection result can still be determined from the similarity, so a relatively high accuracy can be ensured for variant malicious code. Furthermore, because the malicious code portion is extracted by clustering during detection, the detection result is not affected by modification, deletion, or similar operations on the normal code portion of the application, which further improves the detection accuracy.

When implemented, the specific process of S1 may include:

S11, preprocessing the installation package to obtain the classes.dex file.

It will be appreciated that the basic structure of the installation package includes:

META-INF\, a directory commonly found in Jar files;

res\, a directory for storing resource files;

AndroidManifest.xml, the global configuration file of the program;

classes.dex, the Dalvik bytecode;

resources.arsc, the compiled binary resource file.

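As can be seen from the above structure, the classes.dex file is part of the installation package. Because the installation package is essentially a compressed file in zip format, the classes.dex file can be obtained by decompressing the installation package.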
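Since an installation package is a zip archive, S11 can be sketched with Python's standard `zipfile` module. The helper names `extract_classes_dex` and `make_toy_apk` are hypothetical, and the toy archive only mimics the APK layout so that the example runs without a real APK.

```python
import io
import zipfile


def extract_classes_dex(apk_bytes: bytes) -> bytes:
    """Return the raw classes.dex entry stored in an APK (a zip archive)."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return apk.read("classes.dex")


def make_toy_apk() -> bytes:
    """Build an in-memory zip that mimics the APK layout (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as apk:
        apk.writestr("AndroidManifest.xml", b"<manifest/>")
        apk.writestr("classes.dex", b"dex\n035\x00toy-bytecode")
    return buf.getvalue()


dex = extract_classes_dex(make_toy_apk())
print(len(dex), "bytes extracted")
```

For a real package on disk, `zipfile.ZipFile(path)` can be opened directly; `ZipFile.read` raises `KeyError` if the archive has no `classes.dex` entry.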

S12, decompiling the classes.dex file to obtain a smali file.

It can be understood that decompilation is in fact a reverse-analysis technique, and the smali file is obtained by decompiling the classes.dex file.

S13, extracting an API calling sequence from the smali file.

In a specific implementation, the API call sequence may be extracted from the smali file as follows: starting from the position in the smali file corresponding to each entry point of the application to be detected, search and backtrack to extract the API call sequence. The return values and parameters of the called functions can also be extracted as additional information, so as to construct a function call relation graph with richer information.
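The search from an entry point can be sketched as a depth-first walk over the `invoke-*` instructions of a smali file. This is an assumption-laden illustration, not the patented procedure: the regular expression covers only the common `invoke-kind {registers}, Lclass;->method(` form, methods are keyed by name without their full signature, and the smali text, `parse_methods`, and `api_sequence` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

Method = Tuple[str, str]  # (class descriptor, method name)
INVOKE = re.compile(r"invoke-[\w/]+\s+\{[^}]*\},\s*(L[^;]+;)->([^(\s]+)\(")


def parse_methods(smali: str) -> Dict[Method, List[Method]]:
    """Map each defined method to the ordered list of methods it invokes."""
    methods: Dict[Method, List[Method]] = {}
    cls, current = "", None
    for raw in smali.splitlines():
        line = raw.strip()
        if line.startswith(".class"):
            cls = line.split()[-1]
        elif line.startswith(".method"):
            current = (cls, line.split()[-1].split("(")[0])
            methods[current] = []
        elif line.startswith(".end method"):
            current = None
        elif current is not None:
            match = INVOKE.search(line)
            if match:
                methods[current].append((match.group(1), match.group(2)))
    return methods


def api_sequence(methods: Dict[Method, List[Method]], entry: Method,
                 seen: Optional[Set[Method]] = None) -> List[str]:
    """Walk depth-first from an entry point: calls to app-defined methods are
    expanded, every other call is recorded as an API call in call order."""
    seen = set() if seen is None else seen
    if entry in seen:  # guard against recursion cycles
        return []
    seen.add(entry)
    sequence: List[str] = []
    for callee in methods.get(entry, []):
        if callee in methods:
            sequence.extend(api_sequence(methods, callee, seen))
        else:
            sequence.append(f"{callee[0]}->{callee[1]}")
    return sequence


# Hand-written, simplified smali for illustration only.
SMALI = """
.class public Lcom/demo/Boot;
.super Landroid/content/BroadcastReceiver;

.method public onReceive(Landroid/content/Context;Landroid/content/Intent;)V
    invoke-virtual {p1, v0}, Landroid/content/Context;->getSystemService(Ljava/lang/String;)Ljava/lang/Object;
    invoke-direct {p0}, Lcom/demo/Boot;->leak()V
    return-void
.end method

.method private leak()V
    invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
    return-void
.end method
"""

methods = parse_methods(SMALI)
sequence = api_sequence(methods, ("Lcom/demo/Boot;", "onReceive"))
print(sequence)
```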

In the above, a manner of obtaining the API call sequence of the installation package is provided, and of course, other manners may be adopted to obtain the API call sequence, which is not limited in the present invention.

In a specific implementation, the similarity matching between the extracted sensitive API call sequences of each class and the feature sequence samples of the malicious application programs in the pre-established sample library in S4 may include:

carrying out similarity matching on the extracted sensitive API calling sequence of each class and family characteristics of the malicious application program families of the same class in the sample library;

the malicious application family comprises a plurality of malicious applications of the same category; the family features are a sequence of features of the malicious application family and include a sequence sample of sensitive API calls for each malicious application in the malicious code family.

Each malicious application is assigned to a family in advance according to the category of malicious behavior that the known malicious applications perform. After the sensitive API call sequence is extracted, it is matched for similarity against the family features of the malicious application family of the same category. This family-based detection avoids comparison with the feature sequences of all malicious applications and thus improves detection efficiency.

In specific implementation, if the application program to be detected is determined to be a repackaged malicious application program according to the similarity matching result, the sensitive API call sequence of the application program to be detected can be added to the family features of the malicious application program families of the same category to update the sample library, so that the sample library can meet the detection requirement.
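Family-based matching and the sample library update can be sketched together. The assumptions are the same as before and are not taken from the patent: similarity is the `difflib.SequenceMatcher` ratio, the threshold is arbitrary, and the category of the sequence is supplied by the caller because the text does not say how it is determined. The names `family_similarity`, `detect_and_update`, and the family label `sms_fraud` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

SampleLibrary = Dict[str, List[List[str]]]  # family category -> family features


def family_similarity(seq: List[str], family_features: List[List[str]]) -> float:
    """Best similarity between a sequence and any sample of one family."""
    return max(
        (SequenceMatcher(None, seq, sample).ratio() for sample in family_features),
        default=0.0,
    )


def detect_and_update(seq: List[str], category: str, library: SampleLibrary,
                      threshold: float = 0.75) -> bool:
    """Match against one family only; on a hit, add the sequence to that family."""
    features = library.get(category, [])
    hit = bool(seq) and family_similarity(seq, features) >= threshold
    if hit and seq not in features:
        features.append(list(seq))  # the family features grow with each detection
    return hit


library: SampleLibrary = {
    "sms_fraud": [["getDeviceId", "getSubscriberId", "sendTextMessage"]],
}
first = detect_and_update(["getDeviceId", "sendTextMessage"], "sms_fraud", library)
second = detect_and_update(["requestLocationUpdates"], "sms_fraud", library)
print(first, second, len(library["sms_fraud"]))
```

Only the samples of one family are compared, which is the efficiency gain described above; the appended sequence lets later variants match against a richer family feature set.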

In practice, 1009 publicly available repackaged malicious Android applications were tested, and the detection accuracy reached 93%. The detection method provided by the invention therefore performs well in both accuracy and usability and can be applied to actual detection work.

In a second aspect, the present invention further provides an apparatus for statically detecting a repackaged malicious application, including:

Optionally, the obtaining module includes:

the preprocessing unit is used for preprocessing the installation package to obtain a classes.dex file;

the decompiling unit is used for decompiling the classes.dex file to obtain a smali file;

and the extraction unit is used for extracting the API calling sequence from the smali file.

Optionally, the apparatus further comprises:

It can be understood that the static detection apparatus provided by the present invention is the functional module architecture of the static detection method; the explanations, optional implementations, beneficial effects, and the like of the related content can be found in the corresponding parts of the static detection method and are not repeated here.

The present invention also provides an electronic device. Referring to fig. 6, the electronic device includes: a processor 601, a memory 602, a communications interface 603, and a bus 604; wherein,

the processor 601, the memory 602 and the communication interface 603 complete mutual communication through the bus 604;

the communication interface 603 is used for information transmission between the electronic device and a corresponding communication device;

the processor 601 is configured to call program instructions in the memory 602 to perform the methods provided by the above-mentioned method embodiments, for example, including: acquiring API calling sequences of an installation package of an application program to be detected and an association relation between classes to which each API calling sequence belongs; constructing a function call relation graph based on classes according to the strength of the association relation between the classes; the nodes in the function call relation graph are classes; according to the strength degree of the association relationship between the classes, clustering and dividing each class to obtain a plurality of clusters, and removing a preset number of clusters with the strongest association relationship between the classes in each cluster to obtain a malicious code cluster; extracting sensitive API calling sequences from the API calling sequences of all classes in the malicious code cluster, and respectively carrying out similarity matching on the extracted sensitive API calling sequences of all classes and a characteristic sequence sample of a malicious application program in a pre-established sample library; and determining whether the application program to be detected is a repackaged malicious application program or not according to the similarity matching result.

The present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above-described method embodiments, for example comprising: acquiring API calling sequences of an installation package of an application program to be detected and an association relation between classes to which each API calling sequence belongs; constructing a function call relation graph based on classes according to the strength of the association relation between the classes; the nodes in the function call relation graph are classes; according to the strength degree of the association relationship between the classes, clustering and dividing each class to obtain a plurality of clusters, and removing a preset number of clusters with the strongest association relationship between the classes in each cluster to obtain a malicious code cluster; extracting sensitive API calling sequences from the API calling sequences of all classes in the malicious code cluster, and respectively carrying out similarity matching on the extracted sensitive API calling sequences of all classes and a characteristic sequence sample of a malicious application program in a pre-established sample library; and determining whether the application program to be detected is a repackaged malicious application program or not according to the similarity matching result.

The present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform a method provided by the above method embodiments, for example, comprising: acquiring API calling sequences of an installation package of an application program to be detected and an association relation between classes to which each API calling sequence belongs; constructing a function call relation graph based on classes according to the strength of the association relation between the classes; the nodes in the function call relation graph are classes; according to the strength degree of the association relationship between the classes, clustering and dividing each class to obtain a plurality of clusters, and removing a preset number of clusters with the strongest association relationship between the classes in each cluster to obtain a malicious code cluster; extracting sensitive API calling sequences from the API calling sequences of all classes in the malicious code cluster, and respectively carrying out similarity matching on the extracted sensitive API calling sequences of all classes and a characteristic sequence sample of a malicious application program in a pre-established sample library; and determining whether the application program to be detected is a repackaged malicious application program or not according to the similarity matching result.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

The above-described embodiments of the test equipment and the like of the display device are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the embodiments of the present invention, and not to limit them; although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A static detection method for repackaging malicious applications is characterized by comprising the following steps:

2. The method according to claim 1, wherein the obtaining of the API call sequence of the installation package of the application to be detected comprises:

preprocessing the installation package to obtain classes.dex files;

decompiling the classes.dex file to obtain a smali file;

and extracting an API calling sequence from the smali file.

3. The method of claim 2, wherein the preprocessing of the installation package to obtain the classes.dex file comprises:

decompressing the installation package, and extracting the classes.dex file.

4. The method of claim 2, wherein extracting the API call sequence from the smali file comprises:

and searching and backtracking from the corresponding position of each entry point of the application program to be detected in the smali file to extract the API calling sequence.

5. The method according to claim 1, wherein the similarity matching of the extracted sensitive API call sequences of each class with the feature sequence samples of malicious applications in the pre-established sample library respectively comprises:

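performing similarity matching between the extracted sensitive API call sequence of each class and the family features of the malicious application families of the same category in the sample library; wherein the malicious application family comprises a plurality of malicious applications of the same category; the family features are the feature sequences of the malicious application family and include a sensitive API call sequence sample for each malicious application in the malicious application family.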

6. The method of claim 5, further comprising:

if the application to be detected is determined, according to the similarity matching result, to be a repackaged malicious application, adding the sensitive API call sequence of the application to be detected to the family features of the malicious application family of the same category.

7. A static detection apparatus for repackaging malicious applications, comprising:

8. The apparatus of claim 7, wherein the obtaining module comprises:

9. The apparatus of claim 7, wherein the matching module is specifically configured to: perform similarity matching between the extracted sensitive API call sequence of each class and the family features of the malicious application families of the same category in the sample library; wherein the malicious application family comprises a plurality of malicious applications of the same category; the family features are the feature sequences of the malicious application family and include a sensitive API call sequence sample for each malicious application in the malicious application family.

10. The apparatus of claim 9, further comprising: