CN109753800B - Android malicious application detection method and system fusing frequent item set and random forest algorithm - Google Patents

Publication number
CN109753800B
Authority
CN
China
Prior art keywords
frequent
sample
malicious
feature
authority
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910002795.2A
Other languages
Chinese (zh)
Other versions
CN109753800A (en)
Inventor
景小荣
王丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN201910002795.2A priority Critical patent/CN109753800B/en
Publication of CN109753800A publication Critical patent/CN109753800A/en
Application granted granted Critical
Publication of CN109753800B publication Critical patent/CN109753800B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an Android malicious application detection method that fuses a frequent item set (Apriori) algorithm with a random forest algorithm, and relates to the technical field of information processing. Android application samples are decompiled, and static features, namely permissions and function calls, are extracted from each decompiled file to obtain the association relations among the permissions in the sample set. Frequent 3-item sets of the malicious samples and of the normal samples are mined with the Apriori algorithm and combined with sensitive Application Programming Interface (API) function calls to generate features. A random forest classifier then learns and classifies the features, thereby detecting malicious Android applications. The method is used for malicious detection of Android application software, consumes few system resources, and achieves high detection accuracy.

Description

Android malicious application detection method and system fusing frequent item set and random forest algorithm
Technical Field
The invention relates to the field of network security and information security detection, in particular to an Android malicious application detection method.
Background
Android is currently the most popular smart terminal system in the world and is widely used because of its open platform, free availability, and similar characteristics. As a result, many malicious code developers target the Android platform. With technical progress, the cost of producing Android malware keeps falling, so the amount of Android malware grows day by day. According to data released by the 360 Internet Security Center, 757.3 thousand new malware samples were intercepted on the Android platform in 2017, an average of 3.1 thousand new samples per day. Malware frequently launches attacks using new techniques such as mining trojans and botnets, and engages in rogue behaviors such as stealing users' personal information and malicious charging, which causes huge losses to users. In the face of such widespread malicious attacks, effectively detecting malicious Android applications has become the primary problem for the security of the Android platform.
At present, malicious application detection for Android is mainly divided into static detection and dynamic detection. Static detection analyzes the program without running the application software, using reverse-engineering means such as decompilation to extract features such as signatures and permissions and to analyze characteristic behaviors directly. Static detection techniques mainly extract features from the application description file (AndroidManifest.xml) and the smali code files. Guo et al. analyzed AndroidManifest.xml and the information tags in the smali code files, and extracted the classes, permissions, components, signatures, processed data, startup information, and the like of the application. Rashidi B et al. used permissions and Application Programming Interface (API) function calls as the feature set and detected malicious applications with a Support Vector Machine (SVM) and a K-Nearest Neighbor (KNN) algorithm, but with many false positives. Detecting Android application software by machine learning removes the need for manual analysis and improves analysis efficiency, but it depends on the extracted application features.
Dynamic detection of malicious Android applications obtains the features of an application while the application software is running, through techniques such as injection and hooking. Its drawback is that the software must be run, which consumes excessive system resources. In dynamic detection research, Mahinderu et al. used a tracer (Strace) to collect behavior data of application software and transmit it to an analysis server, trained a classifier on the behavior samples, and finally used a K-nearest neighbor algorithm to judge whether the application contains malicious behavior. Singh L et al. used API hook technology to hook sensitive APIs on the Android platform: once the system or an application calls a particular API, the call can be intercepted and redirected to a proxy function to obtain detailed information, that is, behavior information.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the above defects in the prior art: Android applications are learned and detected by a machine learning algorithm, which reduces the complexity of detecting malicious Android applications, saves system resources, and, by addressing high-dimensional features and automatic classification, further improves the accuracy of malware detection.
The technical scheme for solving the above technical problem is to provide an Android malicious application detection method fusing a frequent item set (Apriori) algorithm and a random forest algorithm, comprising the following steps: decompiling Android application software in batches to obtain the static features of the application software, namely its permissions and sensitive API functions; mining the frequent item sets of the permission features to reduce the dimension of the permission features, obtaining the frequent 3-item sets of permissions and thereby the association relations among the permissions in the sample set; mining the frequent 3-item sets of the malicious samples and of the normal samples, taking the frequent 3-item sets and the sensitive API functions as features to construct a feature set, screening and scoring the feature attributes in the feature set with an information gain algorithm, extracting the important features, and constructing the vector spaces corresponding to the important features; and learning and classifying the vector spaces with a random forest algorithm, the vector spaces of the normal samples and the malicious samples being labeled with normal or malicious attributes.
The method further comprises decompiling the application software with a static analysis tool before feature extraction, to obtain the resource files (res), the .so files of third-party software development kits (lib), the smali files, and AndroidManifest.xml.
The method further comprises extracting the features with a Python script: the permission features are obtained by parsing all permissions of an application from Extensible Markup Language (XML) files such as AndroidManifest.xml.
The invention further comprises that mining the frequent 3-item set of the permission features specifically comprises: extracting permissions from the malicious samples and from the normal samples respectively to construct permission sets; mining the frequent 1-item set of a permission set by calculating the support S of each permission in the permission set and pruning the 1-item sets that do not meet the minimum support min_s to obtain a candidate set $L_1$, then connecting the elements of $L_1$; taking the connected candidate set as a new sample set and mining the frequent 2-item set by pruning the 2-item sets that do not meet the minimum support min_s to form a new candidate set $L_2$; and repeating this until the frequent 3-item set is obtained.
The invention further comprises that the Information Gain (IG) algorithm specifically calculates the difference between the entropy of a feature and its conditional entropy to obtain the IG value of the feature; the larger the IG value, the greater the correlation. Important features are retained according to the correlation and matched against each application in the system, and the corresponding vector spaces are constructed respectively. Constructing the vector space specifically comprises constructing a feature set X containing the different features $(x_1, x_2, \ldots, x_n)$ of the application software, and constructing a vector space v from the features in the set X by the formula $v: s \rightarrow \{0,1\}^{|X|}$, where s represents a given application and each dimension of v corresponds to a feature in X. If s contains the feature, the identification value corresponding to that feature in the vector space v is 1; otherwise it is 0.
The invention also provides an Android malicious application detection system fusing an Apriori algorithm and a random forest algorithm, which comprises a feature extraction module, a feature processing module, and a random forest classification algorithm module. The feature extraction module extracts features from the batch-decompiled Android application software to obtain the static features of the application software, namely its permissions and sensitive API functions. The feature processing module mines the frequent item sets of the permission features and reduces the dimension of the permission features to obtain the frequent 3-item sets of permissions and thereby the association relations among the permissions in the sample set; it mines the frequent 3-item set of the malicious samples and the frequent 3-item set of the normal samples, takes the frequent 3-item sets and the sensitive API functions as features to construct a feature set, screens and scores the feature attributes in the feature set with an information gain algorithm, extracts the important features, and constructs the vector spaces corresponding to the important features. The random forest classification algorithm module learns and classifies the vector spaces, the vector spaces of the normal samples and the malicious samples being labeled with normal or malicious attributes.
The method extracts the data features of applications by static detection, uses the Apriori algorithm to mine the frequent 3-item sets of permissions in normal software and in malware from these features, fuses them with sensitive API function calls, and builds a random forest classifier to learn and classify the features. Further, the IG algorithm obtains the IG value of each feature as the difference between the entropy of the feature and its conditional entropy, the important features are retained, and a matching algorithm constructs the vector space corresponding to each application in the system. Because the invention mines frequent 3-item sets from the high-dimensional permission features, it consumes few system resources.
Drawings
FIG. 1 shows an Android malicious application detection model fusing an Apriori algorithm and a random forest algorithm.
Detailed Description
The following detailed description of the embodiments of the invention is provided in connection with the accompanying drawings.
FIG. 1 is a schematic diagram of the detection system model of the present invention. To detect malicious applications on the Android system, an Android malicious application detection system is provided that fuses frequent 3-item set mining by the Apriori algorithm with classification by the random forest algorithm.
First, the collected normal software and malware samples are decompiled in batches, and the permissions requested by each application (Android permissions) and its sensitive Application Programming Interface (API) function calls are extracted from the decompiled files, including the application description file AndroidManifest.xml.
The following description will specifically explain each part.
(1) The feature extraction module decompiles the sample set in batches using a Python script and extracts features from the decompiled files, such as AndroidManifest.xml. For permission feature extraction, the permission features are taken from the permissions that an application requests; for example, all requested permissions are extracted from AndroidManifest.xml. When certain system functions are used or sensitive data is accessed, the corresponding permission must be requested, for example a permission declared in AndroidManifest.xml. For sensitive API functions, each smali file represents one Java class and contains the system API functions called by the application, and all smali files are traversed using a function from Python's os module. Traversing the smali files and extracting the sensitive API functions of all samples reveals the potential malicious behavior of the application software. Smali is the bytecode representation for the Android virtual machine (Dalvik), and malware must call the corresponding API functions in order to carry out malicious behavior. Therefore, all sensitive API functions called in the sample set are used as learning features for the random forest algorithm, which is trained on them to detect malicious applications.
The sample set must be decompiled before features can be extracted from each sample. Files with the .apk suffix can be decompiled with the decompilation tool apktool to obtain the resource files (res), the .so files of third-party SDKs (lib), the smali files, and AndroidManifest.xml.
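As an illustration of the extraction step, the following Python sketch reads the requested permissions from a decoded AndroidManifest.xml and scans smali files for calls to a caller-supplied list of sensitive APIs. It is a minimal sketch under stated assumptions, not the patented implementation: the function names, the regular expression, and the choice of os.walk for traversal are assumptions (the source names only Python's os module), and the list of sensitive APIs is not given in the source.

```python
import os
import re
import xml.etree.ElementTree as ET

# Namespace that prefixes the android:name attribute in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
# Matches smali invoke instructions and captures "Lpkg/Class;->method".
INVOKE_RE = re.compile(r"invoke-\S+ \{[^}]*\}, (L[^;]+;->[\w$<>]+)")


def extract_permissions(manifest_path):
    """Return the sorted permissions declared in <uses-permission> elements."""
    root = ET.parse(manifest_path).getroot()
    names = (elem.get(ANDROID_NS + "name") for elem in root.iter("uses-permission"))
    return sorted({name for name in names if name})


def extract_api_calls(smali_dir, sensitive_apis):
    """Return the subset of sensitive_apis invoked anywhere under smali_dir."""
    found = set()
    # Assumption: os.walk is used to traverse the smali tree.
    for dirpath, _dirnames, filenames in os.walk(smali_dir):
        for filename in filenames:
            if not filename.endswith(".smali"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                for line in handle:
                    match = INVOKE_RE.search(line)
                    if match and match.group(1) in sensitive_apis:
                        found.add(match.group(1))
    return found
```

Both functions expect the output directory of a decompilation tool: a text-form AndroidManifest.xml and a directory tree of .smali files. Keeping the sensitive-API list as a parameter leaves the choice of which calls count as sensitive outside the sketch.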
Generally, before it carries out malicious behavior, malware requests certain combinations of dangerous permissions, and the permissions in such a combination depend on one another to produce the malicious behavior. Malware therefore requests not just a single dangerous permission but a combination of them. For example, a malicious sample commonly requests the combination READ_SMS (read short messages), READ_PHONE_STATE (read phone state), and WRITE_SMS (edit short messages), which allows malicious operations such as reading the user's private data and sending it elsewhere, whereas normal software rarely requests this combination. Software can thus be judged to be malware from the potential malicious behavior that arises when the different dangerous permissions in a combination work together.
The Apriori algorithm, proposed by Agrawal et al., mines frequent item sets for Boolean association rules. Feature extraction yields a large number of permission features and sensitive API functions. Because the dimension of the permission features is very large and the computational complexity is high, the frequent item sets of the permission features are mined with the Apriori algorithm to reduce their dimension, yielding the frequent 3-item sets of permissions. Mining the frequent 3-item sets of the permission features gives the association relations among the permissions in the sample set. The specific steps are described as follows.
Mining frequent 3-item sets of permissions based on the Apriori algorithm: specifically, a normal software sample permission set P and a malicious sample permission set M are extracted from all samples, where $P = \{p_1, p_2, \ldots, p_n\}$ is the permission set of the normal software samples and represents the n permissions requested by all normal software samples, and $M = \{m_1, m_2, \ldots, m_x\}$ is the permission set of the malicious samples and represents the x permissions requested by all malicious samples. A frequent 3-item set is mined separately from the permission set of the normal samples and from that of the malicious samples. Specifically, the following method can be adopted:
Mining the frequent 1-item set of the sample permission set: the support S of each permission in the sample permission set is calculated, which represents the probability that the permission appears across all samples in the set. The 1-item sets that do not meet the minimum support min_s are pruned, and the set that satisfies the condition is taken as the candidate set $L_1$. The elements of $L_1$ are then connected. The connected candidate set, which now contains all 2-item sets, is used as a new sample set, the frequent 2-item sets are mined from it, and the 2-item sets that do not meet the minimum support min_s are pruned to form a new candidate set $L_2$. These steps are repeated until the frequent 3-item set of the sample permission set is obtained.
Connecting: within the frequent n-item sets, starting from the first item set (say, the i-th), search downward for an item set (say, the j-th) whose first n-1 elements are the same as those of the i-th item set, and then connect all the elements of the i-th item set with the n-th element of the j-th item set to form an (n+1)-item set.
For the normal software sample permission set P, the occurrence frequency of each of $p_1, p_2, \ldots, p_n$ is calculated and used as the support S of that element. The minimum support is the lowest acceptable occurrence frequency of an element of P and lies between 0 and 1. After the frequent 1-item set is mined, pruning and connection are carried out according to the minimum support, finally yielding the frequent 3-item set of the normal samples.
For the malicious sample permission set M, the occurrence frequency of each of $m_1, m_2, \ldots, m_x$ is calculated and used as the support S of that element. After the frequent 1-item set is mined, pruning and connection are carried out according to the minimum support, finally yielding the frequent 3-item set of the malicious samples.
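The pruning and connecting steps described above can be sketched in Python as follows. This is an illustrative sketch rather than the patented implementation: the function names are hypothetical, support is taken to be the fraction of samples whose permission set contains every element of the item set, and item sets are kept as sorted tuples so that two k-item sets are connected when their first k-1 elements agree.

```python
def support(itemset, samples):
    """Fraction of samples whose permission set contains every item in itemset."""
    return sum(1 for sample in samples if itemset <= sample) / len(samples)


def connect(frequent):
    """Connect sorted k-item tuples sharing their first k-1 items into (k+1)-item tuples."""
    ordered = sorted(frequent)
    candidates = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if first[:-1] == second[:-1]:
                candidates.append(first + (second[-1],))
    return candidates


def frequent_3_itemsets(samples, min_s):
    """Return the frequent 3-item sets of permissions with support >= min_s."""
    samples = [frozenset(sample) for sample in samples]
    permissions = sorted({perm for sample in samples for perm in sample})
    # L1: single permissions that meet the minimum support.
    level = [(perm,) for perm in permissions
             if support(frozenset((perm,)), samples) >= min_s]
    # Connect and prune twice: L1 -> L2 -> frequent 3-item sets.
    for _ in range(2):
        level = [cand for cand in connect(level)
                 if support(frozenset(cand), samples) >= min_s]
    return level


# Hypothetical malicious-sample permission sets, for illustration only.
malicious_samples = [
    {"READ_SMS", "READ_PHONE_STATE", "WRITE_SMS", "INTERNET"},
    {"READ_SMS", "READ_PHONE_STATE", "WRITE_SMS"},
    {"READ_SMS", "READ_PHONE_STATE", "WRITE_SMS", "CAMERA"},
    {"INTERNET", "CAMERA"},
]
result = frequent_3_itemsets(malicious_samples, min_s=0.6)
```

With these four made-up samples and min_s set to 0.6, INTERNET and CAMERA are pruned at the 1-item stage (support 0.5 each), so only the READ_PHONE_STATE, READ_SMS, WRITE_SMS combination survives as a frequent 3-item set. The same function would be run once on the normal samples and once on the malicious samples.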
(2) Feature processing
After the Apriori algorithm has mined the frequent 3-item set of the malicious samples and the frequent 3-item set of the normal samples, these item sets are taken as features together with the sensitive API functions, and the feature attributes are screened and scored with the information gain (IG) algorithm. The IG algorithm obtains the IG value of a feature as the difference between the information entropy and the conditional entropy of the feature; the larger the value, the greater the correlation. Entropy calculation: from the probability $P(C_i)$ that normal software or malware, respectively, appears in the sample set, the information entropy $H(C)$ of the sample set is calculated according to the formula:
$$H(C) = -\sum_{i} P(C_i) \log P(C_i)$$
Conditional entropy calculation: the conditional entropy $H(Y \mid X_i)$ of the i-th feature is calculated according to the formula:
$$H(Y \mid X_i) = -\sum_{x} P(X_i = x) \sum_{y} P(Y = y \mid X_i = x) \log P(Y = y \mid X_i = x)$$
The IG value of the i-th feature is then calculated according to the formula $IG_i = H(C) - H(Y \mid X_i)$. The IG value measures how much the feature reduces the uncertainty of classifying software as normal or malicious, and the features that reduce this uncertainty the most are preferred. Features with an IG value of 0 are removed, and the remaining features, whose IG values are not 0, are retained as important features.
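A short Python sketch of this screening step is given below. It assumes binary features (present or absent in each sample) and base-2 logarithms; the source does not state the logarithm base, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy H(C) of a list of class labels (base 2, an assumption)."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG_i = H(C) - H(Y | X_i) for one feature column."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [label for x, label in zip(feature_values, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def select_features(columns, labels):
    """Keep the names of features whose IG value is not 0."""
    return [name for name, values in columns.items()
            if information_gain(values, labels) > 1e-12]


# Hypothetical data: four samples and two candidate features.
labels = ["malware", "malware", "normal", "normal"]
columns = {
    "READ_SMS+READ_PHONE_STATE+WRITE_SMS": [1, 1, 0, 0],  # separates the classes
    "getDeviceId": [1, 0, 1, 0],                          # carries no information
}
kept = select_features(columns, labels)
```

In this toy data the first feature has an IG value of 1 bit and is kept, while the second has an IG value of 0 and is removed. The small tolerance in select_features only guards against floating-point rounding when comparing with 0.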
Define X as the set of features retained for the application software, containing the different features $(x_1, x_2, \ldots, x_n)$, where n is the number of important features. According to the formula $v: s \rightarrow \{0,1\}^{|X|}$, a vector space v is constructed from the features in the set X, where s represents a given application and each dimension of v corresponds to a feature in X. If s contains the feature, the identification value corresponding to that feature in the vector space v is 1; otherwise it is 0. The identification value thus indicates whether the feature is present.
With this method, a matching algorithm constructs the vector space v corresponding to each application in the system: after feature screening, a feature set containing n features is constructed, a different vector space v is generated for each sample, and the vector spaces are stored in a MySQL database as the input of the random forest classification module.
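The mapping from an application to its 0/1 vector can be sketched in Python as below. The sketch assumes that a frequent 3-item set feature counts as contained in an application when the application requests all three permissions, and that an API feature counts as contained when the API appears among the application's extracted calls. The function name and the example feature lists are hypothetical, and storage in MySQL is left out.

```python
def build_vector(app_permissions, app_apis, itemset_features, api_features):
    """Map one application s to a 0/1 vector with one dimension per retained feature."""
    permissions = set(app_permissions)
    apis = set(app_apis)
    # Assumption: an item set feature is present when all its permissions are requested.
    vector = [1 if set(items) <= permissions else 0 for items in itemset_features]
    vector += [1 if api in apis else 0 for api in api_features]
    return vector


# Hypothetical retained features (the set X) and one application s.
itemset_features = [
    ("READ_PHONE_STATE", "READ_SMS", "WRITE_SMS"),
    ("CAMERA", "INTERNET", "RECORD_AUDIO"),
]
api_features = ["sendTextMessage", "getDeviceId"]
vector = build_vector(
    {"READ_SMS", "READ_PHONE_STATE", "WRITE_SMS", "INTERNET"},
    {"getDeviceId"},
    itemset_features,
    api_features,
)
```

The length of the vector equals the number of retained features, so every application is represented in the same space regardless of how many permissions or API calls it has.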
(3) Random forest algorithm classification
After the feature vectors are obtained, detection essentially becomes a classification problem. Since the detection result is either normal or malicious, detection is a binary classification problem, for which the random forest algorithm is well suited. The obtained vector space v is therefore classified with the random forest classification algorithm.
Specifically, supervised classification can be used as follows: for each application in the collected sets of known normal and malicious samples, a normal or malicious attribute is appended after the vector space corresponding to that application, according to whether it is normal software or malware, as described by the following formula.
Figure BDA0001934301640000061
Here V(S) represents the set of all application software, normal indicates that the application software is normal software, and malware indicates that the application software is malware.
After the vector spaces of the training sample set are obtained, the random forest classifier is trained on them. For software to be tested, feature extraction and feature processing produce a vector space v that carries no normal or malware label; the label is left blank or replaced by '?'. The random forest classifier trained on the training samples then classifies the vector space of the software to be tested, and the result uses the string normal or malware to indicate whether the software is normal software or malware, thereby detecting malware.
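The training and detection steps can be sketched in Python as follows. The block is a deliberately small pure-Python random forest (bootstrap sampling, random feature subsets at each split, majority vote) over 0/1 feature vectors. It is an illustrative stand-in, not the classifier used in the patent, and a library implementation such as scikit-learn's RandomForestClassifier would normally be preferred. All function names, parameters, and the toy training data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, n_try, rng, depth=0, max_depth=5):
    """Grow one tree; a leaf is a label string, a node is (feature, zero_branch, one_branch)."""
    if len(set(labels)) == 1 or depth >= max_depth:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for feature in rng.sample(range(len(rows[0])), n_try):
        zero = [i for i, row in enumerate(rows) if row[feature] == 0]
        one = [i for i, row in enumerate(rows) if row[feature] == 1]
        if not zero or not one:
            continue  # this feature cannot split the node
        score = (len(zero) * gini([labels[i] for i in zero])
                 + len(one) * gini([labels[i] for i in one])) / len(rows)
        if best is None or score < best[0]:
            best = (score, feature, zero, one)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    _, feature, zero, one = best
    branches = [
        build_tree([rows[i] for i in part], [labels[i] for i in part],
                   n_try, rng, depth + 1, max_depth)
        for part in (zero, one)
    ]
    return (feature, branches[0], branches[1])


def predict_tree(tree, row):
    """Follow the 0/1 branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        feature, zero_branch, one_branch = tree
        tree = one_branch if row[feature] == 1 else zero_branch
    return tree


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the labeled vectors."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in picks],
                                 [labels[i] for i in picks], n_try, rng))
    return forest


def predict_forest(forest, row):
    """Return 'normal' or 'malware' by majority vote over the trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]
```

A labeled training set is a list of 0/1 vectors plus a parallel list of the strings normal and malware; an unlabeled vector from software under test is passed to predict_forest, which returns one of those two strings.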
The invention uses decompilation to decompile the application software sample set in batches and extracts the permissions and API functions from the resulting files. To handle the high-dimensional permission features, the Apriori algorithm is used for dimension reduction to obtain the frequent 3-item sets of permissions, which are then combined with the sensitive API functions and screened by information gain to obtain the important features. The important features are mapped into a vector space and represented by 0 or 1, the normal applications and malicious applications are labeled, and the labeled vector spaces are obtained. Finally, the random forest algorithm learns and classifies the sample set.

Claims (6)

1. An Android malicious application detection method fusing a frequent item set algorithm and a random forest algorithm, characterized by comprising the following steps: decompiling Android application software in batches to obtain a sample set, and obtaining the static features of the application software, namely its permissions and sensitive Application Programming Interface (API) functions; mining the frequent item sets of the permission features and reducing the dimension of the permission features to obtain the frequent 3-item sets of permissions and thereby the association relations among the permissions in the sample set; mining the frequent 3-item sets of the malicious samples and of the normal samples, taking the frequent 3-item sets of the malicious samples and of the normal samples, respectively, together with the sensitive API functions as features to construct a feature set, screening and scoring the feature attributes in the feature set with an information gain algorithm, extracting the important features, and constructing the vector spaces corresponding to the important features; learning and classifying the vector spaces with a random forest classifier, the vector spaces of the normal samples and the malicious samples being labeled with normal or malicious attributes;
mining the frequent 3-item set of the permission features specifically comprises: extracting permissions from the malicious samples or the normal samples respectively to construct a permission set; mining the frequent 1-item set of the permission set: calculating the support S of each permission in the permission set, pruning the 1-item sets that do not meet the minimum support min_s to obtain a candidate set $L_1$, and then connecting the elements of $L_1$; taking the connected candidate set as a new 2-item set and mining the frequent 2-item set: pruning the 2-item sets that do not meet the minimum support min_s to form a new candidate set $L_2$; and repeating these steps until the frequent 3-item set is obtained;
the Information Gain (IG) algorithm specifically comprises: from the probability $P(C_i)$ that normal software or malware, respectively, appears in the sample set, calculating the information entropy $H(C)$ of the sample set according to the formula:
$$H(C) = -\sum_{i} P(C_i) \log P(C_i)$$
calculating the conditional entropy $H(Y \mid X_i)$ of the i-th feature according to the formula:
$$H(Y \mid X_i) = -\sum_{x} P(X_i = x) \sum_{y} P(Y = y \mid X_i = x) \log P(Y = y \mid X_i = x)$$
and calculating the IG value of the i-th feature according to the formula $IG_i = H(C) - H(Y \mid X_i)$, wherein the larger the IG value, the greater the correlation of the frequent 3-item sets of the malicious samples and the normal samples; retaining the important features according to the correlation, matching the important features against each application in the system, and constructing respectively the vector spaces corresponding to the important features;
constructing the vector space specifically comprises: removing the features whose IG value is 0 and retaining the remaining features, whose IG values are not 0, as important features; constructing a feature set X containing the different feature vectors $(x_1, x_2, \ldots, x_n)$ of the application software samples; and constructing a vector space V from the feature vectors in the set X by the formula $V: s \rightarrow \{0,1\}^{|X|}$, wherein s represents a given application, each dimension of V corresponds to a feature in X, and the identification value corresponding to a feature in the vector space V is 1 if s contains that feature and 0 otherwise.
2. The method of claim 1, wherein before feature extraction, a static analysis tool is used to decompile the application software to obtain the resource files res, the .so files lib of third-party software development kits, the smali files, and the application description file AndroidManifest.xml.
3. The method of claim 2, characterized in that a Python script is used to extract the features: the permission features are obtained by parsing all permissions of an application from the AndroidManifest.xml file, and all smali files are traversed using a function from Python's os module.
4. An android malicious application detection system fusing a frequent item set algorithm and a random forest algorithm comprises: the system comprises a feature extraction module, a feature processing module and a random forest classification algorithm module, and is characterized in that the feature extraction module performs feature extraction on batch decompiled Android application software to obtain application software authority and sensitive API function static features; the characteristic processing module excavates a frequent item set of the authority characteristics, performs dimension reduction processing on the authority characteristics to obtain a frequent 3-item set of the authority so as to obtain an incidence relation between the authorities in the sample set, excavates a frequent 3-item set of a malicious sample and a frequent 3-item set of a normal sample, uses the frequent 3-item set and a sensitive API (application program interface) function as a characteristic construction characteristic set, screens and scores characteristic attributes in the characteristic set by adopting an information gain algorithm, extracts important characteristics and constructs a vector space corresponding to the important characteristics; the random forest classification algorithm module is used for learning and classifying and detecting the vector space, and a random forest classifier is used for marking the normal or malicious attributes of the vector space of the normal sample and the vector space of the malicious sample;
mining the frequent 3-item sets of the permission features specifically comprises: extracting the permissions from the malicious samples or the normal samples, respectively, to construct a permission set; mining the frequent 1-item sets of the permission set: calculating the support S of each permission in the permission set, and pruning the 1-item sets that do not meet the minimum support min_s to obtain a candidate set $L_1$, then joining the elements of $L_1$; taking the joined candidate set as a new sample set and mining the frequent 2-item sets: pruning the 2-item sets that do not meet the minimum support min_s to form a new candidate set $L_2$; and repeating these steps until the frequent 3-item sets are obtained;
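The level-wise mining described in this step can be sketched in Python as follows. This is a minimal Apriori-style illustration, not the patented implementation: the function name `frequent_itemsets`, the convention that min_s is a fraction of the samples, and the extra check that every k-subset of a joined candidate is itself frequent are all assumptions made for the example. In the claimed system it would be run once on the malicious samples and once on the normal samples.

```python
from itertools import combinations

def frequent_itemsets(samples, min_s, max_k=3):
    """Mine frequent k-item sets level by level, for k = 1..max_k.

    samples: iterable of permission collections, one per application.
    min_s:   minimum support as a fraction of the samples (assumed convention).
    Returns {k: {frozenset(items): support}}.
    """
    samples = [set(s) for s in samples]
    n = len(samples)
    # Candidate 1-item sets: every permission seen in the sample set.
    candidates = {frozenset([p]) for s in samples for p in s}
    result = {}
    for k in range(1, max_k + 1):
        # Support = fraction of samples containing the whole candidate.
        support = {c: sum(c <= s for s in samples) / n for c in candidates}
        # Prune candidates below the minimum support to obtain L_k.
        level = {c: v for c, v in support.items() if v >= min_s}
        if not level:
            break
        result[k] = level
        # Join step: unite pairs of L_k members into (k+1)-item candidates,
        # keeping only those whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(level, 2):
            u = a | b
            if len(u) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(u, k)
            ):
                candidates.add(u)
    return result
```

Calling `frequent_itemsets(samples, min_s)[3]` returns the frequent 3-item sets with their supports; the key 3 is absent when no 3-item set reaches the minimum support.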
the information gain (IG) algorithm specifically comprises: calculating the difference between the entropy and the conditional entropy of a feature to obtain the IG value of the feature; generating, respectively, the probability $P(C_i)$ of normal software or malicious software in the sample set; calculating the information entropy $H(C)$ of the sample set according to the formula:
Figure FDA0004034486200000021
calculating the conditional entropy $H(Y|X_i)$ of the i-th feature according to the formula:
Figure FDA0004034486200000022
and calculating the IG value of the i-th feature according to the formula $IG_i = H(C) - H(Y|X_i)$, wherein the larger the IG value is, the greater the degree of correlation of the frequent 3-item sets of the malicious samples and the normal samples; retaining the important features according to the degree of correlation, matching the important features against each application software in the system, and constructing the vector spaces corresponding to the important features, respectively;
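As an illustration of the IG computation, the sketch below assumes the standard Shannon definitions, $H(C) = -\sum_i P(C_i)\log_2 P(C_i)$ and $H(Y \mid X_i) = \sum_v P(X_i = v)\,H(Y \mid X_i = v)$. The base-2 logarithm and the helper names `entropy` and `information_gain` are assumptions for the example; the formula images in the claim may use a different but equivalent notation.

```python
import math

def entropy(labels):
    """Shannon entropy in bits of a list of class labels (assumed base 2)."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def information_gain(feature, labels):
    """IG = H(C) - H(Y|X) for one feature column, e.g. 0/1 presence flags."""
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        # Labels of the samples whose feature value equals v.
        subset = [y for x, y in zip(feature, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond
```

A feature that separates the two classes perfectly in a balanced sample set scores 1 bit under these assumptions, while a feature distributed identically in both classes scores 0.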
specifically, the vector space construction method comprises: eliminating the features whose IG value is 0 and retaining the remaining features, whose IG value is not 0, as important features; constructing a feature set X containing the different feature vectors $(x_1, x_2, \ldots, x_n)$ of the application software samples; and invoking the mapping $V: s \to \{0,1\}^{|X|}$ to construct a vector space V from the feature vectors in the set X, wherein s represents a given application software and each dimension of V corresponds to a feature in X; if s contains the feature, the value corresponding to that feature in the vector space V is 1, and otherwise it is 0.
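The mapping $V: s \to \{0,1\}^{|X|}$ can be sketched as a small Python helper. The name `build_vector_space`, the dictionary inputs and the alphabetical ordering of dimensions are assumptions made for illustration only.

```python
def build_vector_space(samples, ig_scores):
    """Map each application to a 0/1 vector over the features with non-zero IG.

    samples:   {app_name: set of features the application contains}
    ig_scores: {feature: IG value}
    Returns (features, vectors), where features fixes the dimension order.
    """
    # Eliminate features whose IG value is 0; sort for a stable order.
    features = sorted(f for f, ig in ig_scores.items() if ig != 0)
    # Dimension j is 1 if the application contains features[j], else 0.
    vectors = {
        name: [1 if f in present else 0 for f in features]
        for name, present in samples.items()
    }
    return features, vectors
```

The resulting vectors, together with the normal or malicious label of each sample, are the kind of input a random forest classifier would be trained on.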
5. The detection system according to claim 4, wherein a static analysis tool is used to decompile the application software to obtain files comprising res, lib, smali and AndroidManifest.xml.
6. The detection system according to claim 5, wherein a Python script is used to extract features: all permissions of an application are parsed from the AndroidManifest.xml file to obtain the permission features, the os.walk() function is used to traverse all smali files, and the sensitive API functions of each sample are extracted by regular-expression matching.
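The extraction step of claims 3 and 6 might look like the following Python sketch. It assumes the decompiler has already produced a plain-text AndroidManifest.xml and a directory of .smali files; the regular expression, the function names and the caller-supplied `sensitive_apis` set are hypothetical choices, since the claims do not list the sensitive API functions or the exact pattern used.

```python
import os
import re
import xml.etree.ElementTree as ET

# XML namespace of the android: attribute prefix in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
# Captures the callee of a smali invoke instruction, e.g.
# invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()...
INVOKE_RE = re.compile(r"invoke-[\w/-]+ \{[^}]*\}, (L[^;]+;->[^(\s]+)\(")

def extract_permissions(manifest_path):
    """Return the set of permissions declared via <uses-permission>."""
    root = ET.parse(manifest_path).getroot()
    return {
        el.get(ANDROID_NS + "name")
        for el in root.iter("uses-permission")
        if el.get(ANDROID_NS + "name")
    }

def extract_sensitive_apis(decompiled_dir, sensitive_apis):
    """Walk all .smali files and return the sensitive APIs that are invoked.

    sensitive_apis: set of 'Lpkg/Class;->method' strings (hypothetical list).
    """
    found = set()
    for dirpath, _dirs, files in os.walk(decompiled_dir):
        for fname in files:
            if not fname.endswith(".smali"):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    m = INVOKE_RE.search(line)
                    if m and m.group(1) in sensitive_apis:
                        found.add(m.group(1))
    return found
```

Running both functions over each decompiled sample yields the permission set and the sensitive API set that feed the frequent item set mining and the feature set construction.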
CN201910002795.2A 2019-01-02 2019-01-02 Android malicious application detection method and system fusing frequent item set and random forest algorithm Active CN109753800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910002795.2A CN109753800B (en) 2019-01-02 2019-01-02 Android malicious application detection method and system fusing frequent item set and random forest algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910002795.2A CN109753800B (en) 2019-01-02 2019-01-02 Android malicious application detection method and system fusing frequent item set and random forest algorithm

Publications (2)

Publication Number Publication Date
CN109753800A CN109753800A (en) 2019-05-14
CN109753800B true CN109753800B (en) 2023-04-07

Family

ID=66405239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910002795.2A Active CN109753800B (en) 2019-01-02 2019-01-02 Android malicious application detection method and system fusing frequent item set and random forest algorithm

Country Status (1)

Country Link
CN (1) CN109753800B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210224B (en) * 2019-05-21 2023-01-31 暨南大学 Intelligent big data mobile software similarity detection method based on description entropy
CN112035836B (en) * 2019-06-04 2023-04-14 四川大学 Malicious code family API sequence mining method
CN112446026A (en) * 2019-09-03 2021-03-05 中移(苏州)软件技术有限公司 Malicious software detection method and device and storage medium
CN110851834B (en) * 2019-11-18 2024-02-27 北京工业大学 Android malicious application detection method integrating multi-feature classification
CN111324893B (en) * 2020-02-17 2022-05-10 电子科技大学 Detection method and background system for android malicious software based on sensitive mode
CN111460452B (en) * 2020-03-30 2022-09-09 中国人民解放军国防科技大学 Android malicious software detection method based on frequency fingerprint extraction
CN111723371B (en) * 2020-06-22 2024-02-20 上海斗象信息科技有限公司 Method for constructing malicious file detection model and detecting malicious file
CN113949514B (en) * 2020-07-16 2024-01-26 中国电信股份有限公司 Application override detection method, device and storage medium
CN112000954B (en) * 2020-08-25 2024-01-30 华侨大学 Malicious software detection method based on feature sequence mining and simplification
CN112100621B (en) * 2020-09-11 2022-05-20 哈尔滨工程大学 Android malicious application detection method based on sensitive permission and API
CN112287345B (en) * 2020-10-29 2024-04-16 中南大学 Trusted edge computing system based on intelligent risk detection
CN112464232B (en) * 2020-11-21 2024-04-09 西北工业大学 Android system malicious software detection method based on mixed feature combination classification
CN112632539B (en) * 2020-12-28 2024-04-09 西北工业大学 Dynamic and static hybrid feature extraction method in Android system malicious software detection
CN113378167A (en) * 2021-06-30 2021-09-10 哈尔滨理工大学 Malicious software detection method based on a hybrid of improved naive Bayes algorithm and gated recurrent unit
CN113378171B (en) * 2021-07-12 2022-06-21 东北大学秦皇岛分校 Android ransomware detection method based on convolutional neural network
CN113592103A (en) * 2021-07-26 2021-11-02 东方红卫星移动通信有限公司 Software malicious behavior identification method based on integrated learning and dynamic analysis
CN115249048B (en) * 2022-09-16 2023-01-10 西南民族大学 Adversarial sample generation method
CN115878421B (en) * 2022-12-09 2023-11-14 国网湖北省电力有限公司信息通信公司 Data center equipment level fault prediction method, system and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138916A (en) * 2015-08-21 2015-12-09 中国人民解放军信息工程大学 Multi-track malicious program feature detecting method based on data mining
CN105740712A (en) * 2016-03-09 2016-07-06 哈尔滨工程大学 Android malicious act detection method based on Bayesian network
CN106845240A (en) * 2017-03-10 2017-06-13 西京学院 A kind of Android malware static detection method based on random forest
CN107180192A (en) * 2017-05-09 2017-09-19 北京理工大学 Android malicious application detection method and system based on multi-feature fusion
CN108958215A (en) * 2018-06-01 2018-12-07 天泽信息产业股份有限公司 A kind of engineering truck failure prediction system and its prediction technique based on data mining

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845220B (en) * 2015-12-07 2020-08-25 深圳先进技术研究院 Android malicious software detection system and method
CN105550583B (en) * 2015-12-22 2018-02-13 电子科技大学 Android platform malicious application detection method based on random forest classification method
CN105530265B (en) * 2016-01-28 2019-01-18 李青山 A kind of mobile Internet malicious application detection method based on frequent item set description
US10685112B2 (en) * 2016-05-05 2020-06-16 Cylance Inc. Machine learning model for malware dynamic analysis
US10558797B2 (en) * 2016-08-12 2020-02-11 Duo Security, Inc. Methods for identifying compromised credentials and controlling account access
CN107169355B (en) * 2017-04-28 2020-05-08 北京理工大学 Worm homology analysis method and device
CN108108616A (en) * 2017-12-19 2018-06-01 努比亚技术有限公司 Malicious act detection method, mobile terminal and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
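A data mining-based approach for cardiovascular dysautonomias diagnosis and treatment; Ali Idri et al.; 2017 IEEE International Conference on Computer and Information Technology; pp. 245-252 *
Research on static detection methods for malicious applications on the Android platform; Zhao Yi; China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series (No. 11); p. I138-75 *
Android malware detection based on an improved random forest algorithm; Yang Hongyu et al.; Journal on Communications; Vol. 38 (No. 04); pp. 8-16 *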

Also Published As

Publication number Publication date
CN109753800A (en) 2019-05-14

Similar Documents

Publication Publication Date Title
CN109753800B (en) Android malicious application detection method and system fusing frequent item set and random forest algorithm
Yakura et al. Malware analysis of imaged binary samples by convolutional neural network with attention mechanism
CN109614795B (en) Event-aware android malicious software detection method
US20200159925A1 (en) Automated malware analysis that automatically clusters sandbox reports of similar malware samples
Sandeep Static analysis of android malware detection using deep learning
US20210334371A1 (en) Malicious File Detection Technology Based on Random Forest Algorithm
CN113360906A (en) Interpretable graph-embedding-based Android malware automatic detection
CN114491529A (en) Android malicious application program identification method based on multi-modal neural network
KR102302484B1 (en) Method for mobile malware classification based feature selection, recording medium and device for performing the method
CN112148305A (en) Application detection method and device, computer equipment and readable storage medium
Bernardi et al. A fuzzy-based process mining approach for dynamic malware detection
CN113901465A (en) Heterogeneous network-based Android malicious software detection method
CN113468524B (en) RASP-based machine learning model security detection method
CN112817877B (en) Abnormal script detection method and device, computer equipment and storage medium
CN108171057B (en) Android platform malicious software detection method based on feature matching
Feichtner et al. Obfuscation-resilient code recognition in Android apps
CN111444502B (en) Population-oriented android malicious software detection model library method
CN114817925B (en) Android malicious software detection method and system based on multi-modal graph features
CN114817924B (en) AST (AST) and cross-layer analysis based android malicious software detection method and system
CN110990834A (en) Static detection method, system and medium for android malicious software
Banik et al. Android Malware Detection by Correlated Real Permission Couples Using FP Growth Algorithm and Neural Networks
CN115545091A (en) Integrated learner-based malicious program API (application program interface) calling sequence detection method
CN114491530A (en) Android application program classification method based on abstract flow graph and graph neural network
CN114579965A (en) Malicious code detection method and device and computer readable storage medium
CN114491528A (en) Malicious software detection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant