CN109753800A - Android malicious application detection method and system fusing frequent itemsets and the random forest algorithm - Google Patents

Publication number
CN109753800A
Authority
CN
China
Prior art keywords
feature
sample
frequent
permission
item collection
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910002795.2A
Other languages
Chinese (zh)
Other versions
CN109753800B (en
Inventor
景小荣
王丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201910002795.2A
Publication of CN109753800A
Application granted
Publication of CN109753800B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an Android malicious application detection method that fuses the frequent itemset (Apriori) algorithm with the random forest algorithm, and relates to the technical field of information processing. Android application samples are decompiled, and static permission and function-call features are extracted from each decompiled file. The frequent 3-itemsets of the malicious samples and the normal samples are mined with the Apriori algorithm to obtain the association relationships among permissions in the sample set, and are then combined with sensitive application programming interface (API) function calls to generate features. A random forest classifier learns and classifies the features, thereby detecting Android malicious applications. Malicious-application detection carried out with the invention consumes few system resources and achieves high detection accuracy.

Description

Android malicious application detection method and system fusing frequent itemsets and the random forest algorithm
Technical field
The present invention relates to network security and information security detection, and in particular to an Android malicious application detection method.
Background art
As the most popular smart-terminal operating system in the world, Android is widely used thanks to its open and free platform. As a result, many malicious-code authors have targeted the Android platform. As technology advances, the cost of producing Android malware keeps falling, so the amount of Android malware grows day by day. According to data released by the 360 Internet Security Center, 7.573 million new malware samples were intercepted on the Android platform in 2017, an average of 31,000 new samples per day. Malware launches frequent attacks with new techniques such as mining trojans and botnets, including theft of users' personal information and malicious fee deduction, causing users heavy losses. Faced with malicious attacks on this scale, how to detect Android malicious applications effectively has become the foremost security problem of the Android platform.
Malware detection for Android applications currently falls into static detection and dynamic detection. Static detection does not run the application; instead it uses reverse-engineering techniques such as decompilation to analyze the program's source, extract features such as signatures and permissions, and analyze characteristic behavior directly. Static detection mainly extracts features from the application description file (AndroidManifest.xml) and the smali code files. Guo et al. parsed the information tags in AndroidManifest.xml and the smali code files to extract an application's classes, permissions, components, signatures, processed data, and start-up information. Rashidi B et al. used permissions and application programming interface (API) function calls as the feature set and detected malicious applications with the support vector machine (SVM) and K-nearest neighbor (KNN) algorithms, but with many misclassifications. Machine learning can automate the detection of Android applications and improve analysis efficiency, but it depends on the application features that are extracted.
Dynamic detection of Android malicious applications obtains an application's features while it is running, through techniques such as injection and hooking (HOOK). Its drawback is that the software must be run, which consumes excessive system resources. In dynamic detection research, Mahindru et al. used the tracer Strace to collect application behavior data and sent it to an analysis server, trained a classifier on these behavior samples, and finally used the K-nearest neighbor algorithm to judge whether an application contains malicious behavior. Singh L et al. used API hooking to hook sensitive APIs on the Android platform: whenever the system or an application calls a specific API, the call is intercepted and redirected to a proxy function that records its details, yielding behavior information.
Summary of the invention
The technical problem addressed by the present invention, in view of the above shortcomings of the prior art, is to learn from and detect Android applications with machine-learning algorithms, so as to reduce the complexity of Android malicious application detection, save system resources, and, by handling high-dimensional features and automating classification, further improve the accuracy of malware detection.
To solve the above technical problem, the invention proposes an Android malicious application detection method that fuses the frequent itemset (Apriori) algorithm with the random forest algorithm, comprising the following steps: batch-decompiling Android applications and obtaining static features of the applications' permissions and sensitive API functions; mining frequent itemsets of the permission features to reduce the dimensionality of the permission features and obtain the frequent 3-itemsets of permissions, thereby obtaining the association relationships among permissions in the sample set; mining the frequent 3-itemsets of the malicious samples and the normal samples and using them together with the sensitive API functions as features to build a feature set; screening and scoring the feature attributes in the feature set with the information gain algorithm, extracting the important features, and constructing the corresponding vector spaces; and learning from and classifying the vector spaces with the random forest algorithm, the vector spaces of the normal and malicious samples being labeled with a normal or malicious attribute.
The invention further comprises decompiling the application with a static analysis tool before feature extraction, obtaining files that include the resource files (res), the third-party SDK .so files (lib), the smali files, and AndroidManifest.xml; these files contain the application's resources, source code, and other static code features.
The invention further comprises extracting features with a Python script: AndroidManifest.xml and other XML files are parsed to extract all permissions requested by the application, giving the permission features; all smali files are traversed with Python's os.walk() function, and the sensitive API functions of each sample are extracted by regular-expression matching.
The invention further specifies that mining the frequent 3-itemsets of the permission features comprises: extracting permissions from the malicious samples and the normal samples respectively to build permission sets; mining the frequent 1-itemsets of a permission set by computing the support S of each permission in the set and pruning the 1-itemsets that do not meet the minimum support min_s, giving the candidate set L1, and then joining the elements of L1; taking the joined candidate set as the new sample set and mining the frequent 2-itemsets by pruning the 2-itemsets that do not meet min_s, giving the new candidate set L2; and repeating until the frequent 3-itemsets are obtained.
The invention further specifies that using the information gain (IG) algorithm comprises: obtaining the IG value of a feature as the difference between the entropy and the feature's conditional entropy, a larger IG value indicating a stronger correlation; retaining the important features according to the degree of correlation; and matching the important features against each application in the system to construct the corresponding vector spaces. Constructing the vector space comprises: building a feature set X containing the distinct features (x1, x2, …, xn) of the applications, and applying the mapping ν: s → {0,1}^|X| to construct the vector space ν from the features in X, where s denotes an application and each dimension of ν corresponds to one feature in X; if s contains a feature, the flag value for that feature in ν is 1, otherwise 0.
The invention also proposes an Android malicious application detection system that fuses the Apriori algorithm and the random forest algorithm, comprising a feature extraction module, a feature processing module, and a random forest classification module. The feature extraction module extracts features from the batch-decompiled Android applications, obtaining static features of the applications' permissions and sensitive API functions. The feature processing module mines the frequent itemsets of the permission features to reduce their dimensionality and obtain the frequent 3-itemsets of permissions, thereby obtaining the association relationships among permissions in the sample set; it mines the frequent 3-itemsets of the malicious samples and the normal samples, uses them together with the sensitive API functions as features to build a feature set, screens and scores the feature attributes in the feature set with the information gain algorithm, extracts the important features, and constructs the corresponding vector spaces. The random forest classification module learns from and classifies the vector spaces, the vector spaces of the normal and malicious samples being labeled with a normal or malicious attribute.
The invention extracts data features by static detection, then uses the Apriori algorithm to mine the frequent 3-itemsets of permissions in normal software and malware, merges them with the sensitive API function calls, and builds a random forest classifier to learn from and classify them. Further, the IG algorithm obtains each feature's IG value as the difference between the entropy and the feature's conditional entropy; the important features are retained, and a matching algorithm constructs the corresponding vector space for each application in the system. Because the high-dimensional permission features are reduced to their frequent 3-itemsets, the invention consumes few system resources.
Brief description of the drawings
Fig. 1 shows the Android malicious application detection model that fuses the Apriori algorithm and the random forest algorithm.
Detailed description of embodiments
The implementation of the invention is described in detail below with reference to the drawing.
Fig. 1 is a schematic diagram of the detection system model used by the invention. To detect malicious applications on Android, the invention combines frequent 3-itemset mining by the Apriori algorithm with classification by the random forest algorithm, and proposes an Android malicious application detection system comprising a feature extraction module, a feature processing module, and a random forest classification module.
First, the collected sample set of normal software and malware is batch-decompiled. From the decompiled application description file AndroidManifest.xml and the smali files, the application's permissions (Android permissions) and sensitive API function calls are extracted. The frequent 3-itemsets of permissions are then mined from the permission features to find the combination relationships among permissions in the normal and malicious samples, and are combined with the sensitive API functions as learning features. The IG algorithm is used for feature selection, and the retained important features are embedded into feature vectors to form vector spaces. Finally, the random forest algorithm is trained on the vector spaces and classifies them to detect Android malicious applications.
Each part is described below.
(1) Feature extraction module. A Python script batch-decompiles the sample set and extracts features from the decompiled AndroidManifest.xml and smali files; the extracted features are mainly permission features and sensitive API functions. For permission features, the permissions an application requests are extracted by parsing its AndroidManifest.xml file. Whenever a user uses certain system functions or accesses certain sensitive data, the application must have requested the corresponding permission; for example, the permission android.permission.READ_PHONE_STATE declared in AndroidManifest.xml means that the application requests permission to read the phone state. For sensitive API functions, smali is the bytecode representation for the Android virtual machine (Dalvik); each smali file represents one Java class and contains the system API functions that the application calls. Python's os.walk() function traverses all smali files; within each file, lines beginning with a call instruction (invoke) are located, the API functions that occur are collected by string matching, and the sensitive API functions of each sample are extracted from them. The sensitive API functions extracted from a sample's smali files reveal the application's potentially malicious behavior, because malware must call the corresponding API functions to carry out malicious behavior. Therefore, all sensitive API functions called in the sample set are used as learning features for the random forest algorithm, which after training can detect malicious applications.
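The extraction step described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patented script: the function names extract_permissions and extract_api_calls, the regular expression, and the idea of passing in a predefined set of sensitive APIs are all hypothetical, and the manifest is assumed to be already decoded to plain XML.

```python
import os
import re
import xml.etree.ElementTree as ET

# Namespace prefix of the android:name attribute in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Matches smali call instructions such as
#   invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
INVOKE_RE = re.compile(r"^\s*invoke-[\w/-]+ \{[^}]*\}, (L[^;]+;)->([\w$<>]+)\(")


def extract_permissions(manifest_path):
    """Return the set of permissions requested in a decoded AndroidManifest.xml."""
    root = ET.parse(manifest_path).getroot()
    return {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
        if elem.get(ANDROID_NS + "name")
    }


def extract_api_calls(decompiled_dir, sensitive_apis):
    """Walk every .smali file and return the sensitive APIs that are invoked."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(decompiled_dir):
        for name in filenames:
            if not name.endswith(".smali"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    match = INVOKE_RE.match(line)
                    if match:
                        api = match.group(1) + "->" + match.group(2)
                        if api in sensitive_apis:
                            found.add(api)
    return found
```

Identifying each API as "class->method" is one possible convention; the source does not specify how sensitive API functions are named or which functions count as sensitive.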
Before features are extracted from each sample, the sample set must be decompiled. The decompilation tool Apktool can be used to decompile files with the .apk suffix, producing the resource files (res), the third-party SDK .so files (lib), the smali files, and AndroidManifest.xml; these files contain resources, source code, and other static features.
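Batch decompilation with Apktool could be driven from Python along the following lines. This is a sketch that assumes the apktool executable is on the PATH; the helper names apktool_command and decompile_all and the directory layout are hypothetical.

```python
import subprocess
from pathlib import Path


def apktool_command(apk_path, out_root):
    """Build the Apktool command that decodes one APK into out_root/<apk name>."""
    apk = Path(apk_path)
    out_dir = Path(out_root) / apk.stem
    # "d" decodes the APK, "-f" overwrites an existing output directory,
    # "-o" selects the output directory.
    return ["apktool", "d", "-f", str(apk), "-o", str(out_dir)]


def decompile_all(apk_dir, out_root):
    """Decode every .apk file in apk_dir; return the names of APKs that failed."""
    failed = []
    for apk in sorted(Path(apk_dir).glob("*.apk")):
        result = subprocess.run(apktool_command(apk, out_root), capture_output=True)
        if result.returncode != 0:
            failed.append(apk.name)
    return failed
```

Collecting failures instead of stopping at the first error is a design choice for large sample sets, where some APKs are typically malformed or packed.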
In terms of permissions, malware usually requests certain dangerous permission combinations before performing malicious behavior, and the permissions in these combinations depend on one another to produce that behavior. Malware therefore requests not just a single dangerous permission but a combination of dangerous permissions. For example, malicious samples often request the combination READ_SMS (read SMS messages), READ_PHONE_STATE (read phone state), and WRITE_SMS (edit SMS messages), which allows malicious operations such as reading the user's private data and sending it elsewhere, whereas normal software rarely requests this combination. Because the cooperation of the different dangerous permissions in such a combination indicates potentially malicious behavior, the combination can be used to judge an application as malware.
The Apriori algorithm, proposed by Agrawal et al., mines frequent itemsets for Boolean association rules. Here it is used to mine the frequent 3-itemsets of permissions. Feature extraction yields a large number of permission features and sensitive API functions. However, the permission features obtained usually have very high dimensionality and therefore high computational complexity. The dimensionality of the permission features is thus reduced by mining their frequent itemsets with the Apriori algorithm, giving the frequent 3-itemsets of permissions and, with them, the association relationships among permissions in the sample set. The specific steps are as follows.
To mine the frequent 3-itemsets of permissions with the Apriori algorithm, the permission set P requested by the normal software samples and the permission set M requested by the malicious samples are extracted from all samples, where P = {p1, p2, …, pn} denotes the n permissions requested across all normal samples and M = {m1, m2, …, mx} denotes the x permissions requested across all malicious samples. Frequent 3-itemsets are mined separately from the permission sets of the normal and malicious samples, as follows:
Mine the frequent 1-itemsets of the sample permission set: compute the support S of each permission in the sample permission set, that is, the probability that the permission occurs across the samples; prune the 1-itemsets that do not meet the minimum support min_s to obtain the set that satisfies the condition, which is the candidate set L1; then join the elements of L1. The joined candidate set, which now contains all 2-itemsets, becomes the new sample set; mine the frequent 2-itemsets from it by pruning those that do not meet min_s, forming the new candidate set L2. Repeat these steps until the frequent 3-itemsets of the sample permission set are obtained.
Join: within a set of frequent n-itemsets, starting from one itemset (say the i-th), search downward for an itemset (say the j-th) that shares the same first n-1 items; all elements of the i-th itemset and the n-th element of the j-th are then joined into an (n+1)-itemset.
From the normal software permission set P, the frequency of occurrence of each of p1, p2, …, pn is computed as that element's support S. The minimum support is the minimum required frequency of occurrence of an element of P and lies between 0 and 1. After the frequent 1-itemsets are mined, pruning and joining are carried out according to the minimum support, finally giving the frequent 3-itemsets of the normal samples.
From the malicious sample permission set M, the frequency of occurrence of each of m1, m2, …, mx is computed as that element's support S. After the frequent 1-itemsets are mined, pruning and joining are carried out according to the minimum support, finally giving the frequent 3-itemsets of the malicious samples.
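The mining procedure above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name frequent_itemsets and its parameters are hypothetical, and the join step merges any two frequent (k-1)-itemsets whose union has k items instead of using the prefix-based join described above. After pruning, both variants yield the same frequent itemsets.

```python
from itertools import combinations


def frequent_itemsets(samples, min_s, max_size=3):
    """Return {size: {itemset: support}} for frequent itemsets up to max_size.

    samples: non-empty list of permission sets, one per application.
    min_s:   minimum support, a fraction between 0 and 1.
    """
    n = len(samples)
    samples = [frozenset(s) for s in samples]

    def support(itemset):
        # Fraction of samples that request every permission in the itemset.
        return sum(1 for s in samples if itemset <= s) / n

    # Frequent 1-itemsets (L1): prune permissions below the minimum support.
    level = {}
    for p in {p for s in samples for p in s}:
        sup = support(frozenset([p]))
        if sup >= min_s:
            level[frozenset([p])] = sup
    result = {1: level}

    for k in range(2, max_size + 1):
        prev = result[k - 1]
        # Join: merge two frequent (k-1)-itemsets that differ in one item.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        level = {}
        for c in candidates:
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            if not all(frozenset(sub) in prev for sub in combinations(c, k - 1)):
                continue
            sup = support(c)
            if sup >= min_s:
                level[c] = sup
        result[k] = level
    return result
```

Calling frequent_itemsets once on the permission sets of the malicious samples and once on those of the normal samples, and reading the entry for size 3, gives the two collections of frequent 3-itemsets used as features.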
(2) Feature processing
After the frequent 3-itemsets of the malicious and normal samples are mined with the Apriori algorithm, they are used together with the sensitive API functions as features, and the information gain (IG) algorithm screens and scores the feature attributes. The IG algorithm obtains a feature's IG value as the difference between the information entropy and the feature's conditional entropy; a larger value indicates a stronger correlation. Entropy: from the probabilities $P(C_i)$ with which normal software and malware occur in the sample set, the information entropy of the sample set is computed as $H(C) = -\sum_i P(C_i) \log P(C_i)$. Conditional entropy: the conditional entropy of the i-th feature is computed as $H(Y|X_i) = \sum_x P(X_i = x)\, H(Y|X_i = x)$. The IG value of the i-th feature is then $IG_i = H(C) - H(Y|X_i)$. To select, from the many features, those that help classify software as normal or malicious, that is, those that reduce uncertainty the most, features whose IG value is 0 are rejected and the remaining features, whose IG value is non-zero, are retained as important features.
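The screening step can be illustrated with a short Python sketch. The function names, the base-2 logarithm, and the small tolerance used to treat floating-point residue as zero are assumptions made for the example; the source only states that features with an IG value of 0 are rejected.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their conditional entropy given one feature."""
    total = len(labels)
    cond = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        cond += (len(subset) / total) * entropy(subset)
    return entropy(labels) - cond


def select_features(matrix, labels, names):
    """Return (name, IG) pairs for features with non-zero IG, highest first."""
    kept = []
    for j, name in enumerate(names):
        column = [row[j] for row in matrix]
        ig = information_gain(column, labels)
        if ig > 1e-12:  # treat tiny floating-point residue as zero
            kept.append((name, ig))
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

In this sketch each row of matrix holds the 0/1 presence flags of the candidate features for one sample, and labels holds the corresponding normal or malware class.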
Let the set X be the feature set retained for the applications, containing the distinct features (x1, x2, …, xn), where n is the number of important features. The vector space ν is constructed from the features in X by the mapping ν: s → {0,1}^|X|, where s denotes an application and each dimension of ν corresponds to one feature in X. If s contains the feature, the corresponding flag value in ν is 1, otherwise 0; the flag value indicates whether the feature is present.
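The mapping ν can be sketched in Python as below. The representation is an assumption made for illustration: a permission-combination feature is modeled as a frozenset that counts as present only when the application requests every permission in it, and an API feature is modeled as a string; the source does not state how a 3-itemset is matched against an application.

```python
def has_feature(feature, permissions, api_calls):
    """Check whether an application contains one feature of X.

    A frozenset feature is a permission combination (all members must be
    requested); a str feature is a sensitive API function.
    """
    if isinstance(feature, frozenset):
        return feature <= permissions
    return feature in api_calls


def to_vector(feature_set, permissions, api_calls):
    """Map one application s to its 0/1 vector of length |X|."""
    return [1 if has_feature(f, permissions, api_calls) else 0 for f in feature_set]


# Hypothetical retained feature set X; its order fixes the vector dimensions.
X = [
    frozenset({"READ_SMS", "READ_PHONE_STATE", "WRITE_SMS"}),
    "Landroid/telephony/SmsManager;->sendTextMessage",
]
```

Keeping X as an ordered list matters because every sample must use the same dimension order for the classifier to interpret the vectors consistently.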
In this way, a matching algorithm constructs the corresponding vector space ν for each application in the system. After feature screening, a feature set containing n features is built, a vector space ν is generated for each sample, and the vectors are stored in a MySQL database as the input of the random forest classification module.
(3) Random forest classification
Once the feature vectors are obtained, detection becomes a classification problem. Because the detection result is one of two classes, normal or malicious, detection is a binary classification problem, for which the random forest algorithm is well suited. The vector spaces ν obtained above are classified with the random forest classification algorithm.
Specifically, supervised classification can be used: for each application in the collected sets of known normal and malicious samples, a normal or malicious attribute label is appended to the vector space corresponding to that application, according to whether the application is normal software or malware.
In the labeling formula, V(S) denotes the set of all applications, normal indicates that the application is normal software, and malware indicates that the application is malware.
After the vector spaces of the training sample set are obtained, they are used to train a random forest classifier. The software under test goes through feature extraction and feature processing to obtain its vector space ν, which at this point carries no normal or malware label; the label value is left blank or replaced by '?'. The random forest classifier trained on the training samples then classifies the vector space of the software under test, and the result, the string normal or malware, indicates whether the software under test is normal software or malware. Malware is thereby detected.
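To make the training and detection flow concrete, the following is a compact, dependency-free sketch of a random forest over 0/1 feature vectors: bootstrap sampling, a random feature subset at each node, Gini-based splits, and majority voting. All names and parameter values are hypothetical, and a practical system would more likely rely on an established library implementation such as scikit-learn's RandomForestClassifier.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, n_try, rng, depth=0, max_depth=8):
    """Grow one tree on 0/1 features, testing n_try random features per node."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    best = None
    for j in rng.sample(range(len(rows[0])), n_try):
        left = [y for r, y in zip(rows, labels) if r[j] == 0]
        right = [y for r, y in zip(rows, labels) if r[j] == 1]
        if not left or not right:
            continue  # this feature does not split the node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, j)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    j = best[1]
    parts = {0: ([], []), 1: ([], [])}
    for r, y in zip(rows, labels):
        parts[r[j]][0].append(r)
        parts[r[j]][1].append(y)
    return (j,
            build_tree(*parts[0], n_try, rng, depth + 1, max_depth),
            build_tree(*parts[1], n_try, rng, depth + 1, max_depth))


def predict_tree(tree, row):
    """Follow one tree down to a leaf label."""
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = right if row[j] == 1 else left
    return tree


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the labeled vectors."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest


def predict(forest, row):
    """Classify an unlabeled vector by majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

Here rows are the 0/1 vectors of the training samples and labels are their normal or malware strings; predict takes the unlabeled vector of the software under test and returns one of the two strings.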
The invention uses decompilation to batch-decompile the application sample set and extracts the permissions and API functions from the resulting files. The high-dimensional permission features are reduced with the Apriori algorithm to obtain the frequent 3-itemsets of permissions, which are combined with the sensitive API functions and screened by information gain to obtain the important features. The important features are mapped to a vector space represented by 0 and 1, normal and malicious applications are labeled, and labeled vector spaces are finally obtained. The random forest algorithm learns from and classifies the sample set.

Claims (12)

1. An Android malicious application detection method fusing a frequent itemset algorithm and the random forest algorithm, characterized by comprising the following steps: batch-decompiling Android applications to obtain a sample set, and obtaining static features of the applications' permissions and sensitive application programming interface (API) functions; mining frequent itemsets of the permission features to reduce the dimensionality of the permission features and obtain the frequent 3-itemsets of permissions, thereby obtaining the association relationships among permissions in the sample set; mining the frequent 3-itemsets of the malicious samples and of the normal samples, and using the frequent 3-itemsets of the malicious samples and of the normal samples, respectively, together with the sensitive API functions as features to build a feature set; screening and scoring the feature attributes in the feature set with the information gain algorithm, extracting the important features, and constructing the corresponding vector spaces; and learning from and classifying the vector spaces with a random forest classifier, the vector spaces of the normal samples and the malicious samples being labeled with a normal or malicious attribute.
2. The method according to claim 1, characterized in that before feature extraction the application is decompiled with a static analysis tool to obtain the resource files res, the third-party software development kit .so files lib, the smali files, and the application description file AndroidManifest.xml, which contain the application's resource files, source code, and other static code features.
3. The method according to claim 1, characterized in that features are extracted with a Python script: the AndroidManifest.xml file is parsed to extract all permissions requested by the application, giving the permission features; all smali files are traversed with Python's os.walk() function; and the sensitive API functions of all samples in the sample set are extracted by regular-expression matching.
4. The method according to claim 1, characterized in that mining the frequent 3-itemsets of the permission features comprises: extracting permissions from the malicious samples or the normal samples respectively to build a permission set; mining the frequent 1-itemsets of the permission set by computing the support S of each permission in the set and pruning the 1-itemsets that do not meet the minimum support min_s, giving the candidate set L1, and then joining the elements of L1; taking the joined candidate set as the new 2-itemsets and mining the frequent 2-itemsets by pruning the 2-itemsets that do not meet the minimum support min_s, forming the new candidate set L2; and repeating until the frequent 3-itemsets are obtained.
5. The method according to claim 1, characterized in that using the information gain (IG) algorithm comprises: from the probabilities $P(C_i)$ with which normal software and malware occur in the sample set, computing the information entropy of the sample set according to $H(C) = -\sum_i P(C_i) \log P(C_i)$; computing the conditional entropy of the i-th feature according to $H(Y|X_i) = \sum_x P(X_i = x)\, H(Y|X_i = x)$; computing the IG value of the i-th feature according to $IG_i = H(C) - H(Y|X_i)$, a larger IG value indicating a stronger correlation of the frequent 3-itemsets of the malicious and normal samples; retaining the important features according to the degree of correlation; and matching the important features against each application in the system to construct the corresponding vector spaces.
6. The method according to claim 5, characterized in that constructing the vector space comprises: rejecting the features whose IG value is 0 and retaining the remaining features, whose IG value is non-zero, as important features; building a feature set X containing the distinct features (x1, x2, …, xn) of the application samples; and applying the mapping ν: s → {0,1}^|X| to construct the vector space ν from the features in X, where s denotes an application and each dimension of ν corresponds to one feature in X; if s contains a feature, the flag value corresponding to that feature in ν is 1, otherwise 0.
7. An Android malicious application detection system fusing a frequent itemset algorithm and the random forest algorithm, comprising a feature extraction module, a feature processing module, and a random forest classification module, characterized in that: the feature extraction module extracts features from the batch-decompiled Android applications, obtaining static features of the applications' permissions and sensitive API functions; the feature processing module mines the frequent itemsets of the permission features to reduce the dimensionality of the permission features and obtain the frequent 3-itemsets of permissions, thereby obtaining the association relationships among permissions in the sample set, mines the frequent 3-itemsets of the malicious samples and the normal samples, uses them together with the sensitive API functions as features to build a feature set, screens and scores the feature attributes in the feature set with the information gain algorithm, extracts the important features, and constructs the corresponding vector spaces; and the random forest classification module learns from and classifies the vector spaces, a random forest classifier being used to label the vector spaces of the normal samples and the malicious samples with a normal or malicious attribute.
8. The detection system according to claim 7, characterized in that a static analysis tool is used to decompile the application software, yielding a folder containing res, lib, smali and AndroidManifest.xml, which holds the application's resource files, source code and other static code features.
9. The detection system according to claim 7, characterized in that features are extracted with a Python script: the AndroidManifest.xml file is parsed to extract all permissions requested by the application, giving the permission features; all smali files are traversed with the os.walk() function, and the sensitive API functions of each sample are extracted by regular-expression matching.
10. The detection system according to claim 7, characterized in that mining the frequent 3-itemsets of the permission features specifically comprises: extracting permissions from the malicious samples and from the normal samples separately to build permission sets; mining the frequent 1-itemsets of the permission set by calculating the support S of each permission in the set and pruning the 1-itemsets that do not satisfy the minimum support min_s to obtain the candidate set L1, then joining the elements of L1; taking the joined candidate set as the new sample set and mining the frequent 2-itemsets by pruning the 2-itemsets that do not satisfy the minimum support min_s to form a new candidate set L2; and repeating these steps until the frequent 3-itemsets are obtained.
11. The detection system according to claim 7, characterized in that using the IG algorithm specifically comprises: calculating the difference between the entropy and the conditional entropy for a feature to obtain the IG value of that feature; from the probability $P(C_i)$ with which normal software or malware occurs in the sample set, calculating the information entropy of the sample set by the formula $H(C) = -\sum_{i} P(C_i) \log P(C_i)$; calculating the conditional entropy of the i-th feature by the formula $H(Y \mid X_i) = -\sum_{x} P(X_i = x) \sum_{j} P(C_j \mid X_i = x) \log P(C_j \mid X_i = x)$; and calculating the IG value of the i-th feature by the formula $IG_i = H(C) - H(Y \mid X_i)$; the larger the IG value, the greater the degree of correlation between the frequent 3-itemset and the malicious and normal samples; important features are retained according to the degree of correlation and matched against each application software in the system, and a corresponding vector space is constructed for each.
12. The detection system according to claim 11, characterized in that a larger IG value indicates a greater degree of correlation; important features are retained according to the degree of correlation and matched against each application software in the system, and a corresponding vector space is constructed for each; constructing the vector space specifically comprises: removing the features whose IG value is 0 and retaining the features whose IG value is not 0 as important features; constructing a feature set X containing the feature vectors $(x_1, x_2, \dots, x_n)$ of the different application software samples; and applying the mapping $\nu: s \to \{0,1\}^{|X|}$ to construct the vector space ν from the feature vectors in X, where s denotes an application, each dimension of ν corresponds to one feature in X, and the value for a feature in ν is 1 if s contains that feature and 0 otherwise.
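The feature-processing steps of claims 10 to 12 (level-wise mining of frequent permission 3-itemsets, information gain scoring, and construction of 0/1 feature vectors) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the toy applications, the min_support value of 0.6, the base-2 logarithm and the small tolerance used in place of an exact "IG value is 0" test are all assumptions made for illustration, and the random forest training step is omitted.

```python
from itertools import combinations
from math import log2


def frequent_itemsets(samples, min_support, max_size=3):
    """Level-wise mining of frequent itemsets; returns {size: {itemset: support}}."""
    n = len(samples)
    # Level 1 candidates: every permission seen in any sample.
    candidates = {frozenset([p]) for s in samples for p in s}
    levels = {}
    for k in range(1, max_size + 1):
        # Support = fraction of samples containing the whole candidate.
        support = {c: sum(c <= s for s in samples) / n for c in candidates}
        # Prune candidates below the minimum support.
        frequent = {c: v for c, v in support.items() if v >= min_support}
        levels[k] = frequent
        # Join frequent k-itemsets pairwise into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(frequent, 2)
                      if len(a | b) == k + 1}
    return levels


def entropy(labels):
    """Shannon entropy (base 2, an assumption) of a list of class labels."""
    n = len(labels)
    return -sum(labels.count(c) / n * log2(labels.count(c) / n)
                for c in set(labels))


def information_gain(column, labels):
    """IG = H(C) - H(Y | X) for one 0/1 feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def has_feature(app, feature):
    """1 if the app contains the feature, else 0."""
    perms, apis, _ = app
    # A frozenset feature is a permission 3-itemset; a str feature is an API name.
    if isinstance(feature, frozenset):
        return int(feature <= perms)
    return int(feature in apis)


def build_vectors(apps, min_support, tol=1e-12):
    """Return (kept features, 0/1 vectors, labels) for (perms, apis, label) apps."""
    labels = [label for _, _, label in apps]
    features = set()
    # Mine frequent 3-itemsets separately for each class of samples.
    for cls in set(labels):
        perms = [p for p, _, y in apps if y == cls]
        features |= set(frequent_itemsets(perms, min_support)[3])
    # Add every sensitive API name as a candidate feature.
    features |= {api for _, apis, _ in apps for api in apis}
    # Keep only features whose information gain is (numerically) non-zero.
    kept = [f for f in features
            if information_gain([has_feature(a, f) for a in apps], labels) > tol]
    kept.sort(key=lambda f: tuple(sorted(f)) if isinstance(f, frozenset) else (f,))
    vectors = [[has_feature(a, f) for f in kept] for a in apps]
    return kept, vectors, labels


# Hypothetical toy data: (permissions, sensitive API calls, label); 1 = malicious.
APPS = [
    ({"SEND_SMS", "READ_CONTACTS", "INTERNET"}, {"sendTextMessage", "getDeviceId"}, 1),
    ({"SEND_SMS", "READ_CONTACTS", "INTERNET", "READ_PHONE_STATE"},
     {"sendTextMessage", "getDeviceId"}, 1),
    ({"SEND_SMS", "READ_CONTACTS", "INTERNET"}, {"getDeviceId"}, 1),
    ({"INTERNET", "CAMERA"}, {"getDeviceId"}, 0),
    ({"INTERNET", "ACCESS_FINE_LOCATION"}, {"getDeviceId"}, 0),
    ({"INTERNET", "CAMERA", "VIBRATE"}, {"getDeviceId"}, 0),
]
```

Calling build_vectors(APPS, 0.6) on this toy data keeps the one frequent permission 3-itemset found in the malicious samples and the API name that appears only there, and drops the API name present in every application because its information gain is zero. The resulting vectors and labels could then be passed to any random forest implementation.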
CN201910002795.2A 2019-01-02 2019-01-02 Android malicious application detection method and system fusing frequent item set and random forest algorithm Active CN109753800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910002795.2A CN109753800B (en) 2019-01-02 2019-01-02 Android malicious application detection method and system fusing frequent item set and random forest algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910002795.2A CN109753800B (en) 2019-01-02 2019-01-02 Android malicious application detection method and system fusing frequent item set and random forest algorithm

Publications (2)

Publication Number Publication Date
CN109753800A true CN109753800A (en) 2019-05-14
CN109753800B CN109753800B (en) 2023-04-07

Family

ID=66405239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910002795.2A Active CN109753800B (en) 2019-01-02 2019-01-02 Android malicious application detection method and system fusing frequent item set and random forest algorithm

Country Status (1)

Country Link
CN (1) CN109753800B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851834A (en) * 2019-11-18 2020-02-28 北京工业大学 Android malicious application detection method integrating multi-feature classification
CN111324893A (en) * 2020-02-17 2020-06-23 电子科技大学 Detection method and background system for android malicious software based on sensitive mode
CN111460452A (en) * 2020-03-30 2020-07-28 中国人民解放军国防科技大学 Android malicious software detection method based on frequency fingerprint extraction
CN111723371A (en) * 2020-06-22 2020-09-29 上海斗象信息科技有限公司 Method for constructing detection model of malicious file and method for detecting malicious file
WO2020233322A1 (en) * 2019-05-21 2020-11-26 暨南大学 Description-entropy-based intelligent detection method for big data mobile software similarity
CN112000954A (en) * 2020-08-25 2020-11-27 莫毓昌 Malicious software detection method based on feature sequence mining and simplification
CN112035836A (en) * 2019-06-04 2020-12-04 四川大学 Malicious code family API sequence mining method
CN112100621A (en) * 2020-09-11 2020-12-18 哈尔滨工程大学 Android malicious application detection method based on sensitive permission and API
CN112287345A (en) * 2020-10-29 2021-01-29 中南大学 Credible edge computing system based on intelligent risk detection
CN112446026A (en) * 2019-09-03 2021-03-05 中移(苏州)软件技术有限公司 Malicious software detection method and device and storage medium
CN112464232A (en) * 2020-11-21 2021-03-09 西北工业大学 Android system malicious software detection method based on mixed feature combination classification
CN112632539A (en) * 2020-12-28 2021-04-09 西北工业大学 Dynamic and static mixed feature extraction method in Android system malicious software detection
CN112651024A (en) * 2020-12-29 2021-04-13 重庆大学 Method, device and equipment for malicious code detection
CN113378171A (en) * 2021-07-12 2021-09-10 东北大学秦皇岛分校 Android lasso software detection method based on convolutional neural network
CN113378167A (en) * 2021-06-30 2021-09-10 哈尔滨理工大学 Malicious software detection method based on improved naive Bayes algorithm and gated loop unit mixing
CN113592103A (en) * 2021-07-26 2021-11-02 东方红卫星移动通信有限公司 Software malicious behavior identification method based on integrated learning and dynamic analysis
CN113949514A (en) * 2020-07-16 2022-01-18 中国电信股份有限公司 Application override detection method, device and storage medium
CN115249048A (en) * 2022-09-16 2022-10-28 西南民族大学 Confrontation sample generation method
CN115878421A (en) * 2022-12-09 2023-03-31 国网湖北省电力有限公司信息通信公司 Data center equipment-level fault prediction method, system and medium based on log time sequence correlation characteristic mining
CN117708813A (en) * 2023-11-30 2024-03-15 四川大学 Security detection method and system for software development environment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138916A (en) * 2015-08-21 2015-12-09 中国人民解放军信息工程大学 Multi-track malicious program feature detecting method based on data mining
CN105530265A (en) * 2016-01-28 2016-04-27 李青山 Mobile Internet malicious application detection method based on frequent itemset description
CN105550583A (en) * 2015-12-22 2016-05-04 电子科技大学 Random forest classification method based detection method for malicious application in Android platform
CN105740712A (en) * 2016-03-09 2016-07-06 哈尔滨工程大学 Android malicious act detection method based on Bayesian network
CN106845220A (en) * 2015-12-07 2017-06-13 深圳先进技术研究院 A kind of Android malware detecting system and method
CN106845240A (en) * 2017-03-10 2017-06-13 西京学院 A kind of Android malware static detection method based on random forest
CN107169355A (en) * 2017-04-28 2017-09-15 北京理工大学 A kind of worm homology analysis method and apparatus
CN107180192A (en) * 2017-05-09 2017-09-19 北京理工大学 Android malicious application detection method and system based on multi-feature fusion
US20180046796A1 (en) * 2016-08-12 2018-02-15 Duo Security, Inc. Methods for identifying compromised credentials and controlling account access
CN108108616A (en) * 2017-12-19 2018-06-01 努比亚技术有限公司 Malicious act detection method, mobile terminal and storage medium
US20180322287A1 (en) * 2016-05-05 2018-11-08 Cylance Inc. Machine learning model for malware dynamic analysis
CN108958215A (en) * 2018-06-01 2018-12-07 天泽信息产业股份有限公司 A kind of engineering truck failure prediction system and its prediction technique based on data mining

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
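ALI IDRI et al.: "A data mining-based approach for cardiovascular dysautonomias diagnosis and treatment", 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY *
YANG Hongyu et al.: "Android malware detection based on an improved random forest algorithm", Journal on Communications *
ZHAO Yi: "Research on static detection methods for malicious applications on the Android platform", China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series *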

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020233322A1 (en) * 2019-05-21 2020-11-26 暨南大学 Description-entropy-based intelligent detection method for big data mobile software similarity
CN112035836A (en) * 2019-06-04 2020-12-04 四川大学 Malicious code family API sequence mining method
CN112446026A (en) * 2019-09-03 2021-03-05 中移(苏州)软件技术有限公司 Malicious software detection method and device and storage medium
CN110851834B (en) * 2019-11-18 2024-02-27 北京工业大学 Android malicious application detection method integrating multi-feature classification
CN110851834A (en) * 2019-11-18 2020-02-28 北京工业大学 Android malicious application detection method integrating multi-feature classification
CN111324893B (en) * 2020-02-17 2022-05-10 电子科技大学 Detection method and background system for android malicious software based on sensitive mode
CN111324893A (en) * 2020-02-17 2020-06-23 电子科技大学 Detection method and background system for android malicious software based on sensitive mode
CN111460452A (en) * 2020-03-30 2020-07-28 中国人民解放军国防科技大学 Android malicious software detection method based on frequency fingerprint extraction
CN111460452B (en) * 2020-03-30 2022-09-09 中国人民解放军国防科技大学 Android malicious software detection method based on frequency fingerprint extraction
CN111723371A (en) * 2020-06-22 2020-09-29 上海斗象信息科技有限公司 Method for constructing detection model of malicious file and method for detecting malicious file
CN111723371B (en) * 2020-06-22 2024-02-20 上海斗象信息科技有限公司 Method for constructing malicious file detection model and detecting malicious file
CN113949514B (en) * 2020-07-16 2024-01-26 中国电信股份有限公司 Application override detection method, device and storage medium
CN113949514A (en) * 2020-07-16 2022-01-18 中国电信股份有限公司 Application override detection method, device and storage medium
CN112000954A (en) * 2020-08-25 2020-11-27 莫毓昌 Malicious software detection method based on feature sequence mining and simplification
CN112000954B (en) * 2020-08-25 2024-01-30 华侨大学 Malicious software detection method based on feature sequence mining and simplification
CN112100621A (en) * 2020-09-11 2020-12-18 哈尔滨工程大学 Android malicious application detection method based on sensitive permission and API
CN112100621B (en) * 2020-09-11 2022-05-20 哈尔滨工程大学 Android malicious application detection method based on sensitive permission and API
CN112287345A (en) * 2020-10-29 2021-01-29 中南大学 Credible edge computing system based on intelligent risk detection
CN112287345B (en) * 2020-10-29 2024-04-16 中南大学 Trusted edge computing system based on intelligent risk detection
CN112464232B (en) * 2020-11-21 2024-04-09 西北工业大学 Android system malicious software detection method based on mixed feature combination classification
CN112464232A (en) * 2020-11-21 2021-03-09 西北工业大学 Android system malicious software detection method based on mixed feature combination classification
CN112632539A (en) * 2020-12-28 2021-04-09 西北工业大学 Dynamic and static mixed feature extraction method in Android system malicious software detection
CN112632539B (en) * 2020-12-28 2024-04-09 西北工业大学 Dynamic and static hybrid feature extraction method in Android system malicious software detection
CN112651024A (en) * 2020-12-29 2021-04-13 重庆大学 Method, device and equipment for malicious code detection
CN113378167A (en) * 2021-06-30 2021-09-10 哈尔滨理工大学 Malicious software detection method based on improved naive Bayes algorithm and gated loop unit mixing
CN113378171B (en) * 2021-07-12 2022-06-21 东北大学秦皇岛分校 Android lasso software detection method based on convolutional neural network
CN113378171A (en) * 2021-07-12 2021-09-10 东北大学秦皇岛分校 Android lasso software detection method based on convolutional neural network
CN113592103A (en) * 2021-07-26 2021-11-02 东方红卫星移动通信有限公司 Software malicious behavior identification method based on integrated learning and dynamic analysis
CN115249048A (en) * 2022-09-16 2022-10-28 西南民族大学 Confrontation sample generation method
CN115878421A (en) * 2022-12-09 2023-03-31 国网湖北省电力有限公司信息通信公司 Data center equipment-level fault prediction method, system and medium based on log time sequence correlation characteristic mining
CN115878421B (en) * 2022-12-09 2023-11-14 国网湖北省电力有限公司信息通信公司 Data center equipment level fault prediction method, system and medium
CN117708813A (en) * 2023-11-30 2024-03-15 四川大学 Security detection method and system for software development environment
CN117708813B (en) * 2023-11-30 2024-06-21 四川大学 Security detection method and system for software development environment

Also Published As

Publication number Publication date
CN109753800B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN109753800A (en) Merge the Android malicious application detection method and system of frequent item set and random forests algorithm
CN106572117B (en) A kind of detection method and device of WebShell file
CN105184160B (en) A kind of method of the Android phone platform application program malicious act detection based on API object reference relational graphs
CN111639337B (en) Unknown malicious code detection method and system for massive Windows software
CN103106365B (en) The detection method of the malicious application software on a kind of mobile terminal
CN109684840A (en) Based on the sensitive Android malware detection method for calling path
CN105229661B (en) Method, computing device and the storage medium for determining Malware are marked based on signal
Zhu et al. Android malware detection based on multi-head squeeze-and-excitation residual network
CN105138916B (en) Multi-trace rogue program characteristic detection method based on data mining
CN102567661A (en) Program recognition method and device based on machine learning
CN108734012A (en) Malware recognition methods, device and electronic equipment
CN113139192B (en) Third party library security risk analysis method and system based on knowledge graph
CN113076538B (en) Method for extracting embedded privacy policy of mobile application APK file
KR102120200B1 (en) Malware Crawling Method and System
US20210334371A1 (en) Malicious File Detection Technology Based on Random Forest Algorithm
Martín et al. A new tool for static and dynamic Android malware analysis
CN113468524B (en) RASP-based machine learning model security detection method
CN113297580B (en) Code semantic analysis-based electric power information system safety protection method and device
Sanz et al. Instance-based anomaly method for Android malware detection
CN114817924B (en) AST (AST) and cross-layer analysis based android malicious software detection method and system
CN106503552A (en) The Android malware detecting system that is excavated with pattern of traffic based on signature and method
CN115292674A (en) Fraud application detection method and system based on user comment data
CN112257076A (en) Vulnerability detection method based on random detection algorithm and information aggregation
CN114579965A (en) Malicious code detection method and device and computer readable storage medium
CN107018152A (en) Message block method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant