CN109271788A - An Android malware detection method based on deep learning - Google Patents

An Android malware detection method based on deep learning

Info

Publication number
CN109271788A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810963774.2A
Other languages
Chinese (zh)
Other versions
CN109271788B (en)
Inventor
罗森林
张寒青
潘丽敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201810963774.2A
Publication of CN109271788A
Application granted
Publication of CN109271788B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The present invention relates to an Android malware detection method based on deep learning and belongs to the technical field of computer and information science. The method first extracts features from Android application software: each application file is decompressed and decompiled, and security-relevant features are extracted from the result. The extracted features fall into three groups: file structure features, security experience (empirical) features, and N-gram statistical features built from the Dalvik instruction set. The features are then converted to numeric form to construct feature vectors. Finally, a DNN (Deep Neural Network) model is built on these features and used to classify and identify new Android software. Because the method incorporates instruction-set analysis, it is resistant to malware obfuscation. Because detection is based on a deep model, feature learning is strengthened, the rich internal information of large data sets is expressed well, and the method adapts more easily to continually evolving malware.

Description

An Android malware detection method based on deep learning
Technical field
The present invention relates to an Android malware detection method based on deep learning and belongs to the technical field of computer and information science.
Background art
With the continuous development of the mobile Internet, smart terminals have become an important part of everyday life. Android is the most popular mobile operating system, and its open and flexible ecosystem has led to widespread malware. How to detect Android malware effectively is therefore a research topic of considerable value. Current mainstream Android malicious code detection methods fall roughly into static detection methods and dynamic detection methods.
1. Dynamic detection method
Dynamic detection and analysis refers to methods that run the program under test and then extract features for detection and analysis. A dynamic detection method runs the Android application file on an Android device and analyzes the software using data collected while it runs, such as API call sequences and resource usage. Dynamic analysis has the advantage of being unaffected by limiting factors such as code packing and obfuscation. In practice, however, it suffers from difficult data collection and extraction, high cost of running the software, low code coverage, and vulnerability to malware that detects the running environment and evades analysis. Detecting malware by dynamic analysis is therefore less common in practice.
2. Static detection method
Static detection scans and analyzes the Android application file and extracts security-related sensitive information and features from it, such as sensitive permissions, system actions, and sensitive system calls. These features are then analyzed and summarized to judge whether the software is malicious. Compared with dynamic analysis, static analysis has higher code coverage and lower time overhead, and can usually reach good detection accuracy. It is also the mainstream detection method in current antivirus software. In real environments, however, Android developers often obfuscate and encrypt code to protect it, which makes it hard for static analysis to extract effective features and leads to misjudgments. Meanwhile, malware evolves rapidly every year, and conventional detection methods struggle to adapt to the new malware that keeps emerging.
To address these problems, this work proposes a malware classification method based on deep learning. On the one hand, common static features of malware are extracted by analyzing the Android application file. On the other hand, the application file is decompiled to obtain smali source code, Dalvik opcodes are extracted from the smali code, the instruction set is abstracted, and N-gram sequence features are extracted. Finally, the extracted features are normalized and a deep learning algorithm builds an abstract model that identifies malware. A detection system based on instruction-set analysis is resistant to malware obfuscation. Malware detection based on a deep model strengthens feature learning, expresses the rich internal information of large data sets well, and adapts more easily to continually evolving malware.
Summary of the invention
The purpose of the present invention is to solve the problems that conventional Android malware detection methods have low detection accuracy, a limited scope of application, and difficulty adapting to emerging software. To this end, a malware detection method based on deep learning is proposed.
The design principle of the invention is as follows. Features are first extracted from the Android application software: the application file is decompressed and decompiled, and security-relevant features are extracted. The extracted features fall into three groups: file structure features, security experience (empirical) features, and N-gram statistical features built from the Dalvik instruction set. The extracted features are then converted to numeric form to construct feature vectors. Finally, a DNN (Deep Neural Network) model is built on these features, and new Android software is classified and identified with the constructed model.
The technical solution of the present invention is achieved through the following steps:
Step 1, obtain positive and negative Android sample files, then preprocess the files.
Step 1.1, obtain a malicious Android software library of 24552 samples from http://amd.arguslab.org/behaviors, and obtain 21000 normal applications from the Android market.
Step 1.2, decompress each application and extract its AndroidManifest.xml file, res files, classes.dex file, and other files for subsequent analysis.
Step 1.3, decompile the classes.dex file with the Androguard tool, then extract the Dalvik opcodes.
Step 2, extract features from the Android application files.
Step 2.1, extract features from the files obtained in Step 1. The extracted features comprise file structure features, security experience (empirical) features, and N-gram features over the abstracted Dalvik instruction set.
Step 2.2, convert the extracted features to numeric form and normalize them to obtain the feature vector of each application.
Step 3, construct a neural network classifier and use it to recognize and detect software.
Step 3.1, split the feature vectors in the database into a test set and a training set, and train the neural network.
Step 3.2, preprocess new software and extract its features, then classify it with the constructed neural network classifier; the classification result completes the detection of the software.
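Step 2.2 above does not say how features are numeralized or normalized. The following is a minimal sketch under two assumptions that are not stated in the patent: boolean features map to 0/1, and each feature column is min-max scaled to [0, 1]. The feature names are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Union

RawValue = Union[bool, int, float]


def numeralize(raw: Dict[str, RawValue], feature_names: List[str]) -> List[float]:
    """Map one app's raw features to a fixed-order numeric vector.

    Booleans become 0.0/1.0; missing features default to 0.0 (an assumption).
    """
    return [float(raw.get(name, 0)) for name in feature_names]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each column to [0, 1]; constant columns are mapped to 0.0."""
    if not rows:
        return []
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical feature names, for illustration only.
names = ["uses_send_sms", "activity_count", "has_embedded_apk"]
apps = [
    {"uses_send_sms": True, "activity_count": 2, "has_embedded_apk": True},
    {"uses_send_sms": False, "activity_count": 10},
]
vectors = min_max_normalize([numeralize(a, names) for a in apps])
```

In a real pipeline the column minima and maxima would be computed on the training set only and reused for new software, so that Step 3.2 scales unseen samples the same way.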
Beneficial effects
Compared with traditional static analysis methods, the present invention extracts richer software features, comprising file structure features, empirical features, and N-gram statistical features based on the Dalvik instruction set. These features characterize Android software more comprehensively and thereby achieve higher detection accuracy.
Compared with traditional machine learning classification algorithms, deep learning strengthens feature learning, expresses the rich internal information of large data sets well, and adapts more easily to continually evolving malware.
Brief description of the drawings
Fig. 1 is a schematic diagram of the deep-learning-based Android malware detection method of the present invention.
Specific embodiments
To better illustrate the objects and advantages of the present invention, the method is described in further detail below with reference to an embodiment.
The detailed process is:
Step 1, obtain positive and negative Android sample files, then preprocess the files.
Step 1.1, obtain a malicious Android software library of 24552 samples from http://amd.arguslab.org/behaviors, and obtain 21000 normal applications from the Android market.
Step 1.2, for each application, use the Androguard tool to extract the AndroidManifest.xml file, res files, classes.dex file, and other files for subsequent analysis.
Step 1.3, decompile the classes.dex file with the Androguard tool, then extract the Dalvik opcodes of each smali file.
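Because an APK is a ZIP archive, the file-collection part of Step 1 can be illustrated with the Python standard library alone. This is only a sketch: the patent performs extraction and decompilation with Androguard, the helper names `list_apk_entries` and `make_toy_apk` are hypothetical, and the toy archive stands in for a real APK. Note that the AndroidManifest.xml inside an APK is stored as binary XML, so it still needs a decoder before its permissions and components can be read.

```python
import io
import zipfile
from typing import Dict, List


def list_apk_entries(apk_bytes: bytes) -> Dict[str, List[str]]:
    """Group the entries of an APK (a ZIP archive) by the roles named in Step 1.2."""
    groups: Dict[str, List[str]] = {"manifest": [], "dex": [], "res": [], "assets": []}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            if name == "AndroidManifest.xml":
                groups["manifest"].append(name)
            elif name.endswith(".dex") and "/" not in name:
                groups["dex"].append(name)  # classes.dex, classes2.dex, ...
            elif name.startswith("res/"):
                groups["res"].append(name)
            elif name.startswith("assets/"):
                groups["assets"].append(name)
    return groups


def make_toy_apk() -> bytes:
    """Build a tiny in-memory archive standing in for a real APK (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("AndroidManifest.xml", b"<binary xml placeholder>")
        z.writestr("classes.dex", b"dex\n035\x00")
        z.writestr("res/drawable/icon.png", b"\x89PNG")
        z.writestr("assets/payload.apk", b"PK")
    return buf.getvalue()


entries = list_apk_entries(make_toy_apk())
```

The `assets` group is kept separate because one of the empirical features described later asks whether the assets folder contains an APK file.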
Step 2, extract features from the Android application files.
Step 2.1, extract features from the files obtained in Step 1.
The first group is structural features, 63 dimensions in total, including the sensitive permissions requested by the APK, the system actions the application contains, and the numbers of Activity, Service, Broadcast Receiver, and Content Provider components it contains.
The second group is empirical features, 4 dimensions in total. These summarize experience from long-term detection and analysis of malicious APKs: whether the resource files contain an executable file, whether the assets folder contains an APK file, the number of image files in the resource files, and the number of functions in the APK with more than 20 parameters. An APK whose installation package bundles an additional executable file most likely hides malicious code in that executable. Malware also tends to contain fewer image files, and functions with malicious intent tend to take more parameters in order to evade detection.
The third group is N-gram features over the abstracted Dalvik instruction set. Analysis of malware shows that the code implementing malicious intent tends to be concentrated in a single malicious file. N-gram features are therefore counted per file, and the per-file counts are then weighted to form the final feature vector. First, Dalvik instructions are divided into 10 classes by function, as shown in Table 1. Next, for each smali file in the APK, the Dalvik instructions are replaced by their class symbols to give an abstracted symbol sequence. N-gram processing is then applied to the sequence; 3-grams are chosen here, so 10 symbols yield 10^3 = 1000 possible 3-grams. If an APK has n smali files, each file yields a 1000-dimensional statistical feature denoted F_n, whose form is shown in formula (1), where f_{nk} is the count of the k-th feature in the n-th file.
$$F_n = [f_{n0}, f_{n1}, f_{n2}, \ldots, f_{n999}] \qquad (1)$$
The n 1000-dimensional feature vectors are then weighted and normalized to obtain a new 1000-dimensional feature F, which serves as the final Dalvik bytecode N-gram statistical feature. Its form is shown in formula (2).
$$F = [k_0, k_1, \ldots, k_m, \ldots, k_{999}] \qquad (2)$$
In the formula, k_m is the m-th feature value of the new statistical feature.
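Formulas (1) and (2) can be sketched in a few lines of Python. Two parts of the sketch are assumptions, since the source text does not provide them: the 10-symbol opcode abstraction (Table 1 is not reproduced here, so `SYMBOLS` and `OPCODE_CLASS` are hypothetical), and the weighting scheme (each file is weighted by its share of all 3-grams in the APK, then the result is scaled to sum to 1).

```python
from collections import Counter
from itertools import product
from typing import Dict, List

# Hypothetical 10-symbol abstraction; the patent's Table 1 is not reproduced in this text.
SYMBOLS = "MRGIVTPCAX"
OPCODE_CLASS: Dict[str, str] = {
    "move": "M", "return": "R", "goto": "G", "if": "I", "invoke": "V",
    "const": "T", "iput": "P", "iget": "P", "cmp": "C", "add": "A",
}
# All 10**3 = 1000 possible 3-grams, in a fixed order.
TRIGRAMS = ["".join(p) for p in product(SYMBOLS, repeat=3)]
INDEX = {g: i for i, g in enumerate(TRIGRAMS)}


def abstract(opcodes: List[str]) -> str:
    """Replace each opcode by its class symbol; unknown opcodes map to 'X'."""
    return "".join(
        OPCODE_CLASS.get(op.split("-")[0].split("/")[0], "X") for op in opcodes
    )


def trigram_counts(opcodes: List[str]) -> List[int]:
    """F_n of formula (1): 1000 counts for one smali file."""
    seq = abstract(opcodes)
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    vec = [0] * len(TRIGRAMS)
    for gram, c in counts.items():
        vec[INDEX[gram]] = c
    return vec


def apk_feature(files: List[List[str]]) -> List[float]:
    """F of formula (2): weighted sum of per-file vectors, normalized to sum to 1.

    Weighting each file by its share of all 3-grams is an assumption;
    the patent does not state the weights.
    """
    per_file = [trigram_counts(f) for f in files]
    totals = [sum(v) for v in per_file]
    grand = sum(totals)
    if grand == 0:
        return [0.0] * len(TRIGRAMS)
    weighted = [0.0] * len(TRIGRAMS)
    for vec, t in zip(per_file, totals):
        w = t / grand
        for i, c in enumerate(vec):
            weighted[i] += w * c
    norm = sum(weighted)
    return [x / norm for x in weighted]


files = [
    ["const/4", "invoke-virtual", "move-result"],
    ["if-eqz", "goto", "return-void", "nop"],
]
F = apk_feature(files)
```

Counting per file before aggregating follows the patent's reasoning that malicious code tends to sit in one file: a file-level view keeps that file's distinctive 3-grams from being diluted by the rest of the application.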
Table 1. Definitions of the instruction symbols
Step 2.2, convert the features extracted in Step 2.1 to numeric form and normalize them to obtain the feature vector of each application. Let the database be D; it can then be represented by the following matrix.
Here, database D contains n samples, each sample has p attribute dimensions, and each sample has a target value Y. The target value is 0 or 1, where 1 denotes a positive sample and 0 denotes a negative sample. In this method the feature dimension of each sample is 1067, and n is 45120.
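The matrix referred to above did not survive in this text. One layout consistent with the description (n samples, p attributes, one target value per sample) is given below; the symbols x_{ij} and y_i are assumptions introduced for illustration, not the patent's own notation.

```latex
D =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} & y_1 \\
x_{21} & x_{22} & \cdots & x_{2p} & y_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np} & y_n
\end{bmatrix},
\qquad y_i \in \{0, 1\}, \quad n = 45120, \quad p = 1067 = 63 + 4 + 1000
```

The value of p follows from the three feature groups of Step 2.1: 63 structural dimensions, 4 empirical dimensions, and 1000 3-gram dimensions.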
Step 3, construct a neural network classifier and use it to recognize and detect software.
Step 3.1, split the feature vectors in the database into a test set and a training set, and train the neural network.
Step 3.2, preprocess new software and extract its features, then classify it with the constructed neural network classifier; the classification result completes the detection of the software.
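The patent does not specify the network architecture. As a dependency-free illustration of the train-then-classify flow in Step 3, the sketch below trains a one-hidden-layer network with plain stochastic gradient descent on toy data. The class name `TinyMLP`, the layer sizes, the tanh and sigmoid activations, and the learning rate are all assumptions; a real system with 1067-dimensional inputs would normally use a deep learning framework and more layers.

```python
import math
import random
from typing import List, Tuple


class TinyMLP:
    """One-hidden-layer network with a sigmoid output, trained by plain SGD.

    Architecture and hyperparameters are illustrative assumptions only.
    """

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: List[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * v for w, v in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def predict_proba(self, x: List[float]) -> float:
        """Estimated probability that the sample has label 1."""
        return self._forward(x)[1]

    def fit(self, X: List[List[float]], y: List[int],
            epochs: int = 500, lr: float = 0.5) -> None:
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self._forward(x)
                d_out = p - t  # cross-entropy gradient at the output pre-activation
                for j, hj in enumerate(h):
                    # Back-propagate through tanh before w2[j] is updated.
                    d_hid = d_out * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hid * xi
                    self.b1[j] -= lr * d_hid
                self.b2 -= lr * d_out


# Toy data: the label equals the first feature (stand-in for real feature vectors).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 1, 1]
model = TinyMLP(n_in=2, n_hidden=4)
model.fit(X, y)
labels = [int(model.predict_proba(x) >= 0.5) for x in X]
```

For new software, Step 3.2 amounts to running the same preprocessing and feature extraction, then thresholding `predict_proba` on the resulting vector.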
Test results: 45120 positive and negative samples were used in the experiment (some files were corrupted and could not be extracted), of which 23511 were malicious samples and 21609 were normal software. A 5-fold cross validation was then run on the whole data set with the constructed neural network model. The resulting average accuracy, average recall, and average F value for positive and negative samples are shown in Table 2. The experimental data in the table show that the method has high detection accuracy and achieves good detection results.
Table 2. Experimental test results
The specific description above further explains the purpose, technical solution, and beneficial effects of the invention. It should be understood that the above is only a specific embodiment of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
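The 5-fold evaluation described above can be sketched as follows. Several points are assumptions: the patent does not say whether folds were stratified by class, the translated "accuracy" may refer to precision (which is what this sketch computes), and `train_and_predict` is a hypothetical caller-supplied function that trains on one split and returns predicted labels for the held-out fold.

```python
from typing import Callable, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int = 5) -> List[List[int]]:
    """Split sample indices into k folds, keeping the class ratio roughly equal."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def cross_validate(X: List[List[float]], y: List[int],
                   train_and_predict: Callable[..., List[int]],
                   k: int = 5) -> Tuple[float, float, float]:
    """Average precision, recall and F1 over k held-out folds."""
    scores = []
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        preds = train_and_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in held_out],
        )
        scores.append(precision_recall_f1([y[i] for i in held_out], preds))
    p, r, f = (sum(s[j] for s in scores) / k for j in range(3))
    return p, r, f


# Toy check with a rule that simply reads the label back from the feature.
toy_y = [0] * 10 + [1] * 10
toy_X = [[float(v)] for v in toy_y]
avg = cross_validate(toy_X, toy_y, lambda tr_X, tr_y, te_X: [int(x[0]) for x in te_X])
```

Reporting the metrics once with the malicious class as positive and once with the benign class as positive would give per-class figures of the kind Table 2 refers to.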

Claims (4)

1. An Android malware detection method based on deep learning, characterized in that the method comprises the following steps:
Step 1, obtain positive and negative Android sample files and preprocess them, comprising: selecting positive and negative Android application software, and decompressing each APK file to obtain all files in the APK; then decompiling the classes.dex file and extracting the Dalvik opcodes of every smali file in each APK;
Step 2, extract features from the files obtained in Step 1 to obtain the feature vector of each application, comprising: processing the files obtained in Step 1 to obtain the empirical features and structural features of the APK file and the N-gram statistical features of the Dalvik instruction set, and converting these features to numeric form and normalizing them to obtain the feature vector of each application;
Step 3, construct a classification model from the data extracted in Step 2, evaluating the model by 5-fold cross validation on the data set during construction, and finally perform recognition and detection on software with the constructed neural network classifier.
2. The Android malware detection method based on deep learning according to claim 1, characterized in that: the features extracted from the Android application in Step 2 comprise structural features, empirical features, and N-gram features over the abstracted Dalvik instruction set; the structural features, 63 dimensions in total, include the sensitive permissions requested by the APK, the system actions the application contains, and the numbers of Activity, Service, Broadcast Receiver, and Content Provider components it contains; the empirical features, 4 dimensions in total, summarize experience from long-term detection and analysis of malicious APKs and include whether the resource files contain an executable file, whether the assets folder contains an APK file, the number of image files in the resource files, and the number of functions in the APK with more than 20 parameters; the 3-gram statistical features over the abstracted Dalvik instruction set have 1000 dimensions.
3. The Android malware detection method based on deep learning according to claim 1, characterized in that: when counting N-gram features in Step 2, considering that the functional code implementing malicious intent tends to be concentrated in a single malicious file, N-gram features are counted per smali file, and the N-gram features counted for each file are then weighted and normalized to form the final feature vector.
4. The Android malware detection method based on deep learning according to claim 1, characterized in that: a neural network model is used when constructing the malware classification model in Step 3; on the one hand, a deep neural network model is suited to high-dimensional input data; on the other hand, deep learning strengthens feature learning: during model construction it applies combined transformations to the 1067-dimensional features extracted from the APK and automatically mines deep connections between features, so as to adapt to continually evolving malware and achieve higher malware detection accuracy.
CN201810963774.2A 2018-08-23 2018-08-23 Android malicious software detection method based on deep learning Active CN109271788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810963774.2A CN109271788B (en) 2018-08-23 2018-08-23 Android malicious software detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810963774.2A CN109271788B (en) 2018-08-23 2018-08-23 Android malicious software detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN109271788A 2019-01-25
CN109271788B 2021-10-12

Family

ID=65154347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810963774.2A Active CN109271788B (en) 2018-08-23 2018-08-23 Android malicious software detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN109271788B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1924866A (en) * 2006-09-28 2007-03-07 北京理工大学 Static feature based web page malicious scenarios detection method
CN102938040A (en) * 2012-09-29 2013-02-20 中兴通讯股份有限公司 Malicious Android application program detection method, system and device
US8826439B1 (en) * 2011-01-26 2014-09-02 Symantec Corporation Encoding machine code instructions for static feature based malware clustering
CN104376262A (en) * 2014-12-08 2015-02-25 中国科学院深圳先进技术研究院 Android malware detecting method based on Dalvik command and authority combination
CN105205396A (en) * 2015-10-15 2015-12-30 上海交通大学 Detecting system for Android malicious code based on deep learning and method thereof
CN106096405A (en) * 2016-04-26 2016-11-09 浙江工业大学 A kind of Android malicious code detecting method abstract based on Dalvik instruction
CN107169354A (en) * 2017-04-21 2017-09-15 北京理工大学 Multi-layer android system malicious act monitoring method
CN107577942A (en) * 2017-08-22 2018-01-12 中国民航大学 A kind of composite character screening technique for Android malware detection
CN108304720A (en) * 2018-02-06 2018-07-20 恒安嘉新(北京)科技股份公司 A kind of Android malware detection methods based on machine learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FAUZIA IDREES 等: "Investigating the android intents and permissions for malware detection", 《2014 SEVENTH INTERNATIONAL WORKSHOP ON SELECTED TOPICS IN MOBILE AND WIRELESS COMPUTING》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069927A (en) * 2019-04-22 2019-07-30 中国民航大学 Malice APK detection method, system, data storage device and detection program
CN110245493A (en) * 2019-05-22 2019-09-17 中国人民公安大学 A method of the Android malware detection based on depth confidence network
CN110363003A (en) * 2019-07-25 2019-10-22 哈尔滨工业大学 A kind of Android virus static detection method based on deep learning
CN110363003B (en) * 2019-07-25 2022-08-02 哈尔滨工业大学 Android virus static detection method based on deep learning
CN110717182A (en) * 2019-10-14 2020-01-21 杭州安恒信息技术股份有限公司 Webpage Trojan horse detection method, device and equipment and readable storage medium
CN111460452A (en) * 2020-03-30 2020-07-28 中国人民解放军国防科技大学 Android malicious software detection method based on frequency fingerprint extraction
CN111460452B (en) * 2020-03-30 2022-09-09 中国人民解放军国防科技大学 Android malicious software detection method based on frequency fingerprint extraction
CN112966272A (en) * 2021-03-31 2021-06-15 国网河南省电力公司电力科学研究院 Internet of things Android malicious software detection method based on countermeasure network
CN112966272B (en) * 2021-03-31 2022-09-09 国网河南省电力公司电力科学研究院 Internet of things Android malicious software detection method based on countermeasure network
CN112861135A (en) * 2021-04-12 2021-05-28 中南大学 Malicious code detection method based on attention mechanism
CN113139189A (en) * 2021-04-29 2021-07-20 广州大学 Method, system and storage medium for identifying mining malicious software
CN113656308A (en) * 2021-08-18 2021-11-16 福建卫联科技有限公司 Computer software analysis system

Also Published As

Publication number Publication date
CN109271788B (en) 2021-10-12

Similar Documents

Publication Publication Date Title
CN109271788A (en) A kind of Android malware detection method based on deep learning
CN108304720B (en) Android malicious program detection method based on machine learning
CN109753801B (en) Intelligent terminal malicious software dynamic detection method based on system call
Li et al. Deeppayload: Black-box backdoor attack on deep learning models through neural payload injection
US8838992B1 (en) Identification of normal scripts in computer systems
US10621349B2 (en) Detection of malware using feature hashing
CN111639337B (en) Unknown malicious code detection method and system for massive Windows software
CN107659570A (en) Webshell detection methods and system based on machine learning and static and dynamic analysis
CN105229661B (en) Method, computing device and the storage medium for determining Malware are marked based on signal
CN107688743B (en) Malicious program detection and analysis method and system
KR20180080449A (en) Method and apparatus for recognizing cyber threats using correlational analytics
CN103473506A (en) Method and device of recognizing malicious APK files
CN109992968A (en) Android malicious act dynamic testing method based on binary system dynamic pitching pile
CN109598124A (en) A kind of webshell detection method and device
CN106611122A (en) Virtual execution-based unknown malicious program offline detection system
CN106845220B (en) Android malicious software detection system and method
CN107315956A (en) A kind of Graph-theoretical Approach for being used to quick and precisely detect Malware on the zero
Zhu et al. Android malware detection based on multi-head squeeze-and-excitation residual network
CN109740040B (en) Verification code identification method, device, storage medium and computer equipment
CN109858248A (en) Malice Word document detection method and device
CN109614795A (en) A kind of Android malware detection method of event perception
CN106874760A (en) A kind of Android malicious code sorting techniques based on hierarchy type SimHash
CN109711163A (en) Android malware detection method based on API Calls sequence
Niu et al. Detecting malware on X86-based IoT devices in autonomous driving
CN110704841A (en) Convolutional neural network-based large-scale android malicious application detection system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant