CN109271788A - An Android malware detection method based on deep learning - Google Patents
An Android malware detection method based on deep learning
- Publication number
- CN109271788A
- Authority
- CN
- China
- Prior art keywords
- feature
- file
- android
- malware
- apk
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present invention relates to an Android malware detection method based on deep learning and belongs to the technical field of computer and information science. The invention first performs feature extraction on Android application software: relevant security features are extracted by decompressing and decompiling the Android application file. The extracted features cover three aspects: file structure features, security-experience features, and N-gram statistical features built from the Dalvik instruction set. The extracted features are then numerized to construct feature vectors. Finally, a DNN (Deep Neural Network) model is built on these features and used to classify and identify new Android software. Because the method incorporates instruction-set analysis, it is resistant to malware obfuscation. At the same time, malware detection based on a deep model strengthens feature learning, represents the rich internal information of big data well, and adapts more easily to continuously evolving malware.
Description
Technical field
The present invention relates to an Android malware detection method based on deep learning and belongs to the technical field of computer and information science.
Background art
With the continuous development of the mobile Internet, smart terminals have become an important part of everyday life. Android is the most popular mobile operating system, and its open, flexible ecosystem has also allowed malware to become widespread. How to detect Android malware effectively is therefore a research topic of significant value. Current mainstream Android malicious code detection methods fall roughly into static detection methods and dynamic detection methods.
1. Dynamic detection methods
Dynamic detection and analysis refers to methods that first run the program under inspection and then extract features from its behavior for detection and analysis. Dynamic detection mainly runs the Android application on an Android device and collects data such as the API call sequence and resource usage during execution in order to analyze the software. Although dynamic analysis has the advantage of not being affected by limiting factors such as code packing and obfuscation, in practice it suffers from several problems: data collection and extraction are difficult, running the software is costly, code coverage is low, and malware can easily detect the analysis environment and evade it. As a result, dynamic analysis is used relatively little in practice for malware detection.
2. Static detection methods
Static detection mainly scans and analyzes the Android application file to extract security-related sensitive information and features, such as sensitive permissions, system actions and sensitive system calls. These refined features are then analyzed and summarized to judge whether the application is malware. Compared with dynamic analysis, static analysis offers higher code coverage and lower time overhead and can usually reach good detection accuracy; it is also the mainstream detection approach of current antivirus software. In real environments, however, Android developers often protect their code with obfuscation, encryption and similar operations, and under these conditions static analysis has difficulty extracting valid features, which leads to misjudgments. Meanwhile, malware evolves rapidly every year, and conventional detection methods struggle to keep up with the new malware that keeps emerging.
To address the above problems, this project proposes a malware classification method based on deep learning. On the one hand, common static features of malware are extracted by analyzing the Android application file. On the other hand, the Android application file is decompiled to obtain Smali source code, Dalvik opcodes are extracted from the Smali code, the instruction set is abstracted, and N-gram sequence features are extracted. Finally, the extracted features are normalized and modeled with a deep learning algorithm to identify malware. A detection system based on instruction-set analysis is resistant to malware obfuscation, and malware detection based on a deep model strengthens feature learning, represents the rich internal information of big data well, and adapts more easily to continuously evolving malware.
Summary of the invention
The purpose of the present invention is to solve the problems of conventional Android malware detection methods, namely low detection accuracy, limited applicability and difficulty adapting to newly emerging software, by proposing a malware detection method based on deep learning.
The design principle of the invention is as follows. Feature extraction is first performed on the Android application software: relevant security features are extracted by decompressing and decompiling the Android application file. The extracted features cover three aspects: file structure features, security-experience features, and N-gram statistical features built from the Dalvik instruction set. The extracted features are then numerized to construct feature vectors. Finally, a DNN (Deep Neural Network) model is built on these features and used to classify and identify new Android software.
The technical solution of the invention is implemented through the following steps:
Step 1: Obtain positive and negative Android sample files, then preprocess the files.
Step 1.1: Obtain 24,552 malicious Android applications from the malware library at http://amd.arguslab.org/behaviors, and 21,000 benign applications from Android application markets.
Step 1.2: Decompress each application and extract files such as AndroidManifest.xml, the res folder and classes.dex for subsequent analysis.
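Because an APK is a ZIP archive, the decompression in Step 1.2 can be sketched with Python's standard zipfile module. This is a minimal illustration under that assumption, not the patent's implementation; the function name unpack_apk and the layout of the returned dictionary are hypothetical. Note that the AndroidManifest.xml stored inside an APK is in Android's binary XML format and has to be decoded (for example with Androguard or apktool) before it can be parsed as text.

```python
import io
import zipfile


def unpack_apk(apk_bytes: bytes) -> dict:
    """Split an APK (a ZIP archive) into the parts used by later steps.

    Returns the raw, still binary-encoded AndroidManifest.xml, the bytes of
    every top-level classes*.dex file, and the names of all entries under res/.
    """
    parts = {"manifest": None, "dex": {}, "res": []}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as zf:
        for name in zf.namelist():
            if name == "AndroidManifest.xml":
                parts["manifest"] = zf.read(name)
            elif name.startswith("classes") and name.endswith(".dex") and "/" not in name:
                # multidex apps ship classes.dex, classes2.dex, ...
                parts["dex"][name] = zf.read(name)
            elif name.startswith("res/"):
                parts["res"].append(name)
    return parts
```

Reading from bytes rather than a path keeps the helper usable both for files on disk and for samples streamed from a dataset archive.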
Step 1.3: Decompile the classes.dex file with the Androguard tool, then extract the Dalvik opcodes.
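The opcode extraction in Step 1.3 can be illustrated on Smali text, the disassembly format for classes.dex. The sketch assumes the Smali files have already been produced; extract_opcodes is a hypothetical helper that keeps the first token of each instruction line between .method and .end method, and skips directives, labels, comments and the bodies of data blocks such as .array-data and .packed-switch (whose lines are data, not instructions).

```python
# Directives that open a multi-line block whose body is data, not code.
DATA_BLOCKS = {"annotation", "array-data", "packed-switch", "sparse-switch"}


def extract_opcodes(smali_text: str) -> list[str]:
    """Return the Dalvik opcode mnemonics of every method, in file order."""
    opcodes = []
    in_method = False
    data_block = None  # name of the currently open data block, if any
    for raw in smali_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith(".method"):
            in_method = True
            continue
        if line.startswith(".end method"):
            in_method = False
            continue
        if not in_method:
            continue  # class-level directives such as .class, .super, .field
        if data_block:
            if line.startswith(".end " + data_block):
                data_block = None
            continue
        if line.startswith("."):
            name = line[1:].split()[0]
            if name in DATA_BLOCKS:
                data_block = name
            continue  # other directives: .locals, .line, .param, .catch, ...
        if line.startswith(":"):
            continue  # branch, try/catch and data labels
        opcodes.append(line.split()[0])  # e.g. "invoke-virtual", "const/4"
    return opcodes
```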
Step 2: Perform feature extraction on the Android application files.
Step 2.1: Extract features from the files obtained in Step 1. The extracted features include file structure features, security-experience features, and N-gram features of the abstracted Dalvik instruction set.
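Structural features of the kind listed in Step 2.1 (sensitive permissions, system actions, component counts) can be read from a decoded, plain-text AndroidManifest.xml. The patent does not enumerate its 63 structural dimensions, so the lists SENSITIVE_PERMISSIONS and SYSTEM_ACTIONS below, and the function structural_features, are hypothetical stand-ins that produce only a 10-dimensional subset.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical subsets; the patent's full feature list is not given in the text.
SENSITIVE_PERMISSIONS = [
    "android.permission.SEND_SMS",
    "android.permission.READ_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.RECEIVE_BOOT_COMPLETED",
]
SYSTEM_ACTIONS = [
    "android.intent.action.BOOT_COMPLETED",
    "android.provider.Telephony.SMS_RECEIVED",
]
COMPONENTS = ["activity", "service", "receiver", "provider"]


def structural_features(manifest_xml: str) -> list[int]:
    """Turn a decoded AndroidManifest.xml into a small feature vector.

    Layout: one 0/1 flag per sensitive permission, one 0/1 flag per system
    action, then the counts of activities, services, receivers and providers.
    """
    root = ET.fromstring(manifest_xml)
    perms = {e.get(ANDROID_NS + "name") for e in root.iter("uses-permission")}
    actions = {e.get(ANDROID_NS + "name") for e in root.iter("action")}
    vec = [int(p in perms) for p in SENSITIVE_PERMISSIONS]
    vec += [int(a in actions) for a in SYSTEM_ACTIONS]
    vec += [sum(1 for _ in root.iter(tag)) for tag in COMPONENTS]
    return vec
```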
Step 2.2: Numerize the extracted features, then normalize them to obtain a feature vector representing each application.
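As an illustration of numerizing features, the four empirical features described in the embodiment (an executable in the resource files, an APK in the assets folder, the number of image files, and the number of functions with more than 20 parameters) can be turned into numbers as below. Assumptions: an "executable file" is detected by ELF or DEX magic bytes in entries under res/, image files are recognized by extension, and parameter counts are read from the Dalvik type descriptor on each Smali .method line. The names empirical_features and count_params are hypothetical.

```python
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp")
EXEC_MAGICS = (b"\x7fELF", b"dex\n")  # native ELF binaries and DEX bytecode


def count_params(method_line: str) -> int:
    """Count parameters in a descriptor such as 'foo(I[JLjava/lang/String;)V'."""
    params = method_line[method_line.index("(") + 1 : method_line.index(")")]
    count, i = 0, 0
    while i < len(params):
        if params[i] == "[":  # array prefix: the element type follows
            i += 1
            continue
        if params[i] == "L":  # object type runs up to the next ';'
            i = params.index(";", i)
        count += 1
        i += 1
    return count


def empirical_features(entries: dict, smali_texts: list) -> list[int]:
    """entries: APK entry name -> bytes; smali_texts: text of each Smali file."""
    res = {n: d for n, d in entries.items() if n.startswith("res/")}
    exec_in_res = int(any(d.startswith(EXEC_MAGICS) for d in res.values()))
    apk_in_assets = int(any(n.startswith("assets/") and n.lower().endswith(".apk")
                            for n in entries))
    n_images = sum(1 for n in res if n.lower().endswith(IMAGE_EXTS))
    n_long_methods = sum(
        1
        for text in smali_texts
        for line in text.splitlines()
        if line.strip().startswith(".method") and count_params(line) > 20
    )
    return [exec_in_res, apk_in_assets, n_images, n_long_methods]
```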
Step 3: Build a neural network classifier and use it to detect and identify software.
Step 3.1: Split the feature vectors in the database into a training set and a test set, and train the neural network.
Step 3.2: Preprocess new software and extract its features, classify it with the trained neural network classifier, and output the classification result to complete detection.
Beneficial effects
Compared with traditional static analysis methods, the invention extracts richer software features, including file structure features, empirical features, and N-gram statistical features based on the Dalvik instruction set. These features characterize Android software more comprehensively and thus achieve higher detection accuracy.
Compared with traditional machine-learning classification algorithms, deep learning strengthens feature learning, represents the rich internal information of big data well, and adapts more easily to continuously evolving malware.
Description of the drawings
Fig. 1 is a schematic diagram of the Android malware detection method based on deep learning according to the invention.
Specific embodiments
To better illustrate the objects and advantages of the invention, the method is described further below with reference to an embodiment.
The detailed process is as follows:
Step 1: Obtain positive and negative Android sample files, then preprocess the files.
Step 1.1: Obtain 24,552 malicious Android applications from the malware library at http://amd.arguslab.org/behaviors, and 21,000 benign applications from Android application markets.
Step 1.2: Use the Androguard tool to extract files such as AndroidManifest.xml, the res folder and classes.dex from each application for subsequent analysis.
Step 1.3: Decompile the classes.dex file with the Androguard tool, then extract the Dalvik opcodes of each Smali file.
Step 2: Perform feature extraction on the Android application files.
Step 2.1: Extract features from the files obtained in Step 1. The features fall into three categories.
The first category is structural features, comprising 63 dimensions such as the sensitive permissions requested by the APK, the system actions it contains, and the numbers of Activity, Service, Broadcast Receiver and Content Provider components it contains.
The second category is empirical features, which summarize experience gained from long-term detection and analysis of malicious APKs. They comprise 4 dimensions: whether the resource files contain an executable file, whether the assets folder contains an APK file, the number of image files contained in the resource files of the APK, and the number of functions with more than 20 parameters. An installation package that embeds an additional executable file is very likely hiding malicious code in that executable. Malware also tends to contain fewer image files, and, to evade detection, functions with malicious tendencies tend to take more parameters.
The third category is the N-gram features of the abstracted Dalvik instruction set. Analysis of malware shows that the code implementing a malicious intent is usually concentrated in a single malicious file, so N-gram features are counted per file and the per-file statistics are then weighted to form the final feature vector. First, Dalvik instructions are divided into 10 classes according to their function; the classification is given in Table 1. Next, the abstracted symbol sequence of Dalvik instructions is computed for each Smali file in the APK, and N-gram processing is applied to this sequence; 3-grams are used here. With 10 symbol classes there are $10^3 = 1000$ possible 3-grams, so if an APK contains n Smali files, each file yields a 1000-dimensional statistical feature vector, denoted $F_n$ and given by formula (1), where $f_{nk}$ is the count of the k-th feature in the n-th file.
$$F_n = [f_{n0}, f_{n1}, f_{n2}, \ldots, f_{n999}] \tag{1}$$
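The mechanics of formula (1) can be sketched in Python. The patent's actual 10-class grouping (Table 1) is not reproduced in this text, so the prefix-based grouping in OPCODE_CLASSES below is a hypothetical stand-in, and abstract_opcode and trigram_vector are hypothetical names. Each opcode is mapped to a class symbol from 0 to 9, and every consecutive triple (a, b, c) is counted at index 100a + 10b + c, which yields exactly $10^3 = 1000$ dimensions.

```python
# Hypothetical 10-class grouping of Dalvik opcodes by mnemonic prefix.
# The patent's own grouping (its Table 1) is not reproduced in this text.
OPCODE_CLASSES = [
    (("move",), 0),                                          # register moves
    (("return",), 1),                                        # method returns
    (("const",), 2),                                         # constants
    (("invoke",), 3),                                        # method calls
    (("goto", "if-", "packed-switch", "sparse-switch"), 4),  # control flow
    (("iget", "iput", "sget", "sput"), 5),                   # field access
    (("aget", "aput", "new-array", "filled-new-array",
      "fill-array-data", "array-length"), 6),                # arrays
    (("add", "sub", "rsub", "mul", "div", "rem", "and", "or", "xor",
      "shl", "shr", "ushr", "neg", "not",
      "int-to", "long-to", "float-to", "double-to"), 7),     # arithmetic, logic, conversions
    (("cmp",), 8),                                           # comparisons
]
OTHER = 9  # e.g. nop, monitor-enter, check-cast, new-instance, throw


def abstract_opcode(opcode: str) -> int:
    """Map one opcode mnemonic to its class symbol (0 to 9)."""
    for prefixes, symbol in OPCODE_CLASSES:
        if opcode.startswith(prefixes):
            return symbol
    return OTHER


def trigram_vector(opcodes: list[str]) -> list[int]:
    """Per-file vector F_n: counts of all 10**3 = 1000 possible symbol 3-grams."""
    symbols = [abstract_opcode(op) for op in opcodes]
    vec = [0] * 1000
    for a, b, c in zip(symbols, symbols[1:], symbols[2:]):
        vec[100 * a + 10 * b + c] += 1  # 3-gram (a, b, c) -> index "abc" in base 10
    return vec
```

Because the index is a base-10 encoding of the three symbols, the vector layout stays fixed across files and APKs, which is what allows the per-file vectors to be combined element by element.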
The n 1000-dimensional vectors are then weighted and combined into a new 1000-dimensional feature F, which after normalization serves as the final N-gram statistical feature of the Dalvik bytecode, as given by formula (2):
$$F = [k_0, k_1, \ldots, k_m, \ldots, k_{999}] \tag{2}$$
where $k_m$ is the m-th feature value of the new statistical feature.
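The text states that the per-file vectors are weighted and normalized into F but does not give the weighting scheme. The sketch below therefore takes one weight per file as an input (equal weights by default) and applies L1 normalization so that the entries of F sum to 1; both choices are assumptions, and combine_files is a hypothetical name.

```python
def combine_files(file_vectors, weights=None):
    """Merge per-file vectors F_1..F_n into one vector F (formula (2)).

    weights: one non-negative weight per file. Equal weights are an assumption
    used when none are given. The weighted sum is then L1-normalised.
    """
    if weights is None:
        weights = [1.0] * len(file_vectors)
    combined = [0.0] * len(file_vectors[0])
    for w, vec in zip(weights, file_vectors):
        for k, value in enumerate(vec):
            combined[k] += w * value  # weighted sum, component by component
    total = sum(combined)
    # An APK with no 3-grams at all keeps an all-zero vector.
    return [x / total for x in combined] if total else combined
```

A weighting that emphasizes the file with the most concentrated opcode activity would match the motivation in the text (malicious code tends to sit in a single file) more closely; passing such weights requires no change to the function.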
Table 1 Definitions of the instruction symbols
Step 2.2: Numerize the features extracted in Step 2.1 and normalize them to obtain the feature vector of each application. A database D can then be represented as a matrix in which each row is one sample.
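One plausible written-out form of this matrix is shown below. The notation $x_{ij}$ for the j-th attribute of the i-th sample and $y_i$ for its target value is assumed here rather than taken from the patent; each row holds the p attribute values of a sample followed by its target value.

$$
D = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} & y_1 \\
x_{21} & x_{22} & \cdots & x_{2p} & y_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np} & y_n
\end{bmatrix}, \qquad y_i \in \{0, 1\}
$$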
Here D contains n samples, each sample has p attribute dimensions, and each sample has a target value Y. The target value is 0 or 1: 1 denotes a positive sample and 0 a negative sample. In this method each sample has 1067 feature dimensions (63 structural + 4 empirical + 1000 N-gram), and n = 45120.
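The numerization and normalization of Step 2.2 can be sketched as concatenating the three feature groups into one 1067-dimensional row and scaling every column of the resulting n x 1067 matrix. The text does not name its normalization method; column-wise min-max scaling to [0, 1] is an assumption here, and build_feature_vector and min_max_normalize are hypothetical names.

```python
def build_feature_vector(structural, empirical, ngram):
    """Concatenate the three groups into one row: 63 + 4 + 1000 = 1067 values."""
    if (len(structural), len(empirical), len(ngram)) != (63, 4, 1000):
        raise ValueError("expected 63 structural, 4 empirical and 1000 N-gram values")
    return [*structural, *empirical, *ngram]


def min_max_normalize(rows):
    """Scale every column of an n x p matrix into [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Min-max scaling leaves a 0/1 flag column unchanged whenever both values occur in it, while bringing large counts (component numbers, image counts) into the same range as the normalized N-gram frequencies. The minima and maxima should be taken from the training data and reused for new software in Step 3.2.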
Step 3: Build a neural network classifier and use it to detect and identify software.
Step 3.1: Split the feature vectors in the database into a training set and a test set, and train the neural network.
Step 3.2: Preprocess new software and extract its features, classify it with the trained neural network classifier, and output the classification result to complete detection.
Test results: A total of 45,120 positive and negative samples were selected for the experiment (some files were corrupted and their features could not be extracted), of which 23,511 are malicious samples and 21,609 are benign software. Five-fold cross-validation was then performed on the entire data set with the constructed neural network model. The average accuracy, average recall and average F-value measured for the positive and negative samples are shown in Table 2. The experimental data in the table show that the method achieves high detection accuracy and a good detection effect.
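The evaluation protocol (5-fold cross-validation with averaged metrics) can be sketched in plain Python. The DNN's architecture and training settings are not given in the text, so the model is left as a placeholder: make_model is any zero-argument factory returning an object with fit and predict methods. The text reports average accuracy, recall and F-value; because an F-value combines precision and recall, the sketch computes precision, recall and F1, which is an interpretation rather than the patent's stated metrics. Stratified folds are also an assumption, and stratified_folds, prf and cross_validate are hypothetical names.

```python
import random
from statistics import mean


def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds, keeping the class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def prf(y_true, y_pred, positive=1):
    """Precision, recall and F1 with respect to the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(X, y, make_model, k=5, positive=1, seed=0):
    """Mean (precision, recall, F1) over k stratified folds."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = make_model()  # fresh, untrained model for every fold
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        pred = model.predict([X[i] for i in test_idx])
        scores.append(prf([y[i] for i in test_idx], pred, positive))
    return tuple(mean(s[j] for s in scores) for j in range(3))
```

To score the negative samples as well, as the text does, call cross_validate a second time with positive=0.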
Table 2 Test results
The specific description above explains the objects, technical solution and beneficial effects of the invention in further detail. It should be understood that the above is only a specific embodiment of the invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (4)
1. An Android malware detection method based on deep learning, characterized in that the method comprises the following steps:
Step 1: obtaining positive and negative Android sample files and preprocessing them, comprising: selecting positive and negative Android applications, decompressing each APK file to obtain all files in the APK, then decompiling the classes.dex file and extracting the Dalvik opcodes in every Smali file of each APK;
Step 2: performing feature extraction on the files obtained in Step 1 to obtain a feature vector for each piece of software, comprising: processing the files obtained in Step 1 to obtain the empirical features and structural features of the APK file and the N-gram statistical features of the Dalvik instruction set, and numerizing and normalizing these features to obtain the feature vector of each piece of software;
Step 3: constructing a classification model from the data extracted in Step 2, evaluating the model on the data set with 5-fold cross-validation during construction, and finally detecting and identifying software with the constructed neural network classifier.
2. The Android malware detection method based on deep learning according to claim 1, characterized in that: when the features of an Android application are extracted in Step 2, the extracted features include structural features, empirical features, and N-gram features of the abstracted Dalvik instruction set; the structural features comprise 63 dimensions, including the sensitive permissions of the APK, the system actions it contains, and the numbers of Activity, Service, Broadcast Receiver and Content Provider components it contains; the empirical features mainly comprise features summarizing experience from long-term detection and analysis of malicious APKs, 4 dimensions in total, including whether the resource files contain an executable file, whether the assets folder contains an APK file, the number of image files contained in the resource files of the APK file, and the number of functions with more than 20 parameters; the 3-gram statistical features of the abstracted Dalvik instruction set comprise 1000 dimensions.
3. The Android malware detection method based on deep learning according to claim 1, characterized in that: considering that the function code with which malware realizes its malicious intent is usually concentrated in one malicious file, the N-gram features in Step 2 are counted per Smali file, and the N-gram features counted for each file are then weighted and normalized to form the final feature vector.
4. The Android malware detection method based on deep learning according to claim 1, characterized in that: a neural network model is used when constructing the malware classification model in Step 3. On the one hand, a deep neural network model is suited to processing high-dimensional input data; on the other hand, deep learning strengthens feature learning: during model construction it can apply appropriate combinations and transformations to the 1067-dimensional features extracted from each APK and automatically mine deep relationships between features, so as to adapt to continuously evolving malware and achieve higher malware detection accuracy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810963774.2A CN109271788B (en) | 2018-08-23 | 2018-08-23 | Android malicious software detection method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810963774.2A CN109271788B (en) | 2018-08-23 | 2018-08-23 | Android malicious software detection method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109271788A true CN109271788A (en) | 2019-01-25 |
CN109271788B CN109271788B (en) | 2021-10-12 |
Family
ID=65154347
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810963774.2A Active CN109271788B (en) | 2018-08-23 | 2018-08-23 | Android malicious software detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271788B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1924866A (en) * | 2006-09-28 | 2007-03-07 | 北京理工大学 | Web page malicious script detection method based on static features |
US8826439B1 (en) * | 2011-01-26 | 2014-09-02 | Symantec Corporation | Encoding machine code instructions for static feature based malware clustering |
CN102938040A (en) * | 2012-09-29 | 2013-02-20 | 中兴通讯股份有限公司 | Malicious Android application program detection method, system and device |
CN104376262A (en) * | 2014-12-08 | 2015-02-25 | 中国科学院深圳先进技术研究院 | Android malware detection method based on combination of Dalvik instructions and permissions |
CN105205396A (en) * | 2015-10-15 | 2015-12-30 | 上海交通大学 | Detecting system for Android malicious code based on deep learning and method thereof |
CN106096405A (en) * | 2016-04-26 | 2016-11-09 | 浙江工业大学 | Android malicious code detection method based on Dalvik instruction abstraction |
CN107169354A (en) * | 2017-04-21 | 2017-09-15 | 北京理工大学 | Multi-layer Android system malicious behavior monitoring method |
CN107577942A (en) * | 2017-08-22 | 2018-01-12 | 中国民航大学 | Hybrid feature screening method for Android malware detection |
CN108304720A (en) * | 2018-02-06 | 2018-07-20 | 恒安嘉新(北京)科技股份公司 | Android malware detection method based on machine learning |
Non-Patent Citations (1)
Title |
---|
FAUZIA IDREES et al.: "Investigating the android intents and permissions for malware detection", 2014 SEVENTH INTERNATIONAL WORKSHOP ON SELECTED TOPICS IN MOBILE AND WIRELESS COMPUTING * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110069927A (en) * | 2019-04-22 | 2019-07-30 | 中国民航大学 | Malicious APK detection method, system, data storage device and detection program |
CN110245493A (en) * | 2019-05-22 | 2019-09-17 | 中国人民公安大学 | Android malware detection method based on deep belief network |
CN110363003A (en) * | 2019-07-25 | 2019-10-22 | 哈尔滨工业大学 | Android virus static detection method based on deep learning |
CN110363003B (en) * | 2019-07-25 | 2022-08-02 | 哈尔滨工业大学 | Android virus static detection method based on deep learning |
CN110717182A (en) * | 2019-10-14 | 2020-01-21 | 杭州安恒信息技术股份有限公司 | Webpage Trojan horse detection method, device and equipment and readable storage medium |
CN111460452A (en) * | 2020-03-30 | 2020-07-28 | 中国人民解放军国防科技大学 | Android malicious software detection method based on frequency fingerprint extraction |
CN111460452B (en) * | 2020-03-30 | 2022-09-09 | 中国人民解放军国防科技大学 | Android malicious software detection method based on frequency fingerprint extraction |
CN112966272A (en) * | 2021-03-31 | 2021-06-15 | 国网河南省电力公司电力科学研究院 | Internet of Things Android malware detection method based on adversarial network |
CN112966272B (en) * | 2021-03-31 | 2022-09-09 | 国网河南省电力公司电力科学研究院 | Internet of Things Android malware detection method based on adversarial network |
CN112861135A (en) * | 2021-04-12 | 2021-05-28 | 中南大学 | Malicious code detection method based on attention mechanism |
CN113139189A (en) * | 2021-04-29 | 2021-07-20 | 广州大学 | Method, system and storage medium for identifying mining malicious software |
CN113656308A (en) * | 2021-08-18 | 2021-11-16 | 福建卫联科技有限公司 | Computer software analysis system |
Also Published As
Publication number | Publication date |
---|---|
CN109271788B (en) | 2021-10-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109271788A (en) | An Android malware detection method based on deep learning | |
CN108304720B (en) | Android malicious program detection method based on machine learning | |
CN109753801B (en) | Intelligent terminal malicious software dynamic detection method based on system call | |
Li et al. | Deeppayload: Black-box backdoor attack on deep learning models through neural payload injection | |
US8838992B1 (en) | Identification of normal scripts in computer systems | |
US10621349B2 (en) | Detection of malware using feature hashing | |
CN111639337B (en) | Unknown malicious code detection method and system for massive Windows software | |
CN107659570A (en) | Webshell detection methods and system based on machine learning and static and dynamic analysis | |
CN105229661B (en) | Method, computing device and the storage medium for determining Malware are marked based on signal | |
CN107688743B (en) | Malicious program detection and analysis method and system | |
KR20180080449A (en) | Method and apparatus for recognizing cyber threats using correlational analytics | |
CN103473506A (en) | Method and device of recognizing malicious APK files | |
CN109992968A (en) | Android malicious behavior dynamic detection method based on binary dynamic instrumentation | |
CN109598124A (en) | A kind of webshell detection method and device | |
CN106611122A (en) | Virtual execution-based unknown malicious program offline detection system | |
CN106845220B (en) | Android malicious software detection system and method | |
CN107315956A (en) | A kind of Graph-theoretical Approach for being used to quick and precisely detect Malware on the zero | |
Zhu et al. | Android malware detection based on multi-head squeeze-and-excitation residual network | |
CN109740040B (en) | Verification code identification method, device, storage medium and computer equipment | |
CN109858248A (en) | Malicious Word document detection method and device | |
CN109614795A (en) | Event-aware Android malware detection method | |
CN106874760A (en) | Android malicious code classification method based on hierarchical SimHash | |
CN109711163A (en) | Android malware detection method based on API call sequence | |
Niu et al. | Detecting malware on X86-based IoT devices in autonomous driving | |
CN110704841A (en) | Convolutional neural network-based large-scale android malicious application detection system and method |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |