CN110245493A - Method for Android malware detection based on a deep belief network - Google Patents

Method for Android malware detection based on a deep belief network

Info

Publication number
CN110245493A
Authority
CN
China
Prior art keywords
layer
feature
android
learning model
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910431019.4A
Other languages
Chinese (zh)
Inventor
芦天亮
李国友
杜彦辉
欧阳立
吴警
张翼翔
暴雨轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Cryptography Administration Commercial Code Testing Center
CHINESE PEOPLE'S PUBLIC SECURITY UNIVERSITY
Original Assignee
State Cryptography Administration Commercial Code Testing Center
CHINESE PEOPLE'S PUBLIC SECURITY UNIVERSITY
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Cryptography Administration Commercial Code Testing Center and CHINESE PEOPLE'S PUBLIC SECURITY UNIVERSITY
Priority to CN201910431019.4A
Publication of CN110245493A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Virology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The application proposes a method for Android malware detection based on a deep belief network (DBN). First, permission features and sensitive-API features are extracted from the Android application. Second, a deep learning model is built using the DBN, and the extracted features are processed by the model to obtain samples characterized by high-level abstract features. Then a classification algorithm classifies the high-level abstract features output by the deep learning model, distinguishing malware from normal software. According to the application, the DBN-based deep learning model characterizes the high-level abstract features of Android malware better, and its detection performance is significantly better than that of traditional neural network models and machine learning models.

Description

Method for Android malware detection based on a deep belief network
Technical field
The present invention relates to the field of network security, and in particular to a method for Android malware detection based on a deep belief network.
Background
Android is a Linux-based operating system whose development is led by Google and the Open Handset Alliance. Compared with the operating systems of other smart terminals, it is fully open source, and the Android application markets are complex and diverse, so the amount of Android malware has grown rapidly. Much Android malware induces users to install it, downloads large numbers of new malicious applications, consumes mobile data traffic, and sends fee-deducting SMS messages, posing a serious security threat. In addition, some otherwise normal Android applications obtain related information by requesting excessive or improper permissions in order to collect users' private data. Detecting Android malware is therefore increasingly important.
At present, Android malware detection techniques are mainly divided into static detection and dynamic detection. (1) Static detection determines whether an application contains malicious code without executing it. Static detection of Android malware is generally implemented by disassembling the application. Enck et al. disassembled Android applications and analyzed their source code to find code vulnerabilities. Yang et al. proposed the AppContext static detection framework, which classifies applications according to the contexts that trigger security-sensitive behavior. Static detection quickly extracts and examines the static features of an application by means such as decompilation; its drawback is that the detection patterns scale poorly. (2) Dynamic detection comprehensively monitors application behavior while the Android application is executing. Dynamic detection techniques obtain the information to be examined by running the application in a sandbox or a real environment. DroidScope can dynamically analyze applications in a protected running environment. Dini et al. proposed the dynamic detection framework MADAM, which monitors applications at both the Android kernel level and the user level. Dynamic detection is more accurate, but it consumes more resources at run time and is less efficient.
Both static and dynamic detection usually require Android malware detection rules to be created and updated manually, so they cannot effectively detect some newly emerging malware. To identify unknown malware accurately, machine learning has begun to be applied to Android malware detection. DroidAPIMiner uses machine learning algorithms to analyze API-level features of Android applications. Zhao et al. proposed a feature selection algorithm based on feature frequency. Among traditional machine learning algorithms, the support vector machine (SVM) is commonly used for feature-selection-based Android malware detection. Because traditional machine learning algorithms usually have shallow architectures, they cannot effectively build high-level representations of Android software from correlated features.
The main problems of Android malware detection in the prior art are therefore:
(1) Detection patterns based on static detection scale poorly, and more and more Android malware bypasses static detection by means such as repackaging.
(2) Detection based on dynamic analysis consumes many resources at run time, is inefficient, and has difficulty detecting Android applications that have never been seen before.
(3) In detection based on conventional machine learning algorithms, the learning structure is mostly shallow and cannot represent malware with high-level abstract features, so the detection performance is unsatisfactory.
The present invention therefore uses a deep learning model for Android malware detection to address the above technical problems in the prior art.
Summary of the invention
The present invention provides a method for Android malware detection based on a deep belief network, to overcome the defects of the malware detection methods in the prior art.
The present invention provides a method for Android malware detection based on a deep belief network, characterized in that the method includes the following steps:
extracting permission features and sensitive-API features of an Android application;
constructing a deep learning model using a deep belief network (DBN), and processing the extracted features with the deep learning model to obtain samples characterized by high-level abstract features;
classifying, with a classification algorithm, the samples produced by the deep learning model, to distinguish malware from normal software.
In the method of the invention, extracting the permission features and sensitive-API features of the Android application specifically means:
decompressing the installation file of the application to obtain the AndroidManifest.xml and classes.dex files, and parsing these files to obtain the permission features and sensitive-API features of the Android application.
In the method of the invention, constructing the deep learning model using the DBN consists of two stages: an unsupervised pre-training stage and a supervised back-propagation stage.
The pre-training stage of the method proceeds as follows:
For the feature vectors $x_n$ ($0 \le n < N$) in a training set of $N$ samples:
1) Set $n = 0$.
2) Pass $x_n$ to the visible layer $V_0$ and compute the hidden layer $H_0$ according to formula (1):
$P(h_{0j}=1 \mid V_0)=\sigma(W_j V_0)$ (1)
In formula (1), $P$ is the probability distribution function, which is the core of weight training in the CD algorithm;
$h_{ij}$ denotes the value of the $j$-th hidden unit in the $i$-th hidden layer;
$V_i$ denotes the visible-layer vector of the $i$-th RBM, and $H_i$ denotes the hidden-layer vector of the $i$-th RBM;
$W_i$ denotes the weight vector of the mapping between the visible layer and the hidden layer of the $i$-th RBM;
$\sigma$ is computed as follows:
$\sigma(x)=1/(1+\exp(-x))$
3) Compute the visible layer $V_1$ according to formula (2):
In formula (2), $v_{ij}$ denotes the value of the $j$-th visible unit in the $i$-th visible layer, and the superscript $T$ denotes the matrix transpose;
4) Compute the hidden layer $H_1$ again according to formula (1):
$P(h_{1j}=1 \mid V_1)=\sigma(W_j V_1)$
5) For all nodes $j$, update the weights:
$\lambda$ is a constant related to the convergence rate; the larger $\lambda$ is, the faster the convergence.
If $n = N-1$, terminate; otherwise set $n = n+1$ and go to step 2).
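Formula (2) and the weight update of step 5) are not shown in the text above. Assuming the procedure is standard one-step contrastive divergence (CD-1) without bias terms, and reusing the symbols already defined, they would plausibly take the following form. This is a hedged reconstruction, not the patent's verbatim notation; in particular, reading $W_j$ as the $j$-th row of the weight matrix $W$ is an assumption.

```latex
% Assumed CD-1 forms, consistent with formula (1); not taken verbatim from the patent.
% Formula (2): reconstruct visible unit j from the hidden layer H_0.
P(v_{1j}=1 \mid H_0)=\sigma\big((W^{T}H_0)_j\big) \tag{2}
% Step 5: move W_j toward the data statistics and away from the reconstruction.
W_j \leftarrow W_j+\lambda\big(P(h_{0j}=1 \mid V_0)\,V_0^{T}-P(h_{1j}=1 \mid V_1)\,V_1^{T}\big)
```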
The back-propagation stage of the method comprises the following steps:
1) Randomly initialize the parameters of the BP network, read the weight matrix W of the RBM network, and set the number of training steps to N;
2) Compute the node value of each unit in each layer in the forward direction. The value of unit node $j$ in layer $l$ depends on all unit nodes in layer $l-1$. If neuron $j$ is in the output layer ($l = L$), its node value is the output $o_j(n)$ and the error is $e_j(n)=d_j(n)-o_j(n)$, where $d_j$ is the labeled result;
$W_{ij}(n)$ is the weight associated with the $j$-th unit node of layer $l$;
for an output-layer neuron, $o_j(n)$ is the node value of the $j$-th unit node of the output layer, $d_j(n)$ is the correct labeled result for the output layer, and $e_j(n)=d_j(n)-o_j(n)$ is the error;
3) Compute $\delta$ and propagate it backward to fine-tune the weights layer by layer from the top down; $\delta$ is a function of the error of the node values and is used to fine-tune the weights;
For output units:
4) For hidden units:
5) Fine-tune the weights:
where $\eta$ is the learning rate; the learning rate is related to the convergence rate, and the higher the learning rate, the faster the convergence;
If $n = N$, terminate; otherwise set $n = n+1$ and go to step 2).
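The formulas for the node value, for $\delta$, and for the weight update are not shown in the text above. Assuming the textbook back-propagation rule for sigmoid units with squared error, they would plausibly read as follows. The symbol $y_j^{(l)}(n)$ for the node value and the index convention ($W_{ij}$ connects unit $i$ of layer $l-1$ to unit $j$ of layer $l$) are assumptions introduced for illustration, not the patent's own notation.

```latex
% Assumed textbook forms; y_j^{(l)}(n) is a hypothetical symbol for the node value.
y_j^{(l)}(n)=\sigma\Big(\sum_i W_{ij}^{(l)}(n)\,y_i^{(l-1)}(n)\Big), \qquad o_j(n)=y_j^{(L)}(n)
% Output units (e_j = d_j - o_j):
\delta_j^{(L)}(n)=e_j(n)\,o_j(n)\,\big(1-o_j(n)\big)
% Hidden units:
\delta_j^{(l)}(n)=y_j^{(l)}(n)\,\big(1-y_j^{(l)}(n)\big)\sum_k \delta_k^{(l+1)}(n)\,W_{jk}^{(l+1)}(n)
% Weight fine-tuning with learning rate \eta:
W_{ij}^{(l)}(n+1)=W_{ij}^{(l)}(n)+\eta\,\delta_j^{(l)}(n)\,y_i^{(l-1)}(n)
```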
Taking the characteristics of Android applications into account, the invention performs feature analysis with a deep learning model based on a deep belief network: first, comprehensive application features are obtained from the Android application; then the deep belief network mines high-level abstract features from the software features; finally, a classification algorithm distinguishes normal software from malware based on the high-level abstract features. Experimental results show that the DBN-based deep learning model characterizes Android malware better, and its detection performance is significantly better than that of traditional neural network models and machine learning models.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the embodiments are briefly described below.
Fig. 1 is the Android malware detection framework of the invention.
Fig. 2 is the structure of the DBN network used by the invention.
Fig. 3 is the structure of the deep learning model of the invention.
Fig. 4 is a schematic diagram of the classification algorithm of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings.
The invention proposes an Android malware detection method based on a deep belief network (DBN). The detection framework is shown in Fig. 1, and its steps are as follows:
(1) First, 179 features such as permissions and sensitive APIs are extracted from the Android application. The 179 features correspond to a 179-dimensional binary vector: if the application has a feature, the corresponding dimension is 1; otherwise it is 0. The feature vector of the Android application serves as the raw input of the deep learning model. Next, a deep learning model is built using the DBN network, and the feature vector of the Android application is fed into the model to detect the application.
(2) Deep learning model
Because traditional machine learning algorithms usually have shallow architectures and cannot effectively build high-level representations of Android software from correlated features, the invention uses a deep learning model to mine high-level abstract features for detecting Android applications. The feature vector of the Android application is fed into the deep learning model, whose main role is to learn a weighted representation of the feature vector, that is, to assess the weight of each dimension of the feature vector, without performing classification. The model turns the input feature vector into a high-level representation, and the subsequent classification algorithm is responsible for classifying the Android application, which makes classification more accurate and detection more effective.
The DBN-based deep learning model has three parts: greedy-algorithm initialization, contrastive divergence pre-training, and fine-tuning by a back-propagation network. These three parts are described in detail later with reference to the drawings.
(3) Classification algorithm
The invention uses the support vector machine (SVM) algorithm as the model's classification algorithm: the feature vector output by the deep learning model is fed into the SVM classification module, which classifies the Android application as malware or normal software.
In the invention, permission features and sensitive-API features describe an Android application more comprehensively, and the deep structure of the features learned by the DBN network characterizes and detects Android malware better.
The basic framework of the Android malware detection of the invention has been described above with reference to Fig. 1. Each step of the detection method is described in further detail below with reference to Fig. 2, Fig. 3 and Fig. 4.
1. Feature extraction
The features of Android applications are mainly divided into static features and dynamic features. Static features are extracted without executing the application, by means such as decompilation, and mainly comprise permission information and information on the sensitive APIs called. Dynamic features reflect application behavior and are obtained while the Android application is executing. Dynamic feature extraction is slow and resource-intensive, whereas static features require few system resources, are fast to extract, and suit large-scale feature extraction. Features are therefore extracted here by static analysis, and the feature set is built from static features.
To obtain the features of an Android application, its installation file (.apk file) is decompressed to obtain two important files, AndroidManifest.xml and classes.dex. AndroidManifest.xml is the system manifest file, which defines information such as the permissions and components of the application. Parsing AndroidManifest.xml yields the permissions requested by the application; for example, android.permission.camera means the application requests permission to use the camera. Parsing the AndroidManifest.xml files yielded 120 permissions in total. The classes.dex file is decompiled and parsed with the baksmali tool to learn which API interfaces are called; for example, chmod is a sensitive API that changes user permissions. Parsing the classes.dex files yielded 59 sensitive APIs in total. The extracted permission information and called sensitive-API information are shown in Table 1.
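As a rough illustration of the permission-extraction step, the sketch below reads `<uses-permission>` entries from a manifest that has already been decoded to plain-text XML. That is an assumption: inside an .apk the manifest is stored in a binary XML format, so a decoding step (for example with a tool such as apktool) would be needed first. The function name `extract_permissions` and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace that prefixes android:* attributes in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_permissions(manifest_xml: str) -> set:
    """Return the permission names requested in a plain-text manifest."""
    root = ET.fromstring(manifest_xml)
    permissions = set()
    for element in root.iter("uses-permission"):
        name = element.get(ANDROID_NS + "name")
        if name:
            permissions.add(name)
    return permissions


# Made-up manifest used only to exercise the function.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.CAMERA" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Example" />
</manifest>
"""

print(sorted(extract_permissions(SAMPLE_MANIFEST)))
```

Sensitive-API extraction from classes.dex would follow the same pattern over the baksmali output, which is not shown here.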
Table 1. Constructed features and their descriptions
Referring to Table 1, 179 features such as permissions and sensitive APIs are extracted from the Android application. The 179 features correspond to a 179-dimensional binary vector: if the application has a feature, the corresponding dimension is 1; otherwise it is 0. The feature vector of the Android application serves as the raw input of the deep learning model.
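The encoding described above can be sketched in a few lines. The four-entry vocabulary below is a hypothetical stand-in for the 179 features of Table 1 (120 permissions plus 59 sensitive APIs), and `build_feature_vector` is an illustrative name rather than anything defined in the patent.

```python
def build_feature_vector(app_features: set, vocabulary: list) -> list:
    """Map an app's extracted features onto a fixed-order binary vector."""
    return [1 if feature in app_features else 0 for feature in vocabulary]


# Hypothetical vocabulary; the real list would hold all 179 feature names.
VOCABULARY = [
    "android.permission.CAMERA",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "chmod",
]

app = {"android.permission.CAMERA", "chmod"}
vector = build_feature_vector(app, VOCABULARY)
print(vector)  # [1, 0, 0, 1]
```

Keeping the vocabulary order fixed matters: every application must map the same feature to the same dimension for the vectors to be comparable.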
2. Deep belief network
In current deep learning theory, the deep belief network (DBN) is a widely used deep learning framework. As shown in Fig. 2, a DBN has two parts: the bottom part is a stack of restricted Boltzmann machine (RBM) units, and the top part is a supervised back-propagation (BP) network layer that fine-tunes the overall architecture. The invention applies the DBN network to Android malware detection. Compared with traditional deep learning frameworks (recurrent neural networks, convolutional neural networks, etc.), the advantage claimed for the invention is that it learns the feature vectors of Android applications faster and performs better; the invention therefore detects Android malware with a DBN-based deep learning framework.
As shown in Fig. 2, V denotes the vector of visible-layer node values and H denotes the vector of hidden-layer node values. In the stacked RBMs, except for the bottom and top layers, the hidden layer of each RBM is the visible layer of the RBM above it. W is the weight matrix, which represents the mapping between the visible layer and the hidden layer.
The DBN-based deep learning model of the invention has three parts. First, the RBMs are initialized with a greedy algorithm, which initializes the parameters of the weight matrix W. The purpose of this initialization is to make the subsequent contrastive divergence (CD) algorithm more efficient, because a completely random weight matrix makes the CD algorithm too inefficient and too costly to compute. Then the feature vectors of the Android applications obtained in the first step, that is, the unlabeled APP samples, are given to the bottom RBM as its initial feature vector V0; the initial feature vector of the bottom RBM is the vector of node values of its visible layer. Each RBM layer is trained with the contrastive divergence (CD) algorithm. In this upward, unsupervised transformation, concrete feature vectors that are hard to classify are converted into abstract combined feature vectors that are easy to classify, and the weight matrix Wi of each layer is adjusted so that the feature mapping of that layer reaches a local optimum. Finally, the BP network fine-tunes the DBN network in a supervised manner so that the parameters reach a global optimum, and the easily classified feature vectors are output to the classification module. The main job of the DBN network described above is to train a weighted representation of the feature vectors; it does not classify the Android applications, so a classification algorithm is still needed. The support vector machine (SVM) algorithm is used here as the model's classification algorithm; the structure is shown in Fig. 3.
As shown in Fig. 2, building the deep learning model consists of two stages: an unsupervised pre-training stage and a supervised back-propagation stage. In the pre-training stage, several stacked RBM layers form the basic framework of the DBN network; a greedy algorithm is used between adjacent RBM layers to initialize the parameters of the weight matrix W, and each RBM layer is then trained with the contrastive divergence (CD) algorithm to further train the weight matrix parameters. In the back-propagation stage, the BP module fine-tunes the DBN network with labeled samples in a supervised manner. Finally, the feature vector output by the deep belief network enters the SVM classification module, which classifies the sample according to this feature vector; the structure is shown in Fig. 3.
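The stacking described above, where the hidden layer of one RBM serves as the visible layer of the next, can be sketched as a simple upward pass. This is a minimal illustration that assumes bias-free sigmoid units; the weights are made up, and the names `hidden_probabilities` and `propagate_up` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(weights, visible):
    """P(h_j = 1 | v) = sigmoid(W_j . v) for each hidden unit j (row j of W)."""
    return [sigmoid(sum(w * v for w, v in zip(row, visible))) for row in weights]


def propagate_up(stack, features):
    """Feed a feature vector through stacked RBMs, bottom to top.

    The hidden activations of one RBM become the visible input of the next.
    """
    layer_input = features
    for weights in stack:
        layer_input = hidden_probabilities(weights, layer_input)
    return layer_input


# Two toy RBMs: 4 visible -> 3 hidden, then 3 visible -> 2 hidden.
rbm_1 = [[0.5, -0.2, 0.1, 0.0], [0.3, 0.8, -0.5, 0.2], [-0.4, 0.1, 0.9, -0.3]]
rbm_2 = [[0.7, -0.6, 0.2], [-0.1, 0.4, 0.5]]
abstract_features = propagate_up([rbm_1, rbm_2], [1, 0, 0, 1])
print(len(abstract_features))  # 2
```

The top-layer activations are what would be handed to the classification module once the weights have been trained.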
The algorithms in the deep belief network are described in detail below.
2.1 Contrastive divergence algorithm
Because the contrastive divergence (CD) algorithm is accurate and fast to compute, the CD learning algorithm is used to train the parameters of the weight matrix W so that the feature mapping of each layer reaches a local optimum. The CD algorithm uses the "difference" between two probability distributions to update the weights iteratively until convergence.
The RBM self-training process based on the CD algorithm is as follows:
For the feature vectors $x_n$ ($0 \le n < N$) in a training set of $N$ samples:
1) Set $n = 0$.
2) Pass $x_n$ to the visible layer $V_0$ and compute the hidden layer $H_0$ according to formula (1):
$P(h_{0j}=1 \mid V_0)=\sigma(W_j V_0)$ (1)
In formula (1), $P$ is the probability distribution function, which is the core of weight training in the CD algorithm;
$h_{ij}$ denotes the value of the $j$-th hidden unit in the $i$-th hidden layer;
$V_i$ denotes the visible-layer vector of the $i$-th RBM, and $H_i$ denotes the hidden-layer vector of the $i$-th RBM;
$W_i$ denotes the weight vector of the mapping between the visible layer and the hidden layer of the $i$-th RBM;
$\sigma$ is computed as follows:
$\sigma(x)=1/(1+\exp(-x))$
3) Compute the visible layer $V_1$ according to formula (2):
In formula (2), $v_{ij}$ denotes the value of the $j$-th visible unit in the $i$-th visible layer, and the superscript $T$ denotes the matrix transpose;
4) Compute the hidden layer $H_1$ again according to formula (1):
$P(h_{1j}=1 \mid V_1)=\sigma(W_j V_1)$
5) For all nodes $j$, update the weights:
$\lambda$ is a constant related to the convergence rate; the larger $\lambda$ is, the faster the convergence.
If $n = N-1$, terminate; otherwise set $n = n+1$ and go to step 2).
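The loop above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: it assumes bias-free units, samples binary states when reconstructing, takes formula (2) to be the usual reconstruction through the transposed weights, takes the step 5) update to be the standard CD-1 rule, and replaces the greedy initialization with small random weights. All function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(weights, visible):
    """Formula (1): P(h_j = 1 | V) = sigmoid(W_j . V) for every row W_j."""
    return [sigmoid(sum(w * v for w, v in zip(row, visible))) for row in weights]


def visible_probs(weights, hidden):
    """Assumed formula (2): P(v_i = 1 | H) = sigmoid((W^T H)_i)."""
    n_visible = len(weights[0])
    return [
        sigmoid(sum(weights[j][i] * hidden[j] for j in range(len(weights))))
        for i in range(n_visible)
    ]


def sample_binary(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(weights, v0, lam, rng):
    """One CD-1 step for a bias-free RBM; updates `weights` in place."""
    p_h0 = hidden_probs(weights, v0)                       # step 2
    h0 = sample_binary(p_h0, rng)
    v1 = sample_binary(visible_probs(weights, h0), rng)    # step 3
    p_h1 = hidden_probs(weights, v1)                       # step 4
    for j, row in enumerate(weights):                      # step 5 (assumed rule)
        for i in range(len(row)):
            row[i] += lam * (p_h0[j] * v0[i] - p_h1[j] * v1[i])


def pretrain_rbm(samples, n_hidden, lam=0.1, seed=0):
    """Run one pass of CD-1 over the training vectors x_0 .. x_{N-1}."""
    rng = random.Random(seed)
    n_visible = len(samples[0])
    # Small random start; the patent's greedy initialization is not reproduced here.
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_visible)] for _ in range(n_hidden)]
    for x_n in samples:
        cd1_update(weights, x_n, lam, rng)
    return weights


training_set = [[1, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
trained = pretrain_rbm(training_set, n_hidden=2)
print(len(trained), len(trained[0]))  # 2 4
```

A single pass over the data matches the termination rule in the text; in practice several passes are common, which would simply wrap the loop in `pretrain_rbm`.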
2.2 Back-propagation network
As shown in Fig. 3, the back-propagation (BP) network uses supervised learning: its results are compared against labeled applications (known to be malicious or normal), and the entire DBN network is fine-tuned. With the BP training method, the sigmoid function is chosen as the node-value function.
The BP network training process is as follows:
1) Randomly initialize the parameters of the BP network, read the weight matrix W of the RBM network, and set the number of training steps to N;
2) Compute the node value of each unit in each layer in the forward direction. The value of unit node $j$ in layer $l$ depends on all unit nodes in layer $l-1$. If neuron $j$ is in the output layer ($l = L$), its node value is the output $o_j(n)$ and the error is $e_j(n)=d_j(n)-o_j(n)$, where $d_j$ is the labeled result;
$W_{ij}(n)$ is the weight associated with the $j$-th unit node of layer $l$;
for an output-layer neuron, $o_j(n)$ is the node value of the $j$-th unit node of the output layer, $d_j(n)$ is the correct labeled result for the output layer, and $e_j(n)=d_j(n)-o_j(n)$ is the error;
3) Compute $\delta$ and propagate it backward to fine-tune the weights layer by layer from the top down; $\delta$ is a function of the error of the node values and is used to fine-tune the weights. For the final output units:
4) For hidden units:
5) Fine-tune the weights:
where $\eta$ is the learning rate; the learning rate is related to the convergence rate, and the higher the learning rate, the faster the convergence;
If $n = N$, terminate; otherwise set $n = n+1$ and go to step 2).
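The fine-tuning loop above can be sketched as follows. This is a minimal illustration that assumes the textbook delta rule for bias-free sigmoid units with squared error; the δ and update formulas in the code are that assumption, not the patent's verbatim equations, and the starting weights are made up (in the patent they would come from the pre-trained RBMs).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, x):
    """Return the node values of every layer; layers[k][j] is the weight row of unit j."""
    values = [list(x)]
    for weights in layers:
        values.append([sigmoid(sum(w * v for w, v in zip(row, values[-1]))) for row in weights])
    return values


def backprop_step(layers, x, target, eta):
    """One supervised fine-tuning step; updates `layers` in place, returns the errors e_j."""
    values = forward(layers, x)
    output = values[-1]
    errors = [d - o for d, o in zip(target, output)]
    # Output units (assumed rule): delta_j = e_j * o_j * (1 - o_j).
    deltas = [e * o * (1.0 - o) for e, o in zip(errors, output)]
    for index in range(len(layers) - 1, -1, -1):
        inputs = values[index]
        weights = layers[index]
        lower = None
        if index > 0:
            # Hidden units: weighted sum of the deltas above, using pre-update weights.
            lower = [
                y * (1.0 - y) * sum(deltas[j] * weights[j][i] for j in range(len(weights)))
                for i, y in enumerate(inputs)
            ]
        # Fine-tune: weight from input i to unit j grows by eta * delta_j * y_i.
        for j, row in enumerate(weights):
            for i, y in enumerate(inputs):
                row[i] += eta * deltas[j] * y
        if lower is not None:
            deltas = lower
    return errors


# Toy network 4 -> 3 -> 1 with made-up starting weights.
network = [
    [[0.1, -0.2, 0.3, 0.0], [0.2, 0.1, -0.1, 0.4], [-0.3, 0.2, 0.1, 0.1]],
    [[0.5, -0.4, 0.3]],
]
sample, label = [1, 0, 0, 1], [1.0]
first_error = abs(backprop_step(network, sample, label, eta=0.5)[0])
for _ in range(200):
    last_error = abs(backprop_step(network, sample, label, eta=0.5)[0])
print(first_error, last_error)
```

Repeating the step on one labeled sample should shrink the error, which is a quick sanity check that the sign of the update matches the definition $e_j = d_j - o_j$.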
3. Classification algorithm
Based on the high-level feature vectors output by the deep belief network, a classification algorithm classifies the Android applications; the support vector machine (SVM) algorithm is used here as the model's classification algorithm. The SVM algorithm has two stages: training and testing. Given normal and malicious samples in the training stage, the SVM finds a hyperplane, specified by the normal vector ω and the perpendicular distance b, that separates the two classes with the maximum margin γ, where Positive denotes normal samples and Negative denotes malicious samples, as shown in Fig. 4.
In the testing stage, the SVM prediction model divides the test set into two classes. The decision function f of the linear SVM is given by formula (3):
where x is the feature vector output by the deep belief network. When f(x) > 0 the sample is classified as normal; otherwise it is classified as malicious.
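The decision rule can be illustrated with a short sketch. It assumes the usual linear form f(x) = ω·x + b for formula (3), which is not shown in the text above, and the hyperplane parameters below are made up rather than learned.

```python
def decision_function(omega, b, x):
    """f(x) = omega . x + b for a linear SVM (assumed form of formula (3))."""
    return sum(w * xi for w, xi in zip(omega, x)) + b


def classify(omega, b, x):
    """Follow the rule in the text: f(x) > 0 is normal, otherwise malicious."""
    return "normal" if decision_function(omega, b, x) > 0 else "malicious"


# Made-up hyperplane parameters for a 2-dimensional DBN output.
omega, b = [2.0, -1.0], -0.5
print(classify(omega, b, [0.9, 0.2]))  # normal
print(classify(omega, b, [0.1, 0.8]))  # malicious
```

In a real pipeline ω and b would come from training the SVM on the DBN outputs of the labeled training group.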
4. Effects of the invention
4.1 Data set
In total, 10000 applications were downloaded from the Google Play Store as the normal sample set. The malicious sample set contains 3938 samples in two parts: 1260 from the Genome Project (http://www.malgenomeproject.org/) and 2678 from VirusTotal (https://www.virustotal.com/). From the sample sets, 700 normal samples and 700 malicious samples are randomly selected and thoroughly mixed to form one group of data; 5 groups of experimental data are chosen in total. From the 5 groups, 2 groups are chosen as the training set and the test set respectively. The experimental environment is shown in Table 2.
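The sampling procedure can be sketched as follows. The identifiers, the 0/1 labels, and the fixed seed are hypothetical, and drawing each group independently from the full sample sets (so groups may overlap) is an assumption; the text does not say whether the groups are disjoint.

```python
import random


def make_group(normal_ids, malicious_ids, per_class, rng):
    """Draw `per_class` samples from each set, label them, and shuffle them together."""
    group = [(sample, 0) for sample in rng.sample(normal_ids, per_class)]
    group += [(sample, 1) for sample in rng.sample(malicious_ids, per_class)]
    rng.shuffle(group)
    return group


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
normal_ids = [f"normal_{i}" for i in range(10000)]
malicious_ids = [f"malware_{i}" for i in range(3938)]

groups = [make_group(normal_ids, malicious_ids, 700, rng) for _ in range(5)]
train_group, test_group = rng.sample(groups, 2)
print(len(groups), len(train_group), len(test_group))  # 5 1400 1400
```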
Table 2. Experimental environment
4.2 Comparison with other conventional machine learning algorithms
The detection results obtained with the DBN+SVM deep learning model of the invention are compared with those of traditional machine learning models; the experimental results are shown in Table 3. In the experiments, three metrics, precision, recall, and accuracy, are used to evaluate the Android malware detection results.
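For reference, the three metrics can be computed from true and predicted labels as in the sketch below. Which class counts as positive is an assumption: the code defaults to label 1 for malware, while Fig. 4 of the patent labels normal samples Positive, so the `positive` argument should be set accordingly. The labels in the usage lines are made up and are not experimental results.

```python
def detection_metrics(true_labels, predicted_labels, positive=1):
    """Return (precision, recall, accuracy) with `positive` as the class of interest."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Made-up labels for illustration only.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 0, 1, 0, 1]
print(detection_metrics(truth, preds))  # (0.75, 0.75, 0.75)
```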
Table 3. Detection results of different machine learning algorithms
For most of the traditional machine learning algorithms (Bayes, Logistic Regression, KNN, SVM), a variety of common kernel functions such as the sigmoid kernel and the linear kernel were tested, and the best detection results were taken as the experimental results of the conventional machine learning algorithms. Table 3 shows that, on the same test set, the accuracy of the DBN+SVM algorithm of the invention is 3.35% higher than that of SVM, 11.83% higher than Bayes, 12.26% higher than KNN, and 14.38% higher than Logistic Regression. The DBN-based deep learning model of the invention is thus substantially better than the traditional neural network and machine learning models.

Claims (5)

1. a kind of method of the Android malware detection based on depth confidence network, it is characterised in that the method includes Following steps:
Extract the permission of Android application software and the feature of sensitive API;
Deep learning model is constructed using depth confidence network DBN, the feature extracted is used into the deep learning Model is handled, and the sample with higher level of abstraction feature is obtained;
Using sorting algorithm, classifies to the sample characterized by the deep learning model, distinguish Malware And normal software.
2. The method according to claim 1, characterized in that extracting the permission features and sensitive API features of the Android application specifically comprises the following steps:
decompressing the installation file of the application to obtain the AndroidManifest.xml and classes.dex files, and parsing these files to obtain the permission features and sensitive API features of the Android application.
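The decompression step can be sketched with the Python standard library, since an APK is a ZIP archive. This is only an illustration under stated assumptions: the helper names `read_apk_members` and `to_feature_vector`, the three-entry vocabulary, and the in-memory stand-in archive are hypothetical. Parsing is not shown, because the manifest inside a real APK is stored as Android binary XML and the dex file is a binary format, so both need a dedicated decoder.

```python
import io
import zipfile

TARGETS = ("AndroidManifest.xml", "classes.dex")


def read_apk_members(apk_file):
    """Return {name: raw bytes} for the manifest and primary dex file of an APK.

    apk_file may be a path or a binary file-like object.
    """
    members = {}
    with zipfile.ZipFile(apk_file) as apk:
        names = set(apk.namelist())
        for name in TARGETS:
            if name in names:
                members[name] = apk.read(name)
    return members


def to_feature_vector(observed, vocabulary):
    """Encode observed permission/API names as a 0/1 vector over a fixed vocabulary."""
    observed = set(observed)
    return [1 if item in observed else 0 for item in vocabulary]


# In-memory archive standing in for a real APK (placeholder contents).
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as z:
    z.writestr("AndroidManifest.xml", b"placeholder manifest bytes")
    z.writestr("classes.dex", b"placeholder dex bytes")
    z.writestr("res/layout/main.xml", b"")
_buf.seek(0)
demo_members = read_apk_members(_buf)

# Hypothetical vocabulary; a real one would list every permission and sensitive API tracked.
VOCAB = [
    "android.permission.SEND_SMS",
    "android.permission.INTERNET",
    "android.permission.CAMERA",
]
demo_vector = to_feature_vector(["android.permission.INTERNET"], VOCAB)
```

A fixed-order binary vector of this kind is a common input encoding for an RBM visible layer, which is why it is used in the sketch.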
3. The method according to claim 1, wherein constructing the deep learning model using the deep belief network (DBN) comprises two stages: an unsupervised pre-training stage and a supervised back-propagation stage.
4. The method according to claim 3, characterized in that the pre-training stage proceeds as follows:
For each feature vector $x_n$ ($0 \le n < N$) in a training set of $N$ samples:
1) Set $n = 0$.
2) Pass $x_n$ to the visible layer $V_0$ and compute the hidden layer $H_0$ according to formula (1):
$P(h_{0j} = 1 \mid V_0) = \sigma(W_j V_0)$ --- formula (1)
In the above formula, $P$ is the probability distribution function, which is the core of weight training in the CD (contrastive divergence) algorithm;
$h_{ij}$ denotes the value of the $j$-th hidden unit in the $i$-th hidden layer;
$V_i$ denotes the visible-layer vector of the $i$-th RBM, and $H_i$ denotes the hidden-layer vector of the $i$-th RBM;
$W_i$ denotes the weight vector of the mapping between the visible layer and the hidden layer of the $i$-th RBM;
$\sigma$ is computed as follows:
$\sigma(x) = 1 / (1 + \exp(-x))$
3) Compute the visible layer $V_1$ according to formula (2):
In the above formula, $v_{ij}$ denotes the value of the $j$-th visible unit in the $i$-th visible layer, and the superscript $T$ denotes matrix transposition;
4) Compute the hidden layer $H_1$ again according to formula (1):
$P(h_{1j} = 1 \mid V_1) = \sigma(W_j V_1)$
5) For all nodes $j$, update the weights:
$\lambda$ is a constant related to the convergence rate: the larger $\lambda$ is, the faster the convergence.
If $n = N-1$, terminate; otherwise set $n = n+1$ and go to step 2).
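The pre-training loop of claim 4 can be sketched as one pass of contrastive divergence (CD-1) over the training set. Formula (1) and the sigmoid are taken from the claim. The expressions for formula (2) and for the weight update are not reproduced in this text, so the sketch assumes the usual textbook CD-1 forms, $P(v_{1i} = 1 \mid H_0) = \sigma(W^T H_0)_i$ and $W_j \leftarrow W_j + \lambda\,(P(h_{0j}=1 \mid V_0)\,V_0 - P(h_{1j}=1 \mid V_1)\,V_1)$. Bias terms are omitted because formula (1) has none; the function names and the toy data are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(W, v):
    # Formula (1): P(h_j = 1 | V) = sigma(W_j . V), where W_j is row j of W.
    return [sigmoid(sum(w * x for w, x in zip(row, v))) for row in W]


def visible_probs(W, h):
    # ASSUMED form of formula (2): P(v_i = 1 | H) = sigma((W^T H)_i).
    n_visible = len(W[0])
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(W))))
            for i in range(n_visible)]


def sample(probs, rng):
    # Draw a binary state for each unit from its activation probability.
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_pretrain(samples, n_hidden, lam=0.1, seed=0):
    """One CD-1 pass over the training set; returns an n_hidden x n_visible weight matrix."""
    rng = random.Random(seed)
    n_visible = len(samples[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_visible)] for _ in range(n_hidden)]
    for v0 in samples:                     # n = 0 .. N-1
        p_h0 = hidden_probs(W, v0)         # step 2: H0 from V0
        h0 = sample(p_h0, rng)
        v1 = visible_probs(W, h0)          # step 3: V1 from H0 (probabilities used as V1)
        p_h1 = hidden_probs(W, v1)         # step 4: H1 from V1
        for j in range(n_hidden):          # step 5: ASSUMED textbook CD-1 update
            for i in range(n_visible):
                W[j][i] += lam * (p_h0[j] * v0[i] - p_h1[j] * v1[i])
    return W


# Toy binary feature vectors (hypothetical data).
toy_samples = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
W_demo = cd1_pretrain(toy_samples, n_hidden=2)
```

In a stacked DBN, the hidden activations of one trained RBM would serve as the visible input of the next; the sketch covers a single RBM only.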
5. The method according to claim 3, characterized in that the back-propagation (BP) stage comprises the following process:
1) Randomly initialize the parameters of the BP network, read the weight matrix $W$ of the RBM network, and initialize the number of training steps to $N$;
2) Forward pass: compute the node value of each unit in every layer. For unit node $j$ in layer $l$, the node value depends on all unit nodes in layer $l-1$. If neuron $j$ is in the output layer ($l = L$), let its node value be $o_j(n)$ and define the error $e_j(n) = d_j(n) - o_j(n)$, where $d_j$ is the labelled result;
$W_{ij}(n)$ is the weight associated with the $j$-th unit node of layer $l$;
The node value above is that of the $j$-th unit node of layer $l$; if that node is an output-layer neuron, $o_j(n)$ is the node value of the $j$-th unit node of the output layer, $d_j(n)$ is the correct labelled result for the output layer, and $e_j(n) = d_j(n) - o_j(n)$ is the error;
3) Compute $\delta$ and propagate it backward, fine-tuning the weights layer by layer from the top down; $\delta$ is a function of the error in the node value and is used to fine-tune the weights;
For output units:
4) For hidden units:
5) Fine-tune the weights:
where $\eta$ is the learning rate; the learning rate is related to the convergence rate, and a higher learning rate gives faster convergence. If $n = N$, terminate; otherwise set $n = n+1$ and go to step 2).
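The fine-tuning stage of claim 5 can be sketched with the standard delta rule for sigmoid units. The $\delta$ and weight-update expressions are not reproduced in this text, so the sketch assumes the usual textbook forms: $\delta_j = e_j\,o_j(1 - o_j)$ for output units, $\delta_j = y_j(1 - y_j)\sum_k \delta_k W_{kj}$ for hidden units, and $W_{ji} \leftarrow W_{ji} + \eta\,\delta_j\,y_i$. Biases are omitted, and the starting weights are hypothetical stand-ins for weights read from a pre-trained RBM.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, x):
    """Step 2: return the activations of every layer, starting with the input."""
    activations = [list(x)]
    for W in weights:
        prev = activations[-1]
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(row, prev))) for row in W])
    return activations


def backprop_step(weights, x, d, eta=0.5):
    """One fine-tuning step (updates weights in place); returns the squared error before it."""
    acts = forward(weights, x)
    out = acts[-1]
    # Step 3, output units (ASSUMED form): delta_j = e_j * o_j * (1 - o_j)
    deltas = [(dj - oj) * oj * (1.0 - oj) for dj, oj in zip(d, out)]
    for l in range(len(weights) - 1, -1, -1):
        W, prev = weights[l], acts[l]
        if l > 0:
            # Step 4, hidden units (ASSUMED form), computed before W is modified.
            lower = [prev[i] * (1.0 - prev[i])
                     * sum(deltas[j] * W[j][i] for j in range(len(W)))
                     for i in range(len(prev))]
        # Step 5 (ASSUMED form): W_ji += eta * delta_j * y_i
        for j, row in enumerate(W):
            for i in range(len(row)):
                row[i] += eta * deltas[j] * prev[i]
        if l > 0:
            deltas = lower
    return sum((dj - oj) ** 2 for dj, oj in zip(d, out))


# Hypothetical starting weights: 2 inputs -> 2 hidden units -> 1 output.
demo_weights = [[[0.1, -0.2], [0.3, 0.4]], [[0.2, -0.1]]]
errors = [backprop_step(demo_weights, x=[1.0, 0.0], d=[1.0]) for _ in range(200)]
```

Repeating the step on a single labelled sample should shrink the squared error, which gives a quick sanity check before training on a full labelled set.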
CN201910431019.4A 2019-05-22 2019-05-22 A method of the Android malware detection based on depth confidence network Withdrawn CN110245493A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910431019.4A CN110245493A (en) 2019-05-22 2019-05-22 A method of the Android malware detection based on depth confidence network

Publications (1)

Publication Number Publication Date
CN110245493A true CN110245493A (en) 2019-09-17

Family

ID=67884811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910431019.4A Withdrawn CN110245493A (en) 2019-05-22 2019-05-22 A method of the Android malware detection based on depth confidence network

Country Status (1)

Country Link
CN (1) CN110245493A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271788A (en) * 2018-08-23 2019-01-25 北京理工大学 A kind of Android malware detection method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ouyang Li et al.: "Android malware detection based on deep belief network" (基于深度置信网络的Android恶意软件检测), 《网络与安全》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125699A (en) * 2019-12-04 2020-05-08 中南大学 Malicious program visual detection method based on deep learning
CN111125699B (en) * 2019-12-04 2023-04-18 中南大学 Malicious program visual detection method based on deep learning
CN112698841A (en) * 2021-01-14 2021-04-23 北京大学(天津滨海)新一代信息技术研究院 Android-oriented deep learning model unified deployment system, method, equipment and medium
CN112989342A (en) * 2021-03-04 2021-06-18 北京邮电大学 Malicious software detection network optimization method and device, electronic equipment and storage medium
CN113378171A (en) * 2021-07-12 2021-09-10 东北大学秦皇岛分校 Android ransomware detection method based on convolutional neural network
CN113378171B (en) * 2021-07-12 2022-06-21 东北大学秦皇岛分校 Android ransomware detection method based on convolutional neural network

Similar Documents

Publication Publication Date Title
CN110245493A (en) A method of the Android malware detection based on depth confidence network
Sinha et al. Certifying some distributional robustness with principled adversarial training
EP3754549B1 (en) A computer vision method for recognizing an object category in a digital image
CN109299741B (en) Network attack type identification method based on multi-layer detection
CN109194612B (en) Network attack detection method based on deep belief network and SVM
CN110135167B (en) Edge computing terminal security level evaluation method for random forest
JP2015095212A (en) Identifier, identification program, and identification method
CN108710893B (en) Digital image camera source model classification method based on feature fusion
CN103927550B (en) A kind of Handwritten Numeral Recognition Method and system
CN113139651A (en) Training method and device of label proportion learning model based on self-supervision learning
CN113554100B (en) Web service classification method for enhancing attention network of special composition picture
CN112668698A (en) Neural network training method and system
CN115577357A (en) Android malicious software detection method based on stacking integration technology
CN110119355A (en) A kind of knowledge based map vectorization reasoning common software defect modeling method
CN112966754A (en) Sample screening method, sample screening device and terminal equipment
Xie et al. Andro_MD: android malware detection based on convolutional neural networks
CN113222053B (en) Malicious software family classification method, system and medium based on RGB image and Stacking multi-model fusion
KR102085415B1 (en) Method and Apparatus of Intrusion Detection for Wi-Fi Network Based on Weight-Selected Neural Networks
Klein et al. Jasmine: A new Active Learning approach to combat cybercrime
Jere et al. Principal component properties of adversarial samples
Abady et al. A siamese-based verification system for open-set architecture attribution of synthetic images
CN114998330B (en) Unsupervised wafer defect detection method, unsupervised wafer defect detection device, unsupervised wafer defect detection equipment and storage medium
WO2022162839A1 (en) Learning device, learning method, and recording medium
CN115713669A (en) Image classification method and device based on inter-class relation, storage medium and terminal
JP2023154373A (en) Information processing apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190917