CN108280350B - Android-oriented mobile network terminal malicious software multi-feature detection method - Google Patents


Info

Publication number
CN108280350B
Authority
CN
China
Prior art keywords
software
malicious
sensitive
malware
family
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810109044.6A
Other languages
Chinese (zh)
Other versions
CN108280350A (en)
Inventor
庄毅
王军
顾晶晶
蒋理
杨帆
孙炳林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN201810109044.6A
Publication of CN108280350A
Application granted
Publication of CN108280350B
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses an Android-oriented multi-feature malware detection method for mobile network terminals. The method comprises the following steps: step 1, obtaining an Android software dataset comprising malicious samples and non-malicious samples; step 2, analyzing the installation package of the malicious software, extracting the installation package features of the software, and constructing an installation package feature vector; step 3, obtaining the permissions requested by the software and constructing a permission list; step 4, decompiling the installation package of the malicious software, constructing a sensitive behavior graph of the software, and extracting the sensitive behavior set of the software; step 5, performing statistical analysis on the features of software belonging to the same malware family in the malicious samples to construct a malware family feature library; and step 6, extracting the software features and performing maliciousness judgment and malware family classification. By selecting installation package features, permission features, and sensitive behavior call features as the basis for judging malware, the invention can improve the accuracy of detecting malicious software behavior and provides the ability to classify malware families.

Description

Android-oriented mobile network terminal malicious software multi-feature detection method
Technical Field
The invention belongs to the field of mobile software analysis and information security, and in particular relates to an Android-oriented multi-feature malware detection method for mobile network terminals.
Background
Multi-label detection of Android malicious code is a challenging problem in both academia and industry: the maliciousness of the software must be judged and, at the same time, the family to which the software belongs must be identified. Smartphone applications now touch many aspects of daily life, and the Android system holds a large share of the smartphone market, so accurately detecting Android malicious code is of significant value for protecting the privacy and property of Android users.
Existing Android malware detection techniques fall into two main types: those based on static analysis and those based on dynamic analysis. Dynamic analysis simulates the execution of the software and can sidestep the code obfuscation and encryption problems that static methods encounter; however, the code coverage of dynamic testing is low, and some malicious programs refuse to run inside an emulator. Static analysis mainly applies decompilation, or control-flow and data-flow analysis, to smali intermediate code; it can analyze software automatically, has high detection efficiency and high code coverage, and is suitable for analyzing large numbers of software samples. Its disadvantage is that it must cope with code obfuscation and encryption, and malicious code that is decoded only during execution is difficult to detect with static methods. To address this problem, researchers have taken techniques such as encryption, dynamic code loading, and dynamic loading of native code into account in malware detection, for example in RiskRanker and DroidRanger.
Many researchers have studied multi-label detection of Android malware. For example, Daniel Arp et al. proposed a static-analysis-based multi-label detection method for Android malicious code that extracts a large number of static features from the software installation package and classifies them with a support vector machine, achieving efficient detection; Yu Feng et al. proposed a feature description language for describing Android malware families and classified the software under test with a feature matching algorithm, achieving semantics-based Android malware detection; Chao Yang et al. described the logical behavior of software with a two-level behavior graph representation, judged maliciousness by analyzing malicious behavior patterns in combination with static taint analysis and an inter-component behavior graph, and classified malware families.
However, conventional research on Android malware multi-label detection analyzes all malware samples together and extracts their features as the basis for judging whether the software under test is malicious. Malware belonging to different families exhibits different malicious behaviors, and the features those behaviors exhibit differ greatly, whereas malware of the same family behaves similarly. Existing malware detection tools have limited multi-label detection capability. For example, when McAfee scans the malicious samples in the Genome dataset, more than 90% of the samples are reported as Trojan or Downloader, although they actually belong to many different malware families (e.g., DroidDream). Both speed and accuracy therefore need further improvement, and an efficient malware multi-label detection method needs to be studied.
Disclosure of Invention
The invention aims to provide an Android-oriented multi-feature malware detection method for mobile network terminals that effectively extracts the features of Android malware, improves Android malware detection accuracy, and provides the ability to classify Android malware families.
The technical solution of the invention is an Android-oriented multi-feature malware detection method for mobile network terminals, which specifically comprises the following steps:
step 1, obtaining Android malware samples, labeling the Android malware family to which each sample belongs, and then obtaining non-malware samples, so as to construct a dataset of malicious and non-malicious samples;
step 2, extracting the installation package features of the software, including: whether a .so file exists, whether a file used to root the system exists, whether an abnormal file exists, and whether a subprogram exists, so as to construct an installation package feature vector F;
step 3, processing the Android software sample with a decompiling tool, parsing the AndroidManifest.xml file, and extracting the permission list P requested by the software according to the tag fields in the XML;
step 4, decompiling the installation package, constructing the function call graph of the software, locating the security-sensitive methods in it, and constructing the sensitive behavior graph SBG of the software; then obtaining the context information of the security-sensitive methods by data flow analysis, the security-sensitive methods that are directly or indirectly called forming the sensitive behavior set SBS of the software;
step 5, performing statistical analysis on the features of software belonging to the same malware family in the malicious samples to obtain the occurrence probability of each feature component, and constructing an Android malware family multi-feature model M, so as to construct a malware family feature library;
step 6, extracting the features of the software under test using the methods of steps 2 to 4, and matching them against the malware family feature library to obtain the name of the malware family with the highest similarity; if the similarity exceeds a threshold, the software is reported as malware together with the malware family to which it belongs; otherwise, the software is reported as benign.
Compared with the prior art, the invention has the following notable advantages: 1) the invention provides an Android-oriented multi-feature malware detection method for mobile network terminals that, for different malware families and on the basis of static analysis, analyzes software from three aspects: installation package features, requested permission features, and behavior call features; 2) the invention extracts the features of malware families by statistical analysis, constructs a malware family feature library, and provides a malware multi-label detection method based on that library, which can achieve better accuracy in both maliciousness judgment and malware family classification.
The invention is explained in further detail below with reference to the drawings.
Drawings
FIG. 1 is a flowchart of the Android-oriented mobile network terminal malicious software multi-feature detection method according to the present invention.
FIG. 2 compares the malware detection accuracy and malware family classification accuracy of the present invention with those of some of the engines in VirusTotal.
Detailed Description
With reference to the attached drawings, the Android-oriented mobile network terminal malicious software multi-feature detection method of the invention comprises the following steps:
step 1, obtaining Android malware samples, labeling the Android malware family to which each sample belongs, and then obtaining non-malware samples, so as to construct a dataset of malicious and non-malicious samples;
step 2, extracting the installation package features of the software, including: whether a .so file exists, whether a file used to root the system exists, whether an abnormal file exists, and whether a subprogram exists, so as to construct an installation package feature vector F;
an abnormal file is a file whose suffix does not match the type indicated by its content; whether a file used to root the system exists is judged by checking, from its MD5 value, whether a library file is a root extension file; and whether a subprogram exists is judged by checking for jar, dex, and apk files.
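The four package features of step 2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the function name `package_features`, the small `MAGIC` table used to spot suffix/content mismatches, the choice to hash every entry rather than only library files, and the rule that the top-level `classes.dex` is not counted as a subprogram are all hypothetical.

```python
import hashlib
import io
import zipfile

# Hypothetical magic-number table (suffix -> expected leading bytes).
# A real system would use a full content-type detector instead.
MAGIC = {
    ".png": b"\x89PNG",
    ".jpg": b"\xff\xd8\xff",
    ".dex": b"dex\n",
    ".apk": b"PK\x03\x04",
    ".jar": b"PK\x03\x04",
}
SUBPROGRAM_SUFFIXES = (".jar", ".dex", ".apk")


def package_features(apk_bytes, known_root_md5):
    """Return F = [has_so, has_root_file, has_abnormal_file, has_subprogram]."""
    has_so = has_root = has_abnormal = has_sub = 0
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            data = zf.read(name)
            lower = name.lower()
            if lower.endswith(".so"):
                has_so = 1
            # Assumption: compare the MD5 of every entry with the known list.
            if hashlib.md5(data).hexdigest() in known_root_md5:
                has_root = 1
            # Abnormal file: the suffix claims a type the content does not have.
            for suffix, magic in MAGIC.items():
                if lower.endswith(suffix) and not data.startswith(magic):
                    has_abnormal = 1
            # Assumption: the top-level classes.dex is the main program.
            if lower.endswith(SUBPROGRAM_SUFFIXES) and lower != "classes.dex":
                has_sub = 1
    return [has_so, has_root, has_abnormal, has_sub]
```

Each component is a 0/1 flag, so the resulting vector can be compared position by position with a family's feature vector later on.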
Step 3, processing the Android software sample with a decompiling tool, parsing the AndroidManifest.xml file, and extracting the permission list P requested by the software according to the tag fields in the XML;
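A minimal Python sketch of the permission extraction in step 3 follows. It assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary form), and it reads the standard `uses-permission` elements; the function name `extract_permissions` is hypothetical, and the patent itself only says that the permissions are read from tag fields.

```python
import xml.etree.ElementTree as ET

# Namespace used by android:* attributes in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml):
    """Return the permission list P in document order, without duplicates."""
    root = ET.fromstring(manifest_xml)
    permissions = []
    for elem in root.iter("uses-permission"):
        name = elem.get("{%s}name" % ANDROID_NS)
        if name and name not in permissions:
            permissions.append(name)
    return permissions
```

For example, a manifest that declares `android.permission.INTERNET` and `android.permission.SEND_SMS` yields a two-element list with those names.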
step 4, decompiling the installation package, constructing the function call graph of the software, locating the security-sensitive methods in it, and constructing the sensitive behavior graph SBG of the software; then obtaining the context information of the security-sensitive methods by data flow analysis, the security-sensitive methods that are directly or indirectly called forming the sensitive behavior set SBS of the software;
the security-sensitive methods comprise permission-protected methods, information-flow Source/Sink methods, and other suspicious methods; a permission-protected method is an API that can be used in the Android system only after the corresponding permission has been requested, an information-flow Source/Sink method is a method that may produce or send sensitive information, and the other suspicious methods include dynamic loading functions, reflection functions, encryption and decryption functions, and native code execution and invocation functions.
The constructed sensitive behavior call graph is the following four-tuple:
$$SBG = (V_D, V_N, E, \mu)$$
where $V_D$ is a subset of the vertex set of the software sensitive behavior call graph, and any node $v_d \in V_D$ is a security-sensitive method; $V_N$ is a subset of the vertex set of the software sensitive behavior call graph, and any node $v_n \in V_N$ is a non-security-sensitive method that directly or indirectly calls a security-sensitive method; $E \subseteq V_N \times V_D$ is the set of edges of the software sensitive behavior call graph and indicates calling relationships between methods, where any edge $e = (v_n, v_d) \in E$ means that the non-security-sensitive method $v_n \in V_N$ directly or indirectly calls the security-sensitive method $v_d \in V_D$, or that method $v_n$ of component $C_s$ directly or indirectly triggers method $v_d$ of component $C_t$ through inter-component communication (ICC); the labeling function $\mu: V_D \to \langle ID, EntryType, Para \rangle$ labels the content of each node in the graph, namely the context information of the method, including the method ID, the entry point type EntryType, and the parameters Para;
the set of sensitive behaviors is the set shown below:
$$SBS = \{S_1, \ldots, S_i, \ldots, S_m\}$$
where $S_i = \{v \mid (v_i, v) \in E \wedge v_i \in V_N \wedge v \in V_D\}$ is a set of security-sensitive methods, namely the set of all security-sensitive methods directly or indirectly called by the $i$-th non-security-sensitive method of the set $V_N$ in the sensitive behavior call graph $SBG = (V_D, V_N, E, \mu)$; $m = |V_N|$ is the length of the set $SBS$.
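The definition of $SBS$ can be illustrated with a short Python sketch that computes, for every non-sensitive method, the set $S_i$ of security-sensitive methods it reaches directly or indirectly. The dictionary-based call graph, the function name `sensitive_behavior_set`, and the decision not to traverse past a sensitive method are assumptions made for this illustration; the context labels of $\mu$ and ICC edges are left out.

```python
def sensitive_behavior_set(call_graph, sensitive):
    """Map each non-sensitive method in V_N to the sensitive methods it reaches.

    call_graph: dict mapping a method name to the methods it calls directly.
    sensitive:  set of security-sensitive method names (V_D candidates).
    """
    sbs = {}
    for method in call_graph:
        if method in sensitive:
            continue
        reached = set()
        seen = {method}
        stack = list(call_graph.get(method, ()))
        while stack:  # depth-first walk over direct and indirect callees
            callee = stack.pop()
            if callee in seen:
                continue
            seen.add(callee)
            if callee in sensitive:
                reached.add(callee)  # assumption: do not look past a sensitive API
                continue
            stack.extend(call_graph.get(callee, ()))
        if reached:  # only methods that reach a sensitive method belong to V_N
            sbs[method] = reached
    return sbs
```

The number of keys in the returned dictionary corresponds to $m = |V_N|$, and the values are the sets $S_1, \ldots, S_m$.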
Step 5, performing statistical analysis on the features of software belonging to the same malware family in the malicious samples to obtain the occurrence probability of each feature component, and constructing an Android malware family multi-feature model M, so as to construct a malware family feature library;
the constructed Android malware family multi-feature model is the following six-tuple:
$$M = (SBS^c, \alpha, F^c, \beta, P^c, \gamma)$$
where $SBS^c$ is the sensitive behavior set common to a malware family, obtained by statistical analysis of the sensitive behavior sets $SBS$ of samples of the same malware family; the labeling function $\alpha: S \in SBS^c \to [0,1]$ gives the probability that each sensitive method set in $SBS^c$ occurs in the samples of the malware family; $F^c$ is the set of installation package features common to the samples of a malware family, obtained by statistical analysis of the installation package feature vectors $F$ of samples of the same malware family; the labeling function $\beta: f \in F^c \to [0,1]$ gives the probability that each feature in $F^c$ occurs in the samples of the malware family; $P^c$ is the list of permissions frequently requested by the samples of a malware family, obtained by statistical analysis of the permission lists $P$ of samples of the same malware family; the labeling function $\gamma: p \in P^c \to [0,1]$ gives the probability that each permission in $P^c$ occurs in the samples of the malware family.
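As a rough illustration of how a family model $M$ might be assembled from labeled samples, the Python sketch below computes occurrence probabilities for permissions, package features, and sensitive method sets. The sample layout (dicts with keys `F`, `P`, `SBS`), the function names, and the `min_prob` cut-off that decides what counts as "common" or "frequent" are all assumptions; the text does not state how that cut-off is chosen.

```python
from collections import Counter


def occurrence(samples_items, min_prob):
    """Probability that each item occurs in a sample, keeping items >= min_prob."""
    n = len(samples_items)
    counts = Counter()
    for items in samples_items:
        counts.update(set(items))  # count each item once per sample
    return {item: c / n for item, c in counts.items() if c / n >= min_prob}


def build_family_model(samples, min_prob=0.5):
    """Build M = (SBSc, alpha, Fc, beta, Pc, gamma) for one malware family."""
    n = len(samples)
    gamma = occurrence([s["P"] for s in samples], min_prob)
    alpha = occurrence(
        [[frozenset(x) for x in s["SBS"]] for s in samples], min_prob
    )
    width = len(samples[0]["F"])
    # beta[i]: fraction of family samples in which package feature i is 1.
    beta = [sum(s["F"][i] for s in samples) / n for i in range(width)]
    f_c = [1 if b >= min_prob else 0 for b in beta]
    return {
        "SBSc": set(alpha), "alpha": alpha,
        "Fc": f_c, "beta": beta,
        "Pc": sorted(gamma), "gamma": gamma,
    }
```

Repeating this for every labeled family gives one model per family, which together play the role of the malware family feature library.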
Step 6, extracting the features of the software under test using the methods of steps 2 to 4, and matching them against the malware family feature library to obtain the name of the malware family with the highest similarity; if the similarity exceeds a threshold, the software is reported as malware together with the malware family to which it belongs; otherwise, the software is reported as benign.
The similarity between the software under test and a malware family is expressed as:
Figure BDA0001568599290000043
where $S_f$ is the similarity of the software feature vectors, $S_p$ is the similarity of the permission lists, $S_{sbs}$ is the similarity of the sensitive behavior sets, and $\mu_i$ is the weight of each similarity in the calculation;
the software feature vector similarity $S_f$ is calculated as follows: given the feature vector $F = \{f_1, f_2, f_3, \ldots, f_m\}$ of the software under test, the feature vector $F^c = \{f_1^c, f_2^c, f_3^c, \ldots, f_m^c\}$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\beta$, then:
Figure BDA0001568599290000051
the similarity is calculated from the occurrence probability of each feature, and if all values of the feature vector in the multi-feature model of the malware family are 0, the similarity is 0; the correction factor $\omega_f$ is calculated as the number of features $f_i$ in the vector $F$ that agree with the corresponding $f_i^c$, divided by the number of features in the vector $F^c$ whose value is 1;
the permission list similarity $S_p$ of the software is calculated as follows: given the permission list $P$ of the software under test, the permission list $P^c = \{p_1^c, p_2^c, \ldots, p_n^c\}$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\gamma$, then:
Figure BDA0001568599290000052
Figure BDA0001568599290000053
where the correction factor $\omega_p$ is calculated as the number of permissions in the permission set $P$ that belong to $P^c$, divided by the length of the set $P^c$; when the element $p_i^c$ of the permission list $P^c$ is included in the permission list $P$ of the software under test,
Figure BDA0001568599290000055
takes the value 1, and otherwise 0;
the sensitive behavior set similarity $S_{sbs}$ is calculated as follows: given the sensitive behavior set $SBS$ of the software, the sensitive behavior set $SBS^c$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\alpha$, then:
Figure BDA0001568599290000057
Figure BDA0001568599290000058
where the correction factor $\omega_{sbs}$ is calculated as the number of sets $S_i^c$ that satisfy
Figure BDA0001568599290000059
divided by the length of the set $SBS^c$; here the function
Figure BDA00015685992900000510
indicates that there exists a set $S$ in $SBS$ such that the proportion of elements common to $S$ and $S_i^c$ among all elements of the two sets is greater than $\theta$ ($0 < \theta \le 1$).
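The two correction factors that are fully described in words, $\omega_p$ and $\omega_{sbs}$, can be sketched in Python as below. Reading "the proportion of common elements among all elements of the two sets" as the Jaccard index is an interpretation, and the function names are hypothetical; the complete similarity formulas are given only in the patent's formula images and are not reproduced here.

```python
def jaccard(a, b):
    """Share of common elements among all elements of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def matches(sbs, s_c, theta):
    """True if some set S in SBS overlaps s_c by more than theta."""
    return any(jaccard(s, s_c) > theta for s in sbs)


def omega_sbs(sbs, sbs_c, theta):
    """Fraction of the family's sensitive method sets matched by the sample."""
    if not sbs_c:
        return 0.0
    return sum(1 for s_c in sbs_c if matches(sbs, s_c, theta)) / len(sbs_c)


def omega_p(p, p_c):
    """Fraction of the family's permissions that the sample also requests."""
    family = set(p_c)
    if not family:
        return 0.0
    return len(set(p) & family) / len(family)
```

Both factors fall between 0 and 1 and shrink when the sample covers only a small part of a family's model, which is the stated purpose of the correction.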
Thus, the invention extracts the features of malware families by statistical analysis, constructs a malware family feature library, and provides a malware multi-label detection method based on that library, which can achieve high accuracy in both maliciousness judgment and malware family classification.
In order to make those skilled in the art better understand the technical problems, technical solutions and technical effects of the present invention, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
Examples
This example of the Android-oriented mobile network terminal malicious software multi-feature detection method uses a dataset formed from the Drebin dataset and non-malicious software samples obtained from Google Play; malicious code detection and family classification specifically comprise the following steps:
step 1: dividing the samples in Drebin according to the malware family to which they belong, obtaining non-malicious software from Google Play with a web crawler and verifying it with the VirusTotal online detection service, thereby constructing a sample dataset comprising 4486 malware samples from 24 malware families and 2140 benign software samples;
step 2: decompressing the software installation package to be analyzed with a Zip decompression tool and extracting the installation package features of the software, including: whether a .so file exists, whether a file used to root the system exists, whether an abnormal file exists, and whether a subprogram exists, so as to construct the installation package feature vector F; to judge whether a file used to root the system exists, the MD5 values of existing root extension library files are compared with those of the files in the software installation package; to judge whether an abnormal file exists, the file content is analyzed with the Apache Tika tool to obtain the file type, which is then compared with the file suffix; to judge whether a subprogram exists, the package is checked for jar, dex, and apk files;
step 3: processing the Android software sample with APKParser, parsing the AndroidManifest.xml file, and extracting the permission list P requested by the software according to the tag fields in the XML;
step 4: decompiling the installation package with the Soot tool, constructing the function call graph of the software, locating the security-sensitive methods in it, and constructing the sensitive behavior graph SBG of the software; then obtaining the context information of the security-sensitive methods by data flow analysis, the security-sensitive methods that are directly or indirectly called forming the sensitive behavior set SBS of the software;
the security-sensitive methods of interest comprise permission-protected methods, information-flow Source/Sink methods, and other suspicious methods; a permission-protected method is an API that can be used in the Android system only after the corresponding permission has been requested, an information-flow Source/Sink method is a method that may produce or send sensitive information, and the other suspicious methods include dynamic loading functions, reflection functions, encryption and decryption functions, and native code execution and invocation functions.
The constructed sensitive behavior call graph is the following four-tuple:
$$SBG = (V_D, V_N, E, \mu)$$
where $V_D$ is a subset of the vertex set of the software sensitive behavior call graph, and any node $v_d \in V_D$ is a security-sensitive method; $V_N$ is a subset of the vertex set of the software function call graph, and any node $v_n \in V_N$ is a non-security-sensitive method that directly or indirectly calls a security-sensitive method; $E \subseteq V_N \times V_D$ is the set of edges of the sensitive behavior call graph and indicates calling relationships between methods. Any edge $e = (v_n, v_d) \in E$ means that the non-security-sensitive method $v_n \in V_N$ directly or indirectly calls the security-sensitive method $v_d \in V_D$, or that method $v_n$ of component $C_s$ directly or indirectly triggers method $v_d$ of component $C_t$ through inter-component communication (ICC); the labeling function $\mu: V_D \to \langle ID, EntryType, Para \rangle$ labels the content of each vertex in the graph, including the method ID, the entry point type EntryType, and the parameters Para.
The set of sensitive behaviors is the set shown below:
$$SBS = \{S_1, S_2, \ldots, S_m\}$$
where $S_i = \{v \mid (v_i, v) \in E \wedge v_i \in V_N \wedge v \in V_D\}$ is a set of security-sensitive methods, namely the set of all security-sensitive methods directly or indirectly called by the $i$-th non-security-sensitive method of the set $V_N$ in the sensitive behavior call graph $SBG = (V_D, V_N, E, \mu)$; $m = |V_N|$ is the length of the set $SBS$;
step 5: selecting 75% (3341 samples) of the samples of the 24 malware families as the samples for feature extraction and constructing the malware family feature library; performing statistical analysis on the features of software belonging to the same malware family in the malicious samples to obtain the occurrence probability of each feature component, and constructing an Android malware family multi-feature model M, so as to construct the malware family feature library;
the constructed Android malware family multi-feature model is the following six-tuple:
$$M = (SBS^c, \alpha, F^c, \beta, P^c, \gamma)$$
where $SBS^c$ is the sensitive behavior set common to a malware family, obtained by statistical analysis of the sensitive behavior sets $SBS$ of samples of the same malware family; the labeling function $\alpha: S \in SBS^c \to [0,1]$ gives the probability that each sensitive method set in $SBS^c$ occurs in the samples of the malware family; $F^c$ is the set of installation package features common to the samples of a malware family, obtained by statistical analysis of the installation package features $F$ of samples of the same malware family; the labeling function $\beta: f \in F^c \to [0,1]$ gives the probability that each feature in $F^c$ occurs in the samples of the malware family; $P^c$ is the list of permissions frequently requested by the samples of a malware family, obtained by statistical analysis of the permission lists $P$ of samples of the same malware family; the labeling function $\gamma: p \in P^c \to [0,1]$ gives the probability that each permission in $P^c$ occurs in the samples of the malware family;
step 6, extracting the features of the software to be tested by using the methods in the steps 2-4, performing feature matching on the features of the software to be tested and a malicious software family feature library to obtain a malicious software family name with the highest similarity, outputting the software as malicious software if the similarity exceeds 0.7, and outputting the malicious software family to which the software belongs, otherwise, outputting the software as benign software;
the similarity between the software to be tested and the malware family is expressed as:
Figure BDA0001568599290000073
wherein SfIs the similarity of the feature vectors, SpFor similarity of authority lists, SsbsSimilarity of sensitive behavior sets, μiIn the experiment, three weight values are taken as the weight values of each similarity in calculation
Figure BDA0001568599290000074
The similarity calculation method of the software feature vector comprises the steps of giving the feature vector F of the software to be tested to be F ═ F1,f2,f3,...,fmAnd F, feature vector in the multi-feature model of the malware family to be matchedc={f1 c,f2 c,f3 c,...,fm cAnd the similarity of the corresponding labeling function β is calculated as follows:
Figure BDA0001568599290000081
and calculating the similarity according to the probability of the occurrence of each feature, wherein if the values of the feature vectors in the multi-feature model of the malicious family are all 0, the similarity is 0. Wherein the correction factor omegafThe calculation method comprises the following steps: all of the variables F in the vector Fifi cNumber of features divided by vector FcNumber of features with a median of 1.
The method for calculating the similarity of the software permission list comprises the steps of giving the permission list P of the software to be tested and giving the permission list P in the multi-feature model of the malicious software family to be matchedc={p1 c,p2 c,...,pn cAnd the similarity of the corresponding labeling function γ is calculated as follows:
Figure BDA0001568599290000082
Figure BDA0001568599290000083
wherein the correction factor omegapThe calculation method comprises the following steps: belonging to P in permission set PcIs divided by the set PcLength of (d).
The similarity of the sensitive behavior sets is calculated as follows: given the sensitive behavior set SBS of the software, the sensitive behavior set in the multi-feature model of the malware family to be matched
Figure BDA0001568599290000084
and the corresponding labeling function $\alpha$, the similarity is computed by the following formulas:
Figure BDA0001568599290000085
Figure BDA0001568599290000086
To prevent a malware family with more features from overshadowing a family with fewer features, a correction factor $\omega_{sbs}$ is introduced. It is calculated as the number of all sets in SBS satisfying
Figure BDA0001568599290000087
that is, the sets
Figure BDA0001568599290000088
divided by the length of the set $SBS^c$. Here the function
Figure BDA0001568599290000089
indicates that there exists a set $S$ in SBS such that the proportion of the elements it shares with the set
Figure BDA00015685992900000810
relative to all elements of the two sets is greater than 80%.
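The matching condition and the correction factor $\omega_{sbs}$ described above can be sketched in Python as follows. This is one reading of the text, not a definitive implementation: "similar elements" are taken to be identical method names, and "all elements of the two sets" is taken to be their union, so the ratio is the Jaccard index. The function names and the method names in the example are hypothetical.

```python
def set_matches(s, s_c, theta=0.8):
    """True if the shared elements exceed theta of all elements in both sets."""
    union = s | s_c
    if not union:
        return False
    return len(s & s_c) / len(union) > theta


def sbs_correction_factor(sbs, sbs_c, theta=0.8):
    """Fraction of the family sets in SBS^c matched by at least one set in SBS."""
    if not sbs_c:
        return 0.0
    matched = sum(
        1 for s_c in sbs_c if any(set_matches(s, s_c, theta) for s in sbs)
    )
    return matched / len(sbs_c)


# Hypothetical sensitive behavior sets, used only for illustration.
software_sbs = [{"getDeviceId", "sendTextMessage"}, {"loadClass"}]
family_sbs = [{"getDeviceId", "sendTextMessage"}, {"exec", "openConnection"}]
omega_sbs = sbs_correction_factor(software_sbs, family_sbs)
```

Because the comparison is strict, a pair of sets whose ratio is exactly 80% does not count as a match under this reading.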
The remaining 25% (1145) of the malware samples and 2140 benign software samples were tested with the above method. The results of the maliciousness determination and the malicious family classification, compared with those of 8 antivirus engines commonly available on VirusTotal, are shown in Fig. 2.
Therefore, by selecting the installation package features, the permission features, and the sensitive behavior call features of the software as the basis for identifying malware, the method can improve the accuracy of detecting malicious software behavior and is capable of classifying malware families.

Claims (5)

1. An Android-oriented mobile network terminal malware multi-feature detection method, characterized by comprising the following steps:
step 1, obtaining Android malware samples, labeling the Android malware family to which each sample belongs, and then obtaining non-malware samples, thereby constructing the malicious and non-malicious software sample datasets;
step 2, extracting the installation package features of the software, including: whether .so files are present, whether files for rooting the system are present, whether abnormal files are present, and whether subprograms are present, thereby constructing the installation package feature vector F;
step 3, processing the Android software sample with a decompilation tool, parsing the AndroidManifest.xml file, and extracting the permission list P requested by the software according to the tag fields in the XML;
step 4, decompiling the installation package, constructing the software function call graph, locating the security-sensitive methods in it, and constructing the sensitive behavior graph SBG of the software; then obtaining the context information of the security-sensitive methods by data flow analysis, the security-sensitive methods called directly or indirectly forming the sensitive behavior set SBS of the software; the constructed software function call graph is the following four-tuple:
$SBG = (V_D, V_N, E, \mu)$
wherein $V_D$ is a subset of the node set of the software sensitive behavior call graph, and any node $v_d \in V_D$ is a security-sensitive method; $V_N$ is a subset of the node set of the software sensitive behavior call graph, and any node $v_n \in V_N$ is a non-security-sensitive method that directly or indirectly calls a security-sensitive method; $E \subseteq V_N \times V_D$ is the edge set of the software sensitive behavior call graph and indicates the calling relationships between methods, where any edge $e = (v_n, v_d) \in E$ means that a non-security-sensitive method $v_n \in V_N$ in the software directly or indirectly calls a security-sensitive method $v_d \in V_D$, or that a method $v_n$ of component $C_s$ directly or indirectly triggers a method $v_d$ of component $C_t$ through ICC; the labeling function $\mu: V_D \to \langle ID, EntryType, Para \rangle$ labels the content carried by the nodes of the graph, i.e., the context information of the methods in $V_D$ and $V_N$, including the method ID, the entry point type EntryType, and the parameters Para;
the sensitive behavior set is the following set:
$SBS = \{S_1, \ldots, S_i, \ldots, S_m\}$
wherein $S_i = \{v \mid (v_i, v) \in E \wedge v_i \in V_N \wedge v \in V_D\}$ is a set of security-sensitive methods, namely the set of all security-sensitive methods directly or indirectly called by the $i$-th non-security-sensitive method of the set $V_N$ in the sensitive behavior call graph $SBG = (V_D, V_N, E, \mu)$; $m = |V_N|$ is the length of the set SBS;
step 5, statistically analyzing the software features of the samples belonging to the same malware family among the malicious samples to obtain the occurrence probability of each feature component, and constructing the Android malware family multi-feature model M, thereby building the malware family feature library;
step 6, extracting the features of the software to be tested by the methods of steps 2 to 4, and matching them against the malware family feature library to obtain the name of the malware family with the highest similarity; if the similarity exceeds a threshold, reporting the software to be tested as malware and outputting the malware family to which it belongs; otherwise, reporting the software to be tested as benign software.
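The construction of the sensitive behavior set SBS in step 4 can be illustrated with a minimal Python sketch. It assumes the call graph is already available as a dictionary mapping each method to the methods it calls and that the set of security-sensitive methods is known. The function name `build_sbs` and the method names in the example are hypothetical, and the context information (ID, EntryType, Para) and ICC edges are left out.

```python
def build_sbs(call_graph, sensitive):
    """Map each non-sensitive method to the sensitive methods it reaches.

    A method is included only if it calls at least one sensitive method,
    directly or through other methods.
    """
    sbs = {}
    for method in call_graph:
        if method in sensitive:
            continue
        reached = set()
        seen = {method}
        stack = list(call_graph.get(method, ()))
        while stack:  # iterative depth-first traversal; 'seen' guards cycles
            callee = stack.pop()
            if callee in seen:
                continue
            seen.add(callee)
            if callee in sensitive:
                reached.add(callee)
            stack.extend(call_graph.get(callee, ()))
        if reached:
            sbs[method] = reached
    return sbs


# Hypothetical call graph, used only for illustration.
graph = {
    "onCreate": ["helper", "log"],
    "helper": ["sendTextMessage"],
    "log": [],
    "onClick": ["getDeviceId", "helper"],
}
sensitive_methods = {"sendTextMessage", "getDeviceId"}
sbs = build_sbs(graph, sensitive_methods)
```

Each key of the returned dictionary corresponds to a node of $V_N$ and each value to one set $S_i$, so the number of entries plays the role of $m = |V_N|$.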
2. The Android-oriented mobile network terminal malware multi-feature detection method as claimed in claim 1, wherein the abnormal file in step 2 refers to a file whose suffix does not match the type indicated by the file content itself; whether a file for rooting the system exists is determined by checking, through its MD5 value, whether a library file is a file used to root the system; and whether subprograms exist is determined from jar files, dex files, and apk files.
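One possible way to check whether a file's suffix matches its content is to compare the suffix with the leading signature bytes of the file. The Python sketch below is an illustration under stated assumptions: the table `MAGIC` covers only four well-known signatures (ZIP-based archives, DEX, ELF, PNG), the function name `is_abnormal` is hypothetical, and files with an unrecognized signature are treated as not abnormal because their type cannot be judged from this table.

```python
import os

# Leading signature bytes of a few formats and the suffixes expected for them.
MAGIC = [
    (b"PK\x03\x04", {".apk", ".jar", ".zip"}),  # ZIP local file header
    (b"dex\n", {".dex"}),                        # Dalvik executable
    (b"\x7fELF", {".so"}),                       # ELF shared object
    (b"\x89PNG\r\n\x1a\n", {".png"}),            # PNG image
]


def is_abnormal(filename, head):
    """True if the content signature is recognized but the suffix disagrees."""
    suffix = os.path.splitext(filename)[1].lower()
    for magic, suffixes in MAGIC:
        if head.startswith(magic):
            return suffix not in suffixes
    return False  # unknown signature: no judgment in this sketch


# An archive disguised as an image is flagged; a genuine .so file is not.
disguised = is_abnormal("logo.png", b"PK\x03\x04\x14\x00")
genuine = is_abnormal("libnative.so", b"\x7fELF\x01\x01")
```

In practice `head` would be the first few bytes read from each entry of the installation package; a fuller table of signatures would be needed for real use.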
3. The Android-oriented mobile network terminal malware multi-feature detection method of claim 1, wherein the security-sensitive methods in step 4 comprise: permission-protected methods, information-flow Source/Sink methods, and other suspicious methods; a permission-protected method is an API in the Android system that can be used only after the corresponding permission has been requested; an information-flow Source/Sink method is a method that may generate or send sensitive information; and the other suspicious methods include dynamic loading functions, reflection functions, encryption and decryption functions, and functions that execute or call native code.
4. The Android-oriented mobile network terminal malware multi-feature detection method of claim 1, wherein the Android malware family multi-feature model constructed in step 5 is the following six-tuple:
$M = (SBS^c, \alpha, F^c, \beta, P^c, \gamma)$
wherein
Figure FDA0003152045520000021
is the sensitive behavior set common to a malware family, obtained by statistically analyzing the sensitive behavior sets SBS of the samples of the same malware family; the labeling function
Figure FDA0003152045520000022
marks the probability with which each sensitive method set in $SBS^c$ occurs in the samples of the malware family; $F^c$ is the set of installation package features common to the samples of the malware family, obtained by statistically analyzing the installation package feature vectors F of the samples of the same malware family; the labeling function $\beta: f \in F^c \to [0, 1]$ marks the probability with which each feature in $F^c$ occurs in the samples of the malware family; $P^c$ is the list of permissions frequently requested by the samples of the malware family, obtained by statistically analyzing the permission lists P of the samples of the same malware family; the labeling function $\gamma: p \in P^c \to [0, 1]$ marks the probability with which each permission in $P^c$ occurs in the samples of the malware family.
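The statistical construction of $P^c$ and $\gamma$ can be sketched in Python as a frequency count over the samples of one family. The function name `permission_model`, the cut-off parameter `min_prob` (one assumed way of expressing "frequently requested"), and the permission strings are hypothetical; the same counting idea would apply to the feature vector $F^c$ with $\beta$.

```python
from collections import Counter


def permission_model(family_samples, min_prob=0.0):
    """Return {permission: occurrence probability} for one malware family.

    family_samples is a list of permission lists, one per sample.
    Permissions whose probability does not exceed min_prob are dropped.
    """
    total = len(family_samples)
    if total == 0:
        return {}
    # Count each permission at most once per sample.
    counts = Counter(perm for sample in family_samples for perm in set(sample))
    return {
        perm: n / total for perm, n in counts.items() if n / total > min_prob
    }


# Hypothetical samples of a single family, used only for illustration.
samples = [
    ["SEND_SMS", "INTERNET"],
    ["SEND_SMS"],
    ["SEND_SMS", "READ_CONTACTS"],
    ["INTERNET", "SEND_SMS"],
]
gamma = permission_model(samples)
frequent = permission_model(samples, min_prob=0.3)
```

The keys of the returned dictionary play the role of $P^c$ and the values the role of $\gamma$, each lying in $[0, 1]$.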
5. The Android-oriented mobile network terminal malware multi-feature detection method as claimed in claim 1, wherein the similarity between the software to be detected and a malware family in step 6 is represented as:
Figure FDA0003152045520000031
wherein $S_f$ is the similarity of the software feature vectors, $S_p$ is the similarity of the permission lists, $S_{sbs}$ is the similarity of the sensitive behavior sets, and $\mu_i$ is the weight of each similarity in the calculation;
the software feature vector similarity $S_f$ is calculated as follows: given the feature vector $F = \{f_1, f_2, f_3, \ldots, f_m\}$ of the software to be tested, the feature vector $F^c = \{f_1^c, f_2^c, f_3^c, \ldots, f_m^c\}$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\beta$, then:
Figure FDA0003152045520000032
the similarity is calculated according to the occurrence probability of each feature, and if all values of the feature vector in the multi-feature model of the malicious family are 0, the similarity is 0; wherein the correction factor $\omega_f$ is calculated as the number of features in the vector $F$ for which $f_i$ matches $f_i^c$, divided by the number of features in the vector $F^c$ whose value is 1;
the permission list similarity $S_p$ of the software is calculated as follows: given the permission list $P$ of the software to be tested, the permission list $P^c = \{p_1^c, p_2^c, \ldots, p_n^c\}$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\gamma$, then:
Figure FDA0003152045520000033
Figure FDA0003152045520000034
wherein the correction factor $\omega_p$ is calculated as the number of permissions in the permission set $P$ that belong to $P^c$, divided by the length of the set $P^c$; when an element
Figure FDA0003152045520000035
of the permission list $P^c$ is included in the permission list $P$ of the software to be tested,
Figure FDA0003152045520000036
takes the value 1, and otherwise 0;
the sensitive behavior set similarity $S_{sbs}$ is calculated as follows: given the sensitive behavior set SBS of the software, the sensitive behavior set in the multi-feature model of the malware family to be matched
Figure FDA0003152045520000037
and the corresponding labeling function $\alpha$, then:
Figure FDA0003152045520000038
Figure FDA0003152045520000039
in the formulas, $\omega_{sbs}$ is the correction factor, calculated as the number of all sets in SBS satisfying
Figure FDA0003152045520000041
that is, the sets
Figure FDA0003152045520000042
divided by the length of the set $SBS^c$; wherein the function
Figure FDA0003152045520000043
indicates that there exists a set $S$ in SBS such that the proportion of the elements it shares with the set
Figure FDA0003152045520000044
relative to all elements of the two sets is greater than $\theta$ ($0 < \theta \le 1$).
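The way the three similarities feed the decision of step 6 can be sketched in Python. The combination formula is available only as a figure, so the weighted sum used here is an assumption suggested by the description of $\mu_i$ as the weight of each similarity. The function names, the weights, the threshold value, and the family names are all hypothetical.

```python
def combined_similarity(s_f, s_p, s_sbs, weights):
    """Assumed combination: weighted sum of the three similarities."""
    mu_f, mu_p, mu_sbs = weights
    return mu_f * s_f + mu_p * s_p + mu_sbs * s_sbs


def classify(family_scores, weights, threshold):
    """Pick the family with the highest combined similarity, then apply the threshold.

    family_scores maps a family name to its (S_f, S_p, S_sbs) triple.
    Returns (verdict, family name or None, best score).
    """
    best_family, best_score = None, 0.0
    for family, (s_f, s_p, s_sbs) in family_scores.items():
        score = combined_similarity(s_f, s_p, s_sbs, weights)
        if best_family is None or score > best_score:
            best_family, best_score = family, score
    if best_family is not None and best_score > threshold:
        return "malware", best_family, best_score
    return "benign", None, best_score


# Hypothetical similarity triples, weights, and threshold.
scores = {"FamilyA": (0.8, 0.6, 0.9), "FamilyB": (0.2, 0.4, 0.1)}
verdict, family, score = classify(scores, weights=(0.25, 0.25, 0.5), threshold=0.5)
```

With these illustrative numbers the best match is FamilyA with a combined score of about 0.8, which exceeds the threshold; raising the threshold above that score would yield a benign verdict instead.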

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810109044.6A CN108280350B (en) 2018-02-05 2018-02-05 Android-oriented mobile network terminal malicious software multi-feature detection method

Publications (2)

Publication Number Publication Date
CN108280350A CN108280350A (en) 2018-07-13
CN108280350B true CN108280350B (en) 2021-09-28

Family

ID=62807459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810109044.6A Active CN108280350B (en) 2018-02-05 2018-02-05 Android-oriented mobile network terminal malicious software multi-feature detection method

Country Status (1)

Country Link
CN (1) CN108280350B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant