CN108280350B - Android-oriented mobile network terminal malicious software multi-feature detection method - Google Patents
- Publication number
- CN108280350B (application number CN201810109044.6A)
- Authority
- CN
- China
- Prior art keywords
- software
- malicious
- sensitive
- malware
- family
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/53—Decompilation; Disassembly
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Stored Programmes (AREA)
Abstract
The invention discloses an Android-oriented multi-feature detection method for mobile network terminal malware. The method comprises the following steps: step 1, obtaining an Android software dataset comprising malicious and non-malicious samples; step 2, analyzing the software installation package, extracting its installation package features, and constructing an installation package feature vector; step 3, obtaining the permissions requested by the software and constructing a permission list; step 4, decompiling the installation package, constructing a sensitive behavior graph of the software, and extracting its sensitive behavior set; step 5, performing statistical analysis on the features of malicious samples belonging to the same malware family to construct a malware family feature library; and step 6, extracting the features of the software under test and performing maliciousness judgment and malware family classification. By using installation package features, permission features and sensitive-behavior call features as the basis for judging malware, the invention can improve the accuracy of detecting malicious software behavior and provides the capability of classifying malware families.
Description
Technical Field
The invention belongs to the field of mobile software analysis and information security, and particularly relates to an Android-oriented mobile network terminal malicious software multi-feature detection method.
Background
Multi-label detection of Android malicious code is a challenging problem in both academia and industry: a detector must judge whether a piece of software is malicious and, at the same time, identify the malware family it belongs to. Smartphone applications now touch many aspects of daily life, and Android holds a large share of the smartphone market, so accurately detecting Android malicious code is of great significance and practical value for protecting the privacy and property of Android users.
Existing Android malware detection techniques fall into two categories: those based on static analysis and those based on dynamic analysis. Dynamic analysis executes the software in a simulated environment and can sidestep problems such as code obfuscation and encryption that hinder static methods; however, its code coverage is low, and some malicious programs refuse to run under an emulator. Static analysis mainly applies decompilation, or control-flow and data-flow analysis on smali intermediate code; it can analyze software automatically, offers high detection efficiency and high code coverage, and is suited to analyzing large numbers of samples. Its drawback is that it must cope with code obfuscation and encryption, and malicious code that is decrypted only during dynamic execution is difficult to detect statically. To address this, malware detection systems such as RiskRanker and DroidRanger take encryption, dynamic code loading and dynamic loading of Native code into account.
Many researchers have studied multi-label detection of Android malware. For example, Daniel Arp et al. proposed a multi-label detection method based on static analysis that extracts a large number of static features from the software installation package and classifies them with a support vector machine, achieving efficient detection; Yu Feng et al. proposed a signature specification language for describing Android malware families and classified the software under test with a signature-matching algorithm, achieving semantics-based Android malware detection; Chao Yang et al. described the logical behavior of software with a two-tier behavior graph representation, judged maliciousness by combining static taint analysis with inter-component behavior graphs through the analysis of malicious behavior patterns, and classified malware families.
However, existing research on Android malware multi-label detection analyzes all malware samples together and uses the extracted features as the basis for judging whether the software under test is malicious. Malware in different families exhibits different malicious behaviors, and the features expressing those behaviors differ greatly, whereas malware within the same family behaves similarly. Existing detection tools also have limited multi-label capability: for example, when McAfee scans the malicious samples in the Android Malware Genome dataset, more than 90% of them are reported simply as Trojan or Downloader, although they actually belong to many different malware families (e.g., DroidDream). Both the speed and the accuracy of detection therefore need further improvement, and an efficient multi-label malware detection method is needed.
Disclosure of Invention
The purpose of the invention is to provide an Android-oriented multi-feature detection method for mobile network terminal malware that effectively extracts the features of Android malware, improves Android malware detection precision, and can classify Android malware into families.
The technical solution of the invention is as follows: an Android-oriented multi-feature detection method for mobile network terminal malware, which specifically comprises the following steps:
step 1, obtaining Android malware samples, labeling the Android malware family each sample belongs to, and then obtaining non-malware samples, so as to construct a dataset of malicious and non-malicious samples;
step 2, extracting the installation package features of the software, namely whether a .so file exists, whether a file for rooting the system exists, whether an abnormal file exists, and whether a subprogram exists, thereby constructing the installation package feature vector F;
step 3, processing the Android software sample with a decompiling tool, parsing the AndroidManifest.xml file, and extracting the permission list P requested by the software from the tag fields in the XML;
step 4, decompiling the installation package, constructing the software's function call graph, locating the security-sensitive methods in it, and constructing the software's sensitive behavior graph SBG; then obtaining the context information of the security-sensitive methods by data-flow analysis, and forming the software's sensitive behavior set SBS from the security-sensitive methods that are called directly or indirectly;
step 5, performing statistical analysis on the features of software belonging to the same malware family among the malicious samples to obtain the occurrence probability of each feature component, and constructing an Android malware family multi-feature model M, thereby building a malware family feature library;
step 6, extracting the features of the software under test using the methods of steps 2 to 4, matching them against the malware family feature library to find the malware family with the highest similarity, and, if that similarity exceeds a threshold, outputting the software as malware together with the malware family it belongs to; otherwise, outputting the software as benign.
Compared with the prior art, the invention has the following notable advantages: 1) for different malware families, it analyzes software from three aspects, namely installation package features, requested-permission features and sensitive-behavior call features, on the basis of static analysis; 2) it extracts malware family features by statistical analysis, builds a malware family feature library, and performs multi-label malware detection based on this library, achieving better maliciousness-judgment precision and malware family classification precision.
The invention is explained in further detail below with reference to the drawings.
Drawings
Fig. 1 is a flowchart of the Android-oriented multi-feature detection method for mobile network terminal malware according to the present invention.
Fig. 2 compares the malware detection accuracy and malware family classification accuracy of the present invention with those of several engines in VirusTotal.
Detailed Description
With reference to the attached drawings, the Android-oriented mobile network terminal malicious software multi-feature detection method comprises the following steps:
step 1, obtaining Android malware samples, labeling the Android malware family each sample belongs to, and then obtaining non-malware samples, so as to construct a dataset of malicious and non-malicious samples;
step 2, extracting the installation package features of the software, namely whether a .so file exists, whether a file for rooting the system exists, whether an abnormal file exists, and whether a subprogram exists, thereby constructing the installation package feature vector F;
here, an abnormal file is a file whose suffix does not match the type indicated by its own content; whether a file for rooting the system exists is decided by using MD5 values to determine whether a library file is a root extension file; and whether a subprogram exists is decided by checking for jar, dex and apk files.
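The four installation package features of step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the root-file MD5 list `KNOWN_ROOT_MD5` is a hypothetical placeholder, a small magic-byte table stands in for the Apache Tika content detection mentioned in the embodiment, and files named `classes*.dex` at the package root are assumed to be the app's own bytecode rather than subprograms.

```python
import hashlib
import io
import posixpath
import re
import zipfile

# Hypothetical placeholder: MD5 digests of known root extension files.
# A real deployment would load these from a curated database.
KNOWN_ROOT_MD5 = frozenset()

# Tiny magic-byte table standing in for Apache Tika's content detection
# (an assumption; Tika recognises far more file types).
MAGIC = {
    b"PK\x03\x04": {".zip", ".jar", ".apk"},
    b"dex\n": {".dex"},
    b"\x7fELF": {".so"},
    b"\x89PNG": {".png"},
}
NESTED_SUFFIXES = {".jar", ".dex", ".apk"}
# Assumed: the app's own bytecode (classes.dex, classes2.dex, ...) is not a subprogram.
OWN_DEX = re.compile(r"classes\d*\.dex")


def suffix_of(name):
    return posixpath.splitext(name)[1].lower()


def is_abnormal(name, data):
    """True if the magic bytes imply a type that disagrees with the suffix."""
    for magic, suffixes in MAGIC.items():
        if data.startswith(magic):
            return suffix_of(name) not in suffixes
    return False  # unknown content type: not flagged in this sketch


def package_features(apk_bytes, root_md5=KNOWN_ROOT_MD5):
    """Return F = [has_so, has_root_file, has_abnormal_file, has_subprogram]."""
    has_so = has_root = has_abnormal = has_sub = False
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, no content
            data = zf.read(name)
            suffix = suffix_of(name)
            has_so |= suffix == ".so"
            has_root |= hashlib.md5(data).hexdigest() in root_md5
            has_abnormal |= is_abnormal(name, data)
            if suffix in NESTED_SUFFIXES and not OWN_DEX.fullmatch(name):
                has_sub = True
    return [int(has_so), int(has_root), int(has_abnormal), int(has_sub)]
```

For example, a package whose `assets/logo.png` actually starts with the dex magic bytes yields a 1 in the abnormal-file position of F.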
Step 3, processing the Android software sample with a decompiling tool, parsing the AndroidManifest.xml file, and extracting the permission list P requested by the software from the tag fields in the XML;
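Step 3 can be illustrated with Python's standard `xml.etree.ElementTree`. The manifest inside an APK is stored in Android's binary XML format, so this sketch assumes it has already been decoded to text (for example by a tool such as the APKParser named in the embodiment); the helper name `permission_list` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute names such as android:name live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def permission_list(manifest_xml):
    """Return the sorted, de-duplicated permissions requested in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    perms = set()
    # Both <uses-permission> and <uses-permission-sdk-23> request permissions.
    for tag in ("uses-permission", "uses-permission-sdk-23"):
        for elem in root.iter(tag):
            name = elem.get(ANDROID_NS + "name")
            if name:
                perms.add(name)
    return sorted(perms)
```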
step 4, decompiling the installation package, constructing the software's function call graph, locating the security-sensitive methods in it, and constructing the software's sensitive behavior graph SBG; then obtaining the context information of the security-sensitive methods by data-flow analysis, and forming the software's sensitive behavior set SBS from the security-sensitive methods that are called directly or indirectly;
the security-sensitive methods comprise permission-protected methods, information-flow Source/Sink methods, and other suspicious methods: permission-protected methods are APIs in the Android system that can be used only after the corresponding permission has been requested; information-flow Source/Sink methods are methods that may generate or send sensitive information; and other suspicious methods include dynamic loading, reflection, encryption/decryption, Native code execution, and calling functions.
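The three categories of security-sensitive methods can be pictured as a lookup from fully qualified method names to categories. The API names below are real Android/Java methods chosen purely for illustration; the patent does not list its method set, so `SENSITIVE_PATTERNS` and `categorize` are hypothetical.

```python
# Illustrative, not exhaustive: example methods for each category of step 4.
SENSITIVE_PATTERNS = {
    "permission_protected": (
        "android.telephony.TelephonyManager.getDeviceId",
        "android.telephony.SmsManager.sendTextMessage",
        "android.location.LocationManager.getLastKnownLocation",
    ),
    "source_sink": (
        "android.content.ContentResolver.query",
        "java.net.URL.openConnection",
    ),
    "dynamic_loading": ("dalvik.system.DexClassLoader.<init>",),
    "reflection": ("java.lang.reflect.Method.invoke", "java.lang.Class.forName"),
    "crypto": ("javax.crypto.Cipher.doFinal",),
    "native_exec": ("java.lang.System.loadLibrary", "java.lang.Runtime.exec"),
}


def categorize(method_sig):
    """Return the category of a fully qualified method name, or None if not sensitive."""
    for category, patterns in SENSITIVE_PATTERNS.items():
        if method_sig in patterns:
            return category
    return None
```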
The constructed sensitive behavior graph is the following four-tuple:
$$SBG = (V_D, V_N, E, \mu)$$
where $V_D$ is a subset of the vertex set of the software sensitive behavior graph, and each node $v_d \in V_D$ is a security-sensitive method; $V_N$ is a subset of the vertex set, and each node $v_n \in V_N$ is a non-security-sensitive method that directly or indirectly calls a security-sensitive method; $E \subseteq V_N \times V_D$ is the set of edges of the graph, indicating call relationships between methods, where an edge $e = (v_n, v_d) \in E$ means that a non-security-sensitive method $v_n \in V_N$ in the software directly or indirectly calls a security-sensitive method $v_d \in V_D$, or that a method $v_n$ of component $C_s$ directly or indirectly triggers a method $v_d$ of component $C_t$ through ICC (inter-component communication); and the labeling function $\mu: V_D \to \langle ID, EntryType, Para \rangle$ labels the content of each node in the graph, i.e., the context information of the method, including the method ID, the entry-point type EntryType, and the parameters Para;
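One way to obtain the vertex sets and edges of the SBG is a reachability search over the call graph, since an edge $(v_n, v_d)$ records a direct or indirect call. The sketch below assumes the call graph (for instance one exported from Soot) is available as a plain adjacency dictionary with ICC links already added as ordinary edges; it omits the labeling function $\mu$, and `build_sbg` is a hypothetical name.

```python
from collections import deque


def build_sbg(call_graph, sensitive):
    """Return (V_D, V_N, E) of a sensitive behavior graph.

    call_graph maps each method to the methods it calls directly (ICC
    links are assumed to be included as ordinary entries); sensitive is
    the set of security-sensitive methods. An edge (v_n, v_d) is emitted
    when non-sensitive v_n reaches sensitive v_d through one or more calls.
    """
    edges = set()
    for caller in call_graph:
        if caller in sensitive:
            continue
        # Breadth-first search from caller over the call graph.
        seen, queue = {caller}, deque([caller])
        while queue:
            for callee in call_graph.get(queue.popleft(), ()):
                if callee in seen:
                    continue
                seen.add(callee)
                if callee in sensitive:
                    edges.add((caller, callee))
                queue.append(callee)
    v_n = {u for u, _ in edges}
    v_d = {v for _, v in edges}
    return v_d, v_n, edges
```

Here `helper` calling `sendTextMessage` produces the direct edge, and `onCreate` calling `helper` produces the indirect one.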
the set of sensitive behaviors is the set shown below:
$$SBS = \{S_1, \dots, S_i, \dots, S_m\}$$
where $S_i = \{v \mid (v_i, v) \in E \wedge v_i \in V_N \wedge v \in V_D\}$ is the set of all security-sensitive methods directly or indirectly called by the $i$-th non-security-sensitive method in $V_N$ of the sensitive behavior graph $SBG = (V_D, V_N, E, \mu)$, and $m = |V_N|$ is the length of the set SBS.
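Given the edge set $E$, each $S_i$ simply collects the targets of edges leaving the $i$-th method of $V_N$. A minimal sketch, with the hypothetical helper `build_sbs` and an explicitly ordered $V_N$ so that the index $i$ is well defined:

```python
def build_sbs(v_n_order, edges):
    """Return SBS = [S_1, ..., S_m] with S_i = {v | (v_i, v) in E}.

    v_n_order fixes the indexing of V_N, so m == len(v_n_order) == |V_N|.
    frozensets are used so the sets can later be counted and compared.
    """
    return [frozenset(v for u, v in edges if u == v_i) for v_i in v_n_order]
```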
Step 5, performing statistical analysis on the features of software belonging to the same malware family among the malicious samples to obtain the occurrence probability of each feature component, and constructing an Android malware family multi-feature model M, thereby building a malware family feature library;
the constructed Android malware family multi-feature model is the following six-tuple:
$$M = (SBS_c, \alpha, F_c, \beta, P_c, \gamma)$$
where $SBS_c$ is the set of sensitive behaviors common to a malware family, obtained by statistically analyzing the sensitive behavior sets SBS of the samples of that family; the labeling function $\alpha$ labels the probability that each set of sensitive methods in $SBS_c$ occurs in the family's samples; $F_c$ is the set of installation package features common to the family's samples, obtained by statistically analyzing their installation package feature vectors F; the labeling function $\beta: F_c \to [0,1]$ labels the probability that each feature in $F_c$ occurs in the family's samples; $P_c$ is the list of permissions frequently requested by the family's samples, obtained by statistically analyzing their permission lists P; and the labeling function $\gamma: P_c \to [0,1]$ labels the probability that each permission in $P_c$ occurs in the family's samples.
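The statistical analysis of step 5 amounts to counting, within one family, how often each feature, permission and sensitive behavior set occurs. The sketch below is an assumption-laden illustration: it treats an item as common when its occurrence probability reaches a hypothetical `min_support` threshold, which the text does not specify, and it represents $F_c$ as a 0/1 vector over the same positions as $F$.

```python
from collections import Counter


def occurrence_probabilities(samples):
    """Fraction of samples containing each item (duplicates in one sample count once)."""
    counts = Counter()
    for items in samples:
        counts.update(set(items))
    return {item: c / len(samples) for item, c in counts.items()}


def family_model(fam_samples, min_support=0.5):
    """Build a hypothetical M = (SBS_c, alpha, F_c, beta, P_c, gamma) for one family.

    Each sample is a dict with "F" (0/1 list), "P" (permission list) and
    "SBS" (list of frozensets). min_support is an assumed cut-off for
    "common"; the patent text does not state one.
    """
    n = len(fam_samples)
    alpha = {s: p for s, p in occurrence_probabilities(
        [x["SBS"] for x in fam_samples]).items() if p >= min_support}
    gamma = {q: p for q, p in occurrence_probabilities(
        [x["P"] for x in fam_samples]).items() if p >= min_support}
    m = len(fam_samples[0]["F"])
    # beta_i: fraction of family samples whose feature i is set
    beta = [sum(x["F"][i] for x in fam_samples) / n for i in range(m)]
    f_c = [1 if b >= min_support else 0 for b in beta]
    return {"SBS_c": set(alpha), "alpha": alpha, "F_c": f_c,
            "beta": beta, "P_c": set(gamma), "gamma": gamma}
```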
Step 6, extracting the features of the software under test using the methods of steps 2 to 4, matching them against the malware family feature library to find the malware family with the highest similarity, and, if that similarity exceeds a threshold, outputting the software as malware together with the malware family it belongs to; otherwise, outputting the software as benign.
The similarity between the software under test and a malware family is computed from three component similarities: $S_f$, the similarity of the software feature vectors; $S_p$, the similarity of the permission lists; and $S_{sbs}$, the similarity of the sensitive behavior sets, with $\mu_i$ the weight of each similarity in the calculation;
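The combination formula itself is not reproduced in this text. Assuming, for illustration only, that the overall similarity is the weighted sum $\mu_1 S_f + \mu_2 S_p + \mu_3 S_{sbs}$, the matching and decision of step 6 could be sketched as below; the equal default weights are hypothetical, while the 0.7 threshold is the value used in the embodiment.

```python
def classify(per_family_scores, weights=(1 / 3, 1 / 3, 1 / 3), threshold=0.7):
    """Return ("malicious", family) or ("benign", None).

    per_family_scores maps a family name to (S_f, S_p, S_sbs). The weighted
    sum and the equal default weights are assumptions of this sketch.
    """
    best_family, best_sim = None, float("-inf")
    for family, scores in per_family_scores.items():
        sim = sum(w * s for w, s in zip(weights, scores))
        if sim > best_sim:
            best_family, best_sim = family, sim
    # Report a family only if the best match exceeds the threshold.
    if best_family is not None and best_sim > threshold:
        return "malicious", best_family
    return "benign", None
```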
the software feature vector similarity $S_f$ is calculated as follows: given the feature vector $F = \{f_1, f_2, f_3, \dots, f_m\}$ of the software under test, the feature vector $F_c$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\beta$, the similarity is computed from the occurrence probability of each feature; if all values in the malware family model's feature vector are 0, the similarity is 0. The correction factor $\omega_f$ is the number of features in $F$ that match the corresponding $f_i^c$, divided by the number of features in $F_c$ whose value is 1;
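The exact $S_f$ formula is likewise missing here. The sketch below follows the parts the text does state (zero similarity for an all-zero family vector, and the correction factor $\omega_f$, read here as features present in both $F$ and $F_c$ over features present in $F_c$) and fills the rest with an assumed $\beta$-weighted agreement score; `feature_similarity` is a hypothetical name.

```python
def feature_similarity(f, f_c, beta):
    """Hypothetical S_f for 0/1 vectors f (software) and f_c (family), weights beta."""
    common = [i for i, v in enumerate(f_c) if v == 1]
    if not common:
        return 0.0  # family vector all zeros -> similarity 0, as the text states
    # omega_f (assumed reading): features set in both F and F_c / features set in F_c
    omega_f = sum(1 for i in common if f[i] == 1) / len(common)
    total = sum(beta[i] for i in common)
    if total == 0:
        return 0.0
    # Assumed score: beta-weighted share of the family's common features present in F
    score = sum(beta[i] for i in common if f[i] == 1) / total
    return omega_f * score
```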
the permission list similarity $S_p$ of the software is calculated as follows: given the permission list $P$ of the software under test, the permission list $P_c = \{p_1^c, p_2^c, \dots, p_n^c\}$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\gamma$, the correction factor $\omega_p$ is the number of permissions in $P$ that belong to $P_c$, divided by the length of $P_c$; the indicator term for an element $p_i^c$ of $P_c$ is 1 when $p_i^c$ is contained in the permission list $P$ of the software under test, and 0 otherwise;
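Similarly for $S_p$: the correction factor $\omega_p$ and the 0/1 indicator come from the text, while multiplying $\omega_p$ by a $\gamma$-weighted share of matched permissions is an assumption made for this sketch; `permission_similarity` is a hypothetical name.

```python
def permission_similarity(p, gamma):
    """Hypothetical S_p; p is the app's permission set, gamma maps P_c -> probability."""
    if not gamma:
        return 0.0
    present = {q: 1 if q in p else 0 for q in gamma}  # indicator for each p_i^c
    omega_p = sum(present.values()) / len(gamma)      # |P & P_c| / |P_c|
    total = sum(gamma.values())
    if total == 0:
        return 0.0
    # Assumed score: gamma-weighted share of family permissions the app requests
    score = sum(gamma[q] * present[q] for q in gamma) / total
    return omega_p * score
```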
the sensitive behavior set similarity $S_{sbs}$ is calculated as follows: given the software's sensitive behavior set SBS, the sensitive behavior set $SBS_c$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\alpha$, the correction factor $\omega_{sbs}$ is the number of sets $S_i^c$ in $SBS_c$ that are matched by SBS, divided by the length of $SBS_c$, where $S_i^c$ is matched when there exists a set $S$ in SBS such that the proportion of similar elements in $S$ and $S_i^c$ to all elements in the two sets is greater than $\theta$ ($0 < \theta \le 1$).
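For $S_{sbs}$, the sketch reads the matching function as a Jaccard-style overlap: a family set $S_i^c$ is matched when some $S$ in SBS has $|S \cap S_i^c| / |S \cup S_i^c| > \theta$. Treating elements as similar only when their method identifiers are equal, and the $\alpha$-weighted score, are assumptions; the default $\theta = 0.8$ mirrors the 80% used in the embodiment, and both function names are hypothetical.

```python
def sets_match(s, s_c, theta):
    """Assumed match test: |S & S_c| / |S | S_c| > theta (elements equal by identifier)."""
    union = s | s_c
    return bool(union) and len(s & s_c) / len(union) > theta


def sbs_similarity(sbs, alpha, theta=0.8):
    """Hypothetical S_sbs; sbs is a list of frozensets, alpha maps each S_i^c -> probability."""
    if not alpha:
        return 0.0
    matched = {sc: any(sets_match(s, sc, theta) for s in sbs) for sc in alpha}
    # omega_sbs: matched family sets / |SBS_c|
    omega_sbs = sum(matched.values()) / len(alpha)
    total = sum(alpha.values())
    if total == 0:
        return 0.0
    # Assumed score: alpha-weighted share of matched family sets
    score = sum(alpha[sc] for sc in alpha if matched[sc]) / total
    return omega_sbs * score
```

With $\theta = 0.8$, the sets {a, b} and {a, b, d} (overlap 2/3) do not match; lowering $\theta$ to 0.6 makes them match.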
Thus, the invention extracts malware family features by statistical analysis, builds a malware family feature library, and performs multi-label malware detection based on this library, achieving high maliciousness-judgment precision and malware family classification precision.
In order to make those skilled in the art better understand the technical problems, technical solutions and technical effects of the present invention, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
Examples
An Android-oriented multi-feature detection method for mobile network terminal malware uses a dataset formed from the Drebin dataset and non-malicious software samples obtained from Google Play; malicious code detection and family classification specifically comprise the following steps:
Step 1: divide the Drebin samples according to the malware family they belong to, collect non-malicious software from Google Play with a web crawler, and verify it with the VirusTotal online scanning service, thereby constructing a sample dataset of 4486 malware samples from 24 malware families and 2140 benign software samples;
Step 2: decompress the installation package under analysis with a Zip decompression tool and extract its installation package features, namely whether a .so file exists, whether a file for rooting the system exists, whether an abnormal file exists, and whether a subprogram exists, thereby constructing the installation package feature vector F. To decide whether a file for rooting the system exists, the MD5 values of known root extension library files are compared with those of the files in the installation package; to decide whether an abnormal file exists, the file content is analyzed with Apache Tika to obtain the file type, which is compared with the file suffix; to decide whether a subprogram exists, the package is checked for jar, dex and apk files;
Step 3: process the Android software sample with APKParser, parse the AndroidManifest.xml file, and extract the permission list P requested by the software from the tag fields in the XML;
Step 4: decompile the installation package with the Soot tool, construct the software's function call graph, locate the security-sensitive methods in it, and construct the software's sensitive behavior graph SBG; then obtain the context information of the security-sensitive methods by data-flow analysis, and form the software's sensitive behavior set SBS from the security-sensitive methods that are called directly or indirectly;
the security-sensitive methods of interest comprise permission-protected methods, information-flow Source/Sink methods, and other suspicious methods: permission-protected methods are APIs in the Android system that can be used only after the corresponding permission has been requested; information-flow Source/Sink methods are methods that may generate or send sensitive information; and other suspicious methods include dynamic loading, reflection, encryption/decryption, Native code execution, and calling functions.
The constructed sensitive behavior graph is the following four-tuple:
$$SBG = (V_D, V_N, E, \mu)$$
where $V_D$ is a subset of the vertex set of the software sensitive behavior graph, and each node $v_d \in V_D$ is a security-sensitive method; $V_N$ is a subset of the vertex set, and each node $v_n \in V_N$ is a non-security-sensitive method that directly or indirectly calls a security-sensitive method; $E \subseteq V_N \times V_D$ is the set of edges of the graph, indicating call relationships between methods. An edge $e = (v_n, v_d) \in E$ means that a non-security-sensitive method $v_n \in V_N$ in the software directly or indirectly calls a security-sensitive method $v_d \in V_D$, or that a method $v_n$ of component $C_s$ directly or indirectly triggers a method $v_d$ of component $C_t$ through ICC. The labeling function $\mu: V_D \to \langle ID, EntryType, Para \rangle$ labels the content of each vertex in the graph, including the method ID, the entry-point type EntryType, and the parameters Para.
The set of sensitive behaviors is the set shown below:
$$SBS = \{S_1, S_2, \dots, S_m\}$$
where $S_i = \{v \mid (v_i, v) \in E \wedge v_i \in V_N \wedge v \in V_D\}$ is the set of all security-sensitive methods directly or indirectly called by the $i$-th non-security-sensitive method in $V_N$ of the sensitive behavior graph $SBG = (V_D, V_N, E, \mu)$, and $m = |V_N|$ is the length of the set SBS;
Step 5: select 75% (3341 samples) of the samples of the 24 malware families for feature extraction and build the malware family feature library: perform statistical analysis on the features of software belonging to the same malware family among these samples to obtain the occurrence probability of each feature component, and construct an Android malware family multi-feature model M, thereby building the malware family feature library;
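The 75%/25% division of step 5 can be done per family so that each family is represented in both parts. The text does not say how the samples were chosen, so this per-family random split with a fixed seed is an assumption, `split_by_family` is a hypothetical name, and it is not claimed to reproduce the 3341/1145 counts exactly.

```python
import random
from collections import defaultdict


def split_by_family(samples, train_fraction=0.75, seed=0):
    """Split (sample_id, family) pairs into train/test lists, separately per family."""
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for family in sorted(by_family):
        ids = sorted(by_family[family])
        rng.shuffle(ids)
        cut = int(len(ids) * train_fraction)
        train += [(i, family) for i in ids[:cut]]
        test += [(i, family) for i in ids[cut:]]
    return train, test
```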
the constructed Android malware family multi-feature model is the following six-tuple:
$$M = (SBS_c, \alpha, F_c, \beta, P_c, \gamma)$$
where $SBS_c$ is the set of sensitive behaviors common to a malware family, obtained by statistically analyzing the sensitive behavior sets SBS of the samples of that family; the labeling function $\alpha$ labels the probability that each set of sensitive methods in $SBS_c$ occurs in the family's samples; $F_c$ is the set of installation package features common to the family's samples, obtained by statistically analyzing their installation package features F; the labeling function $\beta: F_c \to [0,1]$ labels the probability that each feature in $F_c$ occurs in the family's samples; $P_c$ is the list of permissions frequently requested by the family's samples, obtained by statistically analyzing their permission lists P; and the labeling function $\gamma: P_c \to [0,1]$ labels the probability that each permission in $P_c$ occurs in the family's samples;
Step 6: extract the features of the software under test using the methods of steps 2 to 4, match them against the malware family feature library to find the malware family with the highest similarity, and, if that similarity exceeds 0.7, output the software as malware together with the malware family it belongs to; otherwise, output the software as benign;
the similarity between the software under test and a malware family is computed from three component similarities: $S_f$, the similarity of the feature vectors; $S_p$, the similarity of the permission lists; and $S_{sbs}$, the similarity of the sensitive behavior sets, with $\mu_i$ the weight of each similarity in the calculation. In the experiment, the three weights are taken as
The similarity of the software feature vectors is calculated as follows: given the feature vector $F = \{f_1, f_2, f_3, \dots, f_m\}$ of the software under test, the feature vector $F_c = \{f_1^c, f_2^c, f_3^c, \dots, f_m^c\}$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\beta$, the similarity is computed from the occurrence probability of each feature; if all values in the malware family model's feature vector are 0, the similarity is 0. The correction factor $\omega_f$ is the number of features in $F$ that match the corresponding $f_i^c$, divided by the number of features in $F_c$ whose value is 1.
The similarity of the software permission lists is calculated as follows: given the permission list $P$ of the software under test, the permission list $P_c = \{p_1^c, p_2^c, \dots, p_n^c\}$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\gamma$, the correction factor $\omega_p$ is the number of permissions in $P$ that belong to $P_c$, divided by the length of $P_c$.
The similarity of the sensitive behavior sets is calculated as follows: given the software's sensitive behavior set SBS, the sensitive behavior set $SBS_c$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\alpha$, a correction factor $\omega_{sbs}$ is introduced to prevent malware families with more features from overshadowing families with fewer features. $\omega_{sbs}$ is the number of sets $S_i^c$ in $SBS_c$ that are matched by SBS, divided by the length of $SBS_c$, where $S_i^c$ is matched when there exists a set $S$ in SBS such that the proportion of similar elements in $S$ and $S_i^c$ to all elements in the two sets is greater than 80%.
The remaining 25% (1145) of the malware samples and the 2140 benign samples were tested with the above method; Fig. 2 compares the resulting maliciousness-judgment and malware family classification results with those of 8 commonly used antivirus engines in VirusTotal.
Thus, by using installation package features, permission features and sensitive-behavior call features as the basis for judging malware, the method can improve the accuracy of detecting malicious software behavior and provides the capability of classifying malware families.
Claims (5)
1. An Android-oriented multi-feature detection method for mobile network terminal malware, characterized by comprising the following steps:
step 1, obtaining Android malware samples, labeling the Android malware family each sample belongs to, and then obtaining non-malware samples, so as to construct a dataset of malicious and non-malicious samples;
step 2, extracting the installation package features of the software, namely whether a .so file exists, whether a file for rooting the system exists, whether an abnormal file exists, and whether a subprogram exists, thereby constructing the installation package feature vector F;
step 3, processing the Android software sample with a decompiling tool, parsing the AndroidManifest.xml file, and extracting the permission list P requested by the software from the tag fields in the XML;
step 4, decompiling the installation package, constructing the software's function call graph, locating the security-sensitive methods in it, and constructing the software's sensitive behavior graph SBG; then obtaining the context information of the security-sensitive methods by data-flow analysis, and forming the software's sensitive behavior set SBS from the security-sensitive methods that are called directly or indirectly; the constructed sensitive behavior graph is the following four-tuple:
$$SBG = (V_D, V_N, E, \mu)$$
where $V_D$ is a subset of the vertex set of the software sensitive behavior graph, and each node $v_d \in V_D$ is a security-sensitive method; $V_N$ is a subset of the vertex set, and each node $v_n \in V_N$ is a non-security-sensitive method that directly or indirectly calls a security-sensitive method; $E \subseteq V_N \times V_D$ is the set of edges of the graph, indicating call relationships between methods, where an edge $e = (v_n, v_d) \in E$ means that a non-security-sensitive method $v_n \in V_N$ in the software directly or indirectly calls a security-sensitive method $v_d \in V_D$, or that a method $v_n$ of component $C_s$ directly or indirectly triggers a method $v_d$ of component $C_t$ through ICC; and the labeling function $\mu: V_D \to \langle ID, EntryType, Para \rangle$ labels the content contained in each vertex of the graph, i.e., the context information of the methods in $V_D$ and $V_N$, including the method ID, the entry-point type EntryType, and the parameters Para;
the sensitive behavior set is defined as:
$SBS = \{S_1, \ldots, S_i, \ldots, S_m\}$
where $S_i = \{v \mid (v_i, v) \in E \wedge v_i \in V_N \wedge v \in V_D\}$ is a set of security-sensitive methods, namely the set of all security-sensitive methods directly or indirectly called by the $i$-th non-security-sensitive method in the set $V_N$ of the sensitive behavior graph $SBG = (V_D, V_N, E, \mu)$; $m = |V_N|$ is the length of the set SBS;
step 5, performing statistical analysis on the software features of malicious samples that belong to the same malware family to obtain the occurrence probability of each feature component, and constructing the Android malware family multi-feature model M, thereby building a malware family feature library;
step 6, extracting the features of the software under test by the methods of steps 2 to 4 and matching them against the malware family feature library to obtain the malware family with the highest similarity; if the similarity exceeds a threshold, outputting that the software under test is malware together with the malware family to which it belongs; otherwise, outputting that the software under test is benign.
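Step 3 can be illustrated with a minimal Python sketch. It assumes the decompiling tool (for example apktool) has already decoded the binary AndroidManifest.xml into plain XML text, and it treats the `android:name` attribute of each `uses-permission` element as the tag field that yields P; the function name `extract_permissions` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes written as android:name live in this XML namespace.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml: str) -> list[str]:
    """Return the permission list P from a decoded (plain-text) AndroidManifest.xml.

    Assumes the binary manifest inside the APK has already been decoded by a
    decompiling tool. Duplicates are dropped; first-occurrence order is kept.
    """
    root = ET.fromstring(manifest_xml)
    permissions: list[str] = []
    for elem in root.iter("uses-permission"):
        # ElementTree expands namespaced attributes to "{uri}local".
        name = elem.get(f"{{{ANDROID_NS}}}name")
        if name and name not in permissions:
            permissions.append(name)
    return permissions
```

Applied to a manifest declaring `android.permission.SEND_SMS` and `android.permission.INTERNET`, the function returns those two strings in declaration order.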
2. The Android-oriented mobile network terminal malware multi-feature detection method as claimed in claim 1, wherein the abnormal file in step 2 refers to a file whose suffix does not match the type indicated by the file content itself; whether a file for rooting the system exists is judged by checking, through its MD5 value, whether a library file is a root privilege-escalation file; and whether a subprogram exists is judged by checking for jar, dex, and apk files in the package.
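A sketch of how the four binary components of F from step 2 might be computed with Python's standard library. The magic-byte table, the `known_root_md5` set of hashes of known root exploit libraries, and the rule that root-level `classes*.dex` files are the app's own code rather than subprograms are assumptions for illustration, not details given in the claims; all function names are hypothetical.

```python
import hashlib
import io
import re
import zipfile

# Hypothetical table: content signature -> suffixes that legitimately carry it.
SIGNATURES = [
    (b"\x7fELF", {".so"}),                      # ELF native library
    (b"dex\n", {".dex"}),                       # Dalvik executable
    (b"PK\x03\x04", {".jar", ".apk", ".zip"}),  # ZIP-based archives
    (b"\x89PNG", {".png"}),                     # PNG image
]
# Assumption: classes.dex, classes2.dex, ... are the main program, not subprograms.
MAIN_DEX = re.compile(r"classes\d*\.dex")


def file_suffix(name: str) -> str:
    """Lower-cased suffix of the last path component, or '' if there is none."""
    dot, slash = name.rfind("."), name.rfind("/")
    return name[dot:].lower() if dot > slash else ""


def is_abnormal(name: str, data: bytes) -> bool:
    """True if the content signature contradicts the file suffix."""
    for magic, suffixes in SIGNATURES:
        if data.startswith(magic):
            return file_suffix(name) not in suffixes
    return False  # unknown content type: not flagged


def package_features(apk_bytes: bytes, known_root_md5: set[str]) -> list[int]:
    """Return F = [has .so, has root file, has abnormal file, has subprogram]."""
    has_so = has_root = has_abnormal = has_sub = 0
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            name, data = info.filename, zf.read(info)
            ext = file_suffix(name)
            if ext == ".so":
                has_so = 1
                # Library file whose MD5 matches a known root exploit.
                if hashlib.md5(data).hexdigest() in known_root_md5:
                    has_root = 1
            if is_abnormal(name, data):
                has_abnormal = 1
            if ext in {".jar", ".apk"} or (ext == ".dex" and not MAIN_DEX.fullmatch(name)):
                has_sub = 1
    return [has_so, has_root, has_abnormal, has_sub]
```

For example, a `.png` asset whose bytes begin with the DEX header `dex\n` is flagged as abnormal, while an embedded `plugin.jar` with a normal ZIP header only sets the subprogram bit.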
3. The Android-oriented mobile network terminal malware multi-feature detection method of claim 1, wherein the security-sensitive methods in step 4 comprise permission-protected methods, information-flow Source/Sink methods, and other suspicious methods; a permission-protected method is an API that can be used in the Android system only after the corresponding permission has been requested; an information-flow Source/Sink method is a method that may produce or send sensitive information; and the other suspicious methods comprise dynamic loading functions, reflection functions, encryption and decryption functions, Native code execution functions, and calling functions.
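The step-4 relation between $V_N$, $V_D$, $E$ and SBS can be sketched as a reachability computation over a call graph. This assumes the call graph (with any ICC edges simply added as ordinary edges) and the set of security-sensitive method names are already available from decompilation; the data-flow context labeling $\mu$ is omitted, and `build_sbs` is a hypothetical name.

```python
def build_sbs(call_graph: dict[str, list[str]], sensitive: set[str]):
    """Derive (V_D, V_N, E, SBS) from a call graph.

    call_graph maps each caller to its direct callees; sensitive is the set of
    security-sensitive method names. SBS[i] corresponds to V_N[i].
    """

    def reachable(start: str) -> set[str]:
        # Iterative DFS: every method that start calls directly or indirectly.
        seen: set[str] = set()
        stack = list(call_graph.get(start, []))
        while stack:
            method = stack.pop()
            if method not in seen:
                seen.add(method)
                stack.extend(call_graph.get(method, []))
        return seen

    v_d: set[str] = set()
    v_n: list[str] = []
    edges: set[tuple[str, str]] = set()
    sbs: list[frozenset[str]] = []
    for method in sorted(call_graph):  # sorted for a deterministic V_N order
        if method in sensitive:
            continue  # V_N holds only non-sensitive methods
        hits = reachable(method) & sensitive
        if hits:
            v_n.append(method)
            v_d |= hits
            edges |= {(method, s) for s in hits}
            sbs.append(frozenset(hits))  # S_i = {v | (v_i, v) in E, v in V_D}
    return v_d, v_n, edges, sbs
```

With `android.telephony.SmsManager.sendTextMessage` (permission-protected) and `java.lang.reflect.Method.invoke` (reflection) as sensitive methods, a lifecycle method that reaches `sendTextMessage` through a helper gets its own $S_i$ containing that API, and $m$ is `len(v_n)`.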
4. The Android-oriented mobile network terminal malware multi-feature detection method of claim 1, wherein the Android malware family multi-feature model constructed in step 5 is the following six-tuple:
$M = (SBS_c, \alpha, F_c, \beta, P_c, \gamma)$
where $SBS_c$ is the sensitive behavior set common to the malware family, obtained by statistically analyzing the sensitive behavior sets SBS of samples of the same malware family; the labeling function $\alpha: S \in SBS_c \to [0,1]$ marks the probability of occurrence of each sensitive method set of $SBS_c$ in the malware family samples; $F_c$ is the set of installation-package features common to the malware family samples, obtained by statistically analyzing the installation-package feature vectors F of samples of the same malware family; the labeling function $\beta: f \in F_c \to [0,1]$ marks the probability of occurrence of each feature of $F_c$ in the malware family samples; $P_c$ is the list of permissions frequently requested by the malware family samples, obtained by statistically analyzing the permission lists P of samples of the same malware family; the labeling function $\gamma: p \in P_c \to [0,1]$ marks the probability of occurrence of each permission of $P_c$ in the malware family samples.
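One way to estimate the labeling functions $\alpha$, $\beta$ and $\gamma$ is to take, for each item, the fraction of family samples that exhibit it. The `min_support` cut-off that decides what counts as "common" or "frequently requested", the encoding of $F_c$ as a binary vector, and the sample dictionary layout are assumptions for illustration; `FamilyModel` and `build_family_model` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FamilyModel:
    """M = (SBS_c, alpha, F_c, beta, P_c, gamma); alpha and gamma are the dict values."""
    sbs_c: dict  # frozenset S -> alpha(S)
    f_c: list    # binary vector F_c
    beta: list   # beta_i: share of samples whose feature i is 1
    p_c: dict    # permission p -> gamma(p)


def build_family_model(samples: list[dict], min_support: float = 0.5) -> FamilyModel:
    """Estimate M from samples of one family, each a dict with keys 'F', 'P', 'SBS'.

    min_support is a hypothetical cut-off for keeping an item in the model.
    """
    n = len(samples)
    sbs_count: Counter = Counter()
    perm_count: Counter = Counter()
    feat_count = [0] * len(samples[0]["F"])
    for s in samples:
        sbs_count.update(set(s["SBS"]))  # count each set once per sample
        perm_count.update(set(s["P"]))
        for i, v in enumerate(s["F"]):
            feat_count[i] += v
    alpha = {S: c / n for S, c in sbs_count.items() if c / n >= min_support}
    gamma = {p: c / n for p, c in perm_count.items() if c / n >= min_support}
    beta = [c / n for c in feat_count]
    f_c = [1 if b >= min_support else 0 for b in beta]
    return FamilyModel(sbs_c=alpha, f_c=f_c, beta=beta, p_c=gamma)
```

Building one such model per family gives the malware family feature library of step 5.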
5. The Android-oriented mobile network terminal malware multi-feature detection method as claimed in claim 1, wherein the similarity between the software to be detected and a malware family in step 6 is represented as:
where $S_f$ is the similarity of the software feature vectors, $S_p$ is the similarity of the permission lists, $S_{sbs}$ is the similarity of the sensitive behavior sets, and $\mu_i$ is the weight of each similarity in the calculation;
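The combination formula itself is not reproduced in this text. Assuming it is a weighted sum $S = \mu_1 S_f + \mu_2 S_p + \mu_3 S_{sbs}$, the overall score and the step-6 decision could look like the sketch below. The default weights, the threshold value and the family names are hypothetical placeholders, not values from the patent.

```python
from typing import Optional


def overall_similarity(s_f: float, s_p: float, s_sbs: float,
                       weights: tuple = (0.2, 0.3, 0.5)) -> float:
    """Assumed form: weighted sum of the three partial similarities."""
    mu_f, mu_p, mu_sbs = weights  # hypothetical weights mu_1, mu_2, mu_3
    return mu_f * s_f + mu_p * s_p + mu_sbs * s_sbs


def classify(scores: dict, threshold: float = 0.6) -> Optional[str]:
    """Step 6: pick the most similar family; None means 'benign'.

    scores maps family name -> overall similarity; threshold is hypothetical.
    """
    if not scores:
        return None
    family, best = max(scores.items(), key=lambda kv: kv[1])
    return family if best > threshold else None
```

Usage: `classify({"FamilyA": 0.72, "FamilyB": 0.41})` reports the sample as belonging to `FamilyA`, while a best score of 0.5 would be reported as benign under the assumed threshold.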
the software feature vector similarity $S_f$ is calculated as follows: given the feature vector $F = \{f_1, f_2, f_3, \ldots, f_m\}$ of the software under test, the feature vector $F_c = \{f_1^c, f_2^c, f_3^c, \ldots, f_m^c\}$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\beta$, then:
the similarity is calculated according to the probability of each feature; if all values in the feature vector of the malware family multi-feature model are 0, the similarity is 0; the correction factor $\omega_f$ is calculated as the number of features in vector F with $f_i = f_i^c$ divided by the number of features in vector $F_c$ whose value is 1;
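The $S_f$ formula is not reproduced in this text, so the following sketch assumes one plausible form: the $\beta$-weighted share of the 1-valued features of $F_c$ that are also present in F, scaled by $\omega_f$. It reads the numerator of $\omega_f$ as the features where $f_i = f_i^c = 1$, so that $\omega_f$ stays within [0, 1]; both choices are assumptions, and `feature_similarity` is a hypothetical name.

```python
def feature_similarity(f: list, f_c: list, beta: list) -> float:
    """Assumed S_f = omega_f * (beta-weighted share of F_c's 1-features found in F)."""
    ones = [i for i, v in enumerate(f_c) if v == 1]
    if not ones:
        return 0.0  # all-zero F_c gives similarity 0, as the claim states
    matched = [i for i in ones if f[i] == 1]
    omega_f = len(matched) / len(ones)  # correction factor (assumed reading)
    total = sum(beta[i] for i in ones)
    if total == 0:
        return 0.0
    weighted = sum(beta[i] for i in matched) / total
    return omega_f * weighted
```

Weighting by $\beta$ lets features that nearly every family sample carries count more than rare ones.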
permission list similarity S of softwarepThe calculation method comprises the following steps: giving an authority list P of the software to be tested and the authority list P in the multi-feature model of the malicious software family to be matchedc={p1 c,p2 c,...,pn cAnd the corresponding marking function γ, then:
where the correction factor $\omega_p$ is calculated as the number of permissions in P that belong to $P_c$ divided by the length of $P_c$; when element $p_i^c$ of the permission list $P_c$ is contained in the permission list P of the software under test, the corresponding indicator term takes the value 1, otherwise 0;
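Because the $S_p$ formula is not shown in this text, the sketch below assumes $S_p$ is $\omega_p$ times the $\gamma$-weighted share of $P_c$ contained in P, using the 0/1 indicator described in the claim; the exact combination and the name `permission_similarity` are assumptions.

```python
def permission_similarity(p: set, p_c: dict) -> float:
    """Assumed S_p = omega_p * (gamma-weighted share of P_c requested in P).

    p is the sample's permission list; p_c maps each family permission to gamma.
    """
    if not p_c:
        return 0.0
    # Indicator: 1 if the family permission p_i^c is requested by the sample.
    delta = {perm: 1 if perm in p else 0 for perm in p_c}
    omega_p = sum(delta.values()) / len(p_c)  # |P & P_c| / |P_c|
    total = sum(p_c.values())
    if total == 0:
        return 0.0
    weighted = sum(g * delta[perm] for perm, g in p_c.items()) / total
    return omega_p * weighted
```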
the sensitive behavior set similarity $S_{sbs}$ is calculated as follows: given the sensitive behavior set SBS of the software, the sensitive behavior set $SBS_c$ in the multi-feature model of the malware family to be matched, and the corresponding labeling function $\alpha$, then:
in the formula, $\omega_{sbs}$ is the correction factor, calculated as the number of sets $S_i^c$ in $SBS_c$ that are matched in SBS divided by the length of $SBS_c$; the matching function indicates that there exists some set S in SBS such that the proportion of elements common to S and $S_i^c$ among all elements of the two sets exceeds θ ($0 < \theta \le 1$).
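The set-matching rule can be sketched with Jaccard similarity, $|S \cap S_i^c| / |S \cup S_i^c|$, standing in for "the proportion of elements common to both sets among all elements of the two sets". The $S_{sbs}$ formula itself is not reproduced here, so combining $\omega_{sbs}$ with the $\alpha$-weighted share of matched sets is an assumption, as are the default θ and the function names.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of common elements among all elements of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sbs_similarity(sbs: list, sbs_c: dict, theta: float = 0.5) -> float:
    """Assumed S_sbs = omega_sbs * (alpha-weighted share of matched S_i^c).

    sbs is the sample's list of method sets; sbs_c maps each family set S_i^c
    to alpha(S_i^c); theta (hypothetical default) lies in (0, 1].
    """
    if not sbs_c:
        return 0.0
    # Matching function: some S in SBS overlaps S_i^c by more than theta.
    match = {sc: 1 if any(jaccard(s, sc) > theta for s in sbs) else 0 for sc in sbs_c}
    omega_sbs = sum(match.values()) / len(sbs_c)
    total = sum(sbs_c.values())
    if total == 0:
        return 0.0
    weighted = sum(a * match[sc] for sc, a in sbs_c.items()) / total
    return omega_sbs * weighted
```

Using a proportion threshold rather than exact set equality tolerates variants that add or drop a few sensitive calls.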
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810109044.6A CN108280350B (en) | 2018-02-05 | 2018-02-05 | Android-oriented mobile network terminal malicious software multi-feature detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108280350A | 2018-07-13 |
CN108280350B | 2021-09-28 |
Family ID: 62807459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810109044.6A Active CN108280350B (en) | 2018-02-05 | 2018-02-05 | Android-oriented mobile network terminal malicious software multi-feature detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280350B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109167753A (en) * | 2018-07-23 | 2019-01-08 | 中国科学院计算机网络信息中心 | A kind of detection method and device of network intrusions flow |
CN110414234A (en) * | 2019-06-28 | 2019-11-05 | 奇安信科技集团股份有限公司 | The recognition methods of malicious code family and device |
CN110457009B (en) * | 2019-07-06 | 2023-04-14 | 天津大学 | Method for realizing software security requirement recommendation model based on data analysis |
CN110392056A (en) * | 2019-07-24 | 2019-10-29 | 成都积微物联集团股份有限公司 | A kind of the Internet of Things malware detection system and method for lightweight |
CN110516446A (en) * | 2019-08-26 | 2019-11-29 | 南京信息职业技术学院 | A kind of Malware family ownership determination method, system and storage medium |
CN110795732A (en) * | 2019-10-10 | 2020-02-14 | 南京航空航天大学 | SVM-based dynamic and static combination detection method for malicious codes of Android mobile network terminal |
WO2021106173A1 (en) * | 2019-11-28 | 2021-06-03 | 日本電信電話株式会社 | Labeling device and labeling program |
CN111368297B (en) * | 2020-02-02 | 2023-02-28 | 西安电子科技大学 | Privacy protection mobile malicious software detection method, system, storage medium and application |
CN111460448B (en) * | 2020-03-09 | 2022-12-02 | 北京邮电大学 | Malicious software family detection method and device |
CN113378163A (en) * | 2020-03-10 | 2021-09-10 | 四川大学 | Android malicious software family classification method based on DEX file partition characteristics |
CN113591079B (en) * | 2020-04-30 | 2023-08-15 | 中移互联网有限公司 | Method and device for acquiring abnormal application installation package and electronic equipment |
CN112287345B (en) * | 2020-10-29 | 2024-04-16 | 中南大学 | Trusted edge computing system based on intelligent risk detection |
CN112632539B (en) * | 2020-12-28 | 2024-04-09 | 西北工业大学 | Dynamic and static hybrid feature extraction method in Android system malicious software detection |
KR102491451B1 (en) * | 2020-12-31 | 2023-01-27 | 주식회사 이스트시큐리티 | Apparatus for generating signature that reflects the similarity of the malware detection classification system based on deep neural networks, method therefor, and computer recordable medium storing program to perform the method |
CN112887328A (en) * | 2021-02-24 | 2021-06-01 | 深信服科技股份有限公司 | Sample detection method, device, equipment and computer readable storage medium |
CN113468532B (en) * | 2021-07-20 | 2022-09-23 | 国网湖南省电力有限公司 | Malicious software family inference method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103440459A (en) * | 2013-09-25 | 2013-12-11 | 西安交通大学 | Function-call-based Android malicious code detection method |
CN104794051A (en) * | 2014-01-21 | 2015-07-22 | 中国科学院声学研究所 | Automatic Android platform malicious software detecting method |
CN105447388A (en) * | 2015-12-17 | 2016-03-30 | 福建六壬网安股份有限公司 | Android malicious code detection system and method based on weight |
CN107169351A (en) * | 2017-05-11 | 2017-09-15 | 北京理工大学 | With reference to the Android unknown malware detection methods of dynamic behaviour feature |
CN107180192A (en) * | 2017-05-09 | 2017-09-19 | 北京理工大学 | Android malicious application detection method and system based on multi-feature fusion |
CN107392021A (en) * | 2017-07-20 | 2017-11-24 | 中南大学 | A kind of Android malicious application detection methods based on multiclass feature |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI461952B (en) * | 2012-12-26 | 2014-11-21 | Univ Nat Taiwan Science Tech | Method and system for detecting malware applications |
Non-Patent Citations (4)
Title |
---|
"AppContext: Differentiating Malicious and Benign Mobile App Behaviors Using Context"; 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Florence; 2015-05-31; pp. 303-313 * |
"FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps"; Steven Arzt et al.; ACM SIGPLAN Notices; 2014-06-30; pp. 259-269 * |
"A multi-label detection method for Android malware" (一种Android恶意软件多标签检测方法); 王军 et al.; Journal of Chinese Computer Systems (小型微型计算机系统); 2017-10-31; vol. 38, no. 10; pp. 2307-2311, main text sections 1-6, Fig. 1 * |
"Security analysis method for Android applications based on sensitive path identification" (基于敏感路径识别的安卓应用安全性分析方法); 缪小川; China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑); 2016-10-15 (issue 2016-10); p. I138-10, main text chapters 2-3 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108280350B (en) | Android-oriented mobile network terminal malicious software multi-feature detection method | |
Shijo et al. | Integrated static and dynamic analysis for malware detection | |
Park et al. | Deriving common malware behavior through graph clustering | |
Park et al. | Fast malware classification by automated behavioral graph matching | |
US8806641B1 (en) | Systems and methods for detecting malware variants | |
US8627478B2 (en) | Method and apparatus for inspecting non-portable executable files | |
Iwamoto et al. | Malware classification based on extracted API sequences using static analysis | |
Shhadat et al. | The use of machine learning techniques to advance the detection and classification of unknown malware | |
Zolkipli et al. | An approach for malware behavior identification and classification | |
US8108931B1 (en) | Method and apparatus for identifying invariants to detect software tampering | |
CN109586282B (en) | Power grid unknown threat detection system and method | |
US20200193031A1 (en) | System and Method for an Automated Analysis of Operating System Samples, Crashes and Vulnerability Reproduction | |
Ugarte-Pedrero et al. | Countering entropy measure attacks on packed software detection | |
US20200012793A1 (en) | System and Method for An Automated Analysis of Operating System Samples | |
US20160094574A1 (en) | Determining malware based on signal tokens | |
CN109255241B (en) | Android permission promotion vulnerability detection method and system based on machine learning | |
Zakeri et al. | A static heuristic approach to detecting malware targets | |
Lee et al. | Screening smartphone applications using malware family signatures | |
KR20120073018A (en) | System and method for detecting malicious code | |
Martinelli et al. | I find your behavior disturbing: Static and dynamic app behavioral analysis for detection of android malware | |
Pandey et al. | Performance of malware detection tools: A comparison | |
US11068595B1 (en) | Generation of file digests for cybersecurity applications | |
Ugarte-Pedrero et al. | Semi-supervised learning for packed executable detection | |
US9177146B1 (en) | Layout scanner for application classification | |
KR20110087826A (en) | Method for detecting malware using vitual machine |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |