CN104866763A

CN104866763A - Permission-based Android malicious software hybrid detection method

Info

Publication number: CN104866763A
Application number: CN201510282507.5A
Authority: CN
Inventors: 李晓红; 赵仁; 焦浩峰; 胡静; 许光全
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2015-05-28
Filing date: 2015-05-28
Publication date: 2015-08-26
Anticipated expiration: 2035-05-28
Also published as: CN104866763B

Abstract

The invention discloses a permission-based hybrid detection method for Android malware. The method comprises the following steps: step one, decompiling an Android application and obtaining the permissions it requests; step two, checking the requested permissions against the permissions defined by the system, and dividing all applications to be detected into a benign application set, a malicious application set and a suspicious application set according to the permissions each application requests; step three, dynamically acquiring and detecting the behavior of the applications in the suspicious application set, collecting the interface calls related to sensitive permissions, representing them in a vector space, and vectorizing each application; step four, performing security detection to obtain the detection result, i.e., the benign applications that meet the security criteria. Compared with the prior art, the method combines two factors, Euclidean distance and cosine similarity, so its detection results are more comprehensive and more accurate.

Description

Permission-based hybrid detection method for Android malware

Technical field

The present invention relates to the fields of computer network and computer security, software security detection and mobile terminal security, and in particular to the verification of the fairness and non-repudiation of secure exchange protocols.

Background technology

With the rapid development of mobile communication technology and mobile hardware, people increasingly depend on smartphones in daily life and work, and Android's market share has grown rapidly. As the mainstream intelligent operating system for mobile terminals, Android lets users download and install third-party applications to meet their needs. However, because third-party markets lack supervision and management, the number of Android malware samples and their variants keeps increasing, which poses a serious threat to the security of the Android platform.

The rapid growth of both Android's market share and the number of Android malware samples makes research on the analysis and identification of Android malware significant, and shortcomings in the design of the Android system itself further underline the need for this research.

In recent years, analyzing and identifying mobile malware has become a very important part of security research, and researchers have done a great deal of work in this area. Current security research on Android applications falls mainly into two approaches: permission-based and behavior-based. Permission-based approaches manage and examine the permissions an application requests; behavior-based security detection relies mainly on the dynamic features an application exhibits while running, combined with other data analysis methods, to judge the application's security. Permission-based detection is fast and efficient in some cases, but performs poorly on applications whose features are not obvious. Behavior-based detection gathers a large amount of information and its analysis is thorough and accurate, but incomplete information coverage can cause false positives, and for applications with obvious features the detection takes long and is inefficient.

In view of the above situation, the present invention proposes a permission-based hybrid detection method. First, a preliminary detection is performed according to the security levels of the permissions requested by the application, which identifies benign and malicious applications. Second, the runtime behavior of suspicious applications is traced, the interface calls related to sensitive permissions are collected and represented as vectors in a vector space, the feature vector of each application is computed with the TF-IDF algorithm, and suspicious applications are classified by computing Euclidean distance and cosine similarity. A comparison of the experimental results with other work shows that the method of this patent indeed improves the accuracy of Android malware detection.

Summary of the invention

To overcome the above problems of the prior art, the present invention proposes a permission-based hybrid detection method for Android malware. Aimed at assessing the security of Android mobile applications, it provides a hybrid detection method that can analyze and verify malware, and implements it as a detection tool.

The permission-based hybrid detection method for Android malware proposed by the present invention comprises the following steps:

Step one: decompile the Android application to obtain the permissions it requests;

Step two: check the requested permissions against the permissions defined by the system; according to the permissions each application requests, divide all applications to be detected into a benign application set, a malicious application set and a suspicious application set;

Step three: dynamically acquire and detect the behavior of the applications in the suspicious application set, collect the interface calls related to sensitive permissions, represent them in a vector space, and vectorize each application;

Step four: perform security detection to obtain the detection result, i.e., the "benign applications" that meet the security criteria.

Compared with the prior art, the present invention combines permission-based and behavior-based security research on Android applications, exploiting the strengths of each and offsetting their weaknesses to some extent: applications with obvious features are detected quickly, while the thoroughness and accuracy of behavior-based analysis are retained. The goal of the invention is to detect whether an Android application is malware.

Brief description of the drawings

Fig. 1 is the overall flowchart of the permission-based hybrid detection method for Android malware;

Fig. 2 is a statistical chart of sample requests for common sensitive permissions;

Fig. 3 compares the detection results of the present invention with those of the MDBC method.

Embodiment

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. Any exemplary content in these embodiments should not be construed as limiting the present invention.

The present invention proposes a permission-based hybrid detection framework: 1) a preliminary detection is first performed according to the permissions an application requests, identifying benign and malicious applications; the behavior of the remaining suspicious applications is then traced, and the interface calls related to sensitive permissions are collected and examined to determine the application type; 2) a vector space model is introduced: when suspicious applications are examined, each application is expressed algebraically from the collected sensitive information and represented in the vector space model; 3) Euclidean distance and cosine similarity are used: Euclidean distances and cosine similarities are computed on the resulting vector space and compared, and the suspicious applications are finally classified as benign or malicious.

According to their security, Android applications are divided into three classes: benign applications, malicious applications and suspicious applications. A benign application does not perform malicious behavior against the phone or private data during use. A malicious application performs malicious behavior against the phone or private data during use. A suspicious application is one whose security type is currently unknown: during use it may or may not perform malicious behavior against the phone and private data.

Based on this three-way classification of Android applications, the overall flow of the permission-based hybrid detection method of the present invention is designed as shown in Fig. 1. The detection process comprises four main steps: permission detection, dynamic behavior acquisition, application vectorization and security detection. Each step performs a different function, and together the four steps complete the security detection of an application. Specifically: the Android application is decompiled to obtain the permissions it requests (generating a list); the requested permissions are filtered against the permissions defined by the system (i.e., a preliminary detection based on the requested permissions); according to the permissions requested, all applications to be detected are divided into a benign application set, a malicious application set and a suspicious application set. At this point, detection of the applications in the benign and malicious sets is complete, and no subsequent dynamic detection is needed for them. The behavior of the applications in the suspicious set is then acquired and detected dynamically: the interface calls related to sensitive permissions are collected and represented in a vector space, and each application is vectorized (its feature vector is computed). Finally, security detection yields the detection result, i.e., the "benign applications" that meet the security criteria.

The four detection steps are described in detail below.

I. Permission detection

In the Android system, an application must request the corresponding permission before it can perform a given behavior; otherwise it cannot call the APIs protected by that permission and the behavior cannot be completed. Therefore, for the preliminary detection of applications, permission-based detection rules can be designed that identify applications with obvious features (safe applications and applications that request system-level permissions). An application's permission information is declared in the AndroidManifest.xml file in its package; this file can be obtained with a decompilation tool, and the application's permissions can then be read from it.
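As a minimal sketch of this extraction step, the Python snippet below reads the `<uses-permission>` entries from a manifest. It assumes the binary AndroidManifest.xml has already been decoded to plain XML by a decompiler (for example apktool); the function name `extract_permissions` and the sample manifest are illustrative, not part of the patent.

```python
import xml.etree.ElementTree as ET

# In a decoded manifest, attributes such as android:name live in this namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Illustrative decoded manifest (assumption: already converted from binary AXML).
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.READ_CONTACTS"/>
  <application/>
</manifest>"""


def extract_permissions(manifest_xml):
    """Return AppPer: the set of permission names requested via <uses-permission>."""
    root = ET.fromstring(manifest_xml)
    perms = set()
    for elem in root.iter("uses-permission"):
        name = elem.get(ANDROID_NS + "name")
        if name:
            perms.add(name)
    return perms
```

Calling `extract_permissions(SAMPLE_MANIFEST)` yields the two permission names declared above; the resulting set plays the role of AppPer in the detection rules.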

The Android system itself defines hundreds of permissions at four protection levels: Normal, Dangerous, Signature and SignatureOrSystem. These permissions fall into 12 categories, such as accessing location information, accessing the network and accessing personal information. For the experiments, the Normal and SignatureOrSystem permissions are stored separately, and the Dangerous and Signature permissions are stored by category.

Define the set AppPer = {per_1, per_2, ..., per_n} as the set of permissions requested by an application, where per_i (1 ≤ i ≤ n) is a permission requested by the application;

Define the set of all Android permissions Andper = {perSet_1, perSet_2, ..., perSet_14}, where perSet_i (1 ≤ i ≤ 12) is the set of sensitive permissions in category i, perSet_13 is the set of Normal permissions, and perSet_14 is the set of SignatureOrSystem permissions.

The static permission detection rules can be expressed as follows:

1. If AppPer ∩ perSet_13 = AppPer (i.e., every requested permission is a Normal permission), the application is judged benign;

2. If AppPer ∩ perSet_14 ≠ ∅, the application is judged malicious;

3. If AppPer ∩ perSet_i ≠ ∅ for some 1 ≤ i ≤ 12, the application requests sensitive permissions of category i and may pose the malicious threats associated with that category.

These filtering rules give a preliminary classification of applications: they filter out applications whose requested permissions all belong to the Normal group and applications that request SignatureOrSystem permissions. For applications that request sensitive permissions, the categories of those permissions can also be determined, so statistics on sensitive-permission requests can be gathered to give a preliminary picture of the samples' characteristics.
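The three rules can be sketched in Python as follows. The contents of `ANDPER` are a hypothetical, heavily truncated stand-in for Andper (the real sets would list every Android permission by protection level and category, and the category assignments here are illustrative), and the function name is an assumption. Because the Normal and SignatureOrSystem groups are disjoint, testing Rule 1 before Rule 2 does not change the outcome.

```python
# Hypothetical excerpt of Andper: keys 1..12 are sensitive categories,
# 13 is the Normal group and 14 is the SignatureOrSystem group.
ANDPER = {
    1: {"android.permission.ACCESS_FINE_LOCATION"},   # illustrative location category
    2: {"android.permission.READ_CONTACTS"},          # illustrative personal-information category
    13: {"android.permission.VIBRATE", "android.permission.ACCESS_NETWORK_STATE"},
    14: {"android.permission.INSTALL_PACKAGES"},
}


def classify_by_permissions(app_per, andper):
    """Apply the static rules; return (label, list of sensitive categories hit)."""
    app_per = set(app_per)
    # Rule 1: AppPer ∩ perSet_13 = AppPer, i.e. only Normal permissions.
    if (app_per & andper[13]) == app_per:
        return "benign", []
    # Rule 2: AppPer ∩ perSet_14 is non-empty.
    if app_per & andper[14]:
        return "malicious", []
    # Rule 3: record every category i with AppPer ∩ perSet_i non-empty;
    # every application reaching this point is treated as suspicious.
    hits = [i for i in range(1, 13) if app_per & andper.get(i, set())]
    return "suspicious", hits
```

Note that an application requesting no permissions at all satisfies Rule 1 trivially and is labeled benign. The category list returned for suspicious applications is what the sensitive-permission statistics would be gathered from.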

II. Dynamic behavior acquisition

Not every Android application that requests sensitive permissions is malicious, so the suspicious samples produced by permission detection need further behavior detection. Using the behavioral characteristics an application exhibits at runtime, combined with a behavior-feature detection method, the suspicious applications can be finally classified. Dynamic behavior acquisition collects these runtime behavioral features and supplies the data for the subsequent behavior detection.

The accuracy of behavior detection depends largely on how complete the collected behavior-sequence features of an application are. An Android application contains various components, and these components can trigger a series of interface calls. To collect as many behavioral features of a suspicious application as possible, monkeyrunner is used to run every component of the application once while the application is installed and running in the emulator. This testing tool sends streams of events to the application, so the behavioral features the application exhibits in response to the various events are captured.

An application's runtime behavior is reflected at every layer of the Android system. Considering the readability and operability of behavioral information, the present invention mainly captures method-call information at the Framework and native layers, using Android's built-in Log mechanism and DroidBox; this information reflects the application's behavioral characteristics fairly accurately. Because DroidBox can capture native-layer method calls, some malicious behavior that bypasses the Framework layer is also captured, which improves the completeness of the behavioral information.

After an application's runtime interface information is obtained, shell commands can display it directly or save it to a text file for further analysis. The obtained interface information is then filtered so that only the interfaces corresponding to sensitive permissions are retained. To this end, a correspondence between permissions and function interfaces is established by matching interfaces with permission information; using this correspondence, the security-related interface calls made while the application runs can be counted.
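A minimal sketch of the filtering and counting step is shown below. It assumes the calls have already been extracted into one fully qualified method name per line; that log format and the three-entry `API_PERMISSION_MAP` are hypothetical (a real mapping would cover every interface guarded by a Dangerous or Signature permission, and DroidBox or logcat output would need its own parsing).

```python
from collections import Counter

# Hypothetical three-entry excerpt of an interface -> permission map.
API_PERMISSION_MAP = {
    "android.telephony.SmsManager.sendTextMessage": "android.permission.SEND_SMS",
    "android.telephony.TelephonyManager.getDeviceId": "android.permission.READ_PHONE_STATE",
    "android.location.LocationManager.getLastKnownLocation": "android.permission.ACCESS_FINE_LOCATION",
}


def filter_sensitive_calls(call_log_lines, api_map=API_PERMISSION_MAP):
    """Keep only calls to permission-guarded interfaces and count them.

    Assumes (hypothetically) one fully qualified method name per log line.
    Returns (per-interface counts, per-permission counts).
    """
    api_counts = Counter()
    perm_counts = Counter()
    for line in call_log_lines:
        api = line.strip()
        permission = api_map.get(api)
        if permission is None:
            continue  # not tied to a sensitive permission: filtered out
        api_counts[api] += 1
        perm_counts[permission] += 1
    return api_counts, perm_counts
```

The per-interface counts are the raw numbers of calls per method that later become vector components, while the per-permission counts support the security-related call statistics described above.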

III. Application vectorization

From the collected interface-call information, build the method set of an application ε = {f_1, f_2, f_3, ..., f_i, ..., f_n}, where f_i (1 ≤ i ≤ n) is the i-th interface called by the application and n is the total number of interfaces corresponding to all Dangerous and Signature permissions in the collected logs. Each application can be represented by one such ε; let C denote the set of these ε, i.e., the set of suspicious applications.

Define w_{i,j} as the number of times method f_i occurs in application ε_j; if f_i does not occur, w_{i,j} = 0. Then ε_j can be written in vector form as ε_j = {w_{1,j}, w_{2,j}, w_{3,j}, ..., w_{n,j}}.

To represent this information, the vector space model (VSM) is introduced, so that each application can be treated as an algebraic object in which every component of the vector is non-negative. The coordinate (i, j) in the VSM holds the information of method f_i in application ε_j.
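The following sketch shows how per-application call counts could be turned into the vectors ε_j = (w_{1,j}, ..., w_{n,j}). Fixing the order of f_1, ..., f_n by sorting the method names is an assumption made so that every vector uses the same coordinates; the patent does not specify an ordering, and both function names are hypothetical.

```python
def build_method_index(per_app_counts):
    """Fix the order f_1..f_n: every interface seen in any app's log, sorted by name."""
    return sorted(set().union(*per_app_counts))


def to_count_vector(call_counts, method_index):
    """Build eps_j = (w_1j, ..., w_nj); w_ij = 0 when the app never calls f_i."""
    return [call_counts.get(f, 0) for f in method_index]
```

For two applications with call counts `{"b": 3, "c": 1}` and `{"a": 2}`, the method index is `["a", "b", "c"]` and the set C becomes `[[0, 3, 1], [2, 0, 0]]`, with every component non-negative as the VSM requires.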

To obtain the feature vector of each application, a weight must be computed for every method in it. The TF-IDF algorithm is used here; for method f_i in application ε_j, the weight weight(i, j) is computed as:

weight(i, j) = tf_{i,j} \cdot idf_i

Formula (5-1)

where tf_{i,j} is the frequency with which method f_i occurs in application ε_j, and idf_i is the inverse document frequency of method f_i. The feature-vector algorithm is as follows:

Input: the vector representation ε_j of the application to be detected and the set C of vector representations of a group of applications;

Output: the feature vector of the application to be detected.

Start:

Variable declarations:

sum: the sum of the components of ε_j;

numberOfApps: the number of elements in set C;

count: the number of applications in C that contain a given method, reset to 0 for each method;

sum = w_{1,j} + w_{2,j} + … + w_{n,j};

numberOfApps ← |C|;

For each method f_i in ε_j, compute weight(i, j) as follows:

tf_{i,j} = (double) w_{i,j} / sum

Compute the inverse document frequency idf_i of method f_i over the vectors in C: count the vectors in C in which f_i occurs (count), divide the total number of vectors numberOfApps by count, and take the logarithm of the quotient;

weight(i, j) = tf_{i,j} · idf_i

Finally obtain the feature vector of ε_j;

Output the feature vector of the application to be detected.

In code, the computation of idf_i and weight(i, j) for one method f_i is:

count = 0;
for each ε_k in C:
    if (w_{i,k} != 0)
        count++;
idf_i = log((double) numberOfApps / count);
weight(i, j) = tf_{i,j} * idf_i;

Finally obtain the feature vector of ε_j

Output the feature vector

End.
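A runnable Python version of the feature-vector computation is sketched below. It follows formula (5-1) with tf_{i,j} = w_{i,j} / sum and idf_i = log(numberOfApps / count). The natural logarithm, and returning a weight of 0 when count is 0 or the vector is all zeros, are assumptions the pseudocode leaves open; `tfidf_vector` is a hypothetical name.

```python
import math


def tfidf_vector(eps_j, C):
    """Feature vector of eps_j: weight(i, j) = tf_ij * idf_i (formula 5-1)."""
    total = sum(eps_j)            # "sum" in the pseudocode
    number_of_apps = len(C)       # numberOfApps = |C|
    weights = []
    for i, w_ij in enumerate(eps_j):
        # Float division avoids the integer-division pitfall of (double)(a / b).
        tf = w_ij / total if total else 0.0
        # count = number of vectors in C in which method f_i occurs.
        count = sum(1 for eps in C if eps[i] != 0)
        idf = math.log(number_of_apps / count) if count else 0.0
        weights.append(tf * idf)
    return weights
```

For C = `[[0, 3, 1], [2, 0, 0], [1, 1, 0]]` and ε_j = `[0, 3, 1]`, the weights are `[0.0, 0.75·ln 1.5, 0.25·ln 3]`: the method called by every app except one gets a small idf, and the method only this app calls gets the largest.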

IV. Security detection method

For applications whose security type is known, their feature vectors in the vector space can be determined, and these vectors serve as the basis for computation and classification. When an application of unknown type is examined, the distance between its feature vector and the benign and malicious classes is computed, and the application is classified from the result. Two distance measures are used:

Euclidean distance: the length of the line segment between two points, computed as:

d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

Formula (5-2)

Here x is the first point, y is the second point, and x_i and y_i are the values of the i-th coordinates of the first and second points respectively.

Cosine similarity: this index measures how similar two vectors are by the cosine of the angle between them. For uniformity the experiments still refer to it as a "distance", using 1 − cosSimilarity as the distance:

d(x, y) = 1 - \cos\theta = 1 - \frac{u \cdot v}{|u| \cdot |v|}

Formula (5-3)

Here u and v are the vectors representing x and y respectively, θ is the angle between them, u · v is their inner product, and |u| and |v| are their lengths. Because every vector component is non-negative, d(x, y) ranges from 0 to 1: 1 means the two vectors are completely dissimilar, and 0 means they point in the same direction, i.e., are highly similar.

To evaluate an application comprehensively, three final distances are combined: the minimum, mean and maximum distance. These three aggregations give an overall assessment of the application's distance to the benign applications and to the malicious applications, and the application's class is then decided from the size of these distances.

After the three distance indices of the application under test are computed, for each index the class with the smaller value is taken as the classification. For example, if MaxLen[0] < MaxLen[1], the application is judged benign; otherwise it is judged malicious.
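To make the difference between the two measures concrete, the sketch below implements formulas (5-2) and (5-3) and applies them to two hypothetical applications with the same call profile, where the second calls every method twice as often. Treating an all-zero vector as maximally distant is an assumption, since formula (5-3) is undefined in that case; the function names are illustrative.

```python
import math


def euclidean_distance(x, y):
    """Formula (5-2): length of the segment between points x and y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def cosine_distance(u, v):
    """Formula (5-3): 1 - cos(theta) = 1 - (u . v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: an all-zero vector counts as completely dissimilar
    return 1 - dot / (norm_u * norm_v)


u = [1, 2, 0]   # hypothetical app: calls f_1 once, f_2 twice
v = [2, 4, 0]   # same profile, every method called twice as often
print(euclidean_distance(u, v), cosine_distance(u, v))
```

Here the Euclidean distance is √5 while the cosine distance is 0 (up to floating-point rounding): proportional call counts look far apart under Euclidean distance but identical under cosine similarity.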

The distance detection algorithm is as follows:

Input: the vector δ of the application to be detected and the feature vector set of the control group Set = {δ_1, δ_2, ..., δ_m};

Output: the three distances (maximum, mean and minimum) from the application to be detected to the benign control group and to the malicious control group;

Preprocessing: split the control group's feature vectors into benign and malicious sets, Set_ben = {δ_{b1}, δ_{b2}, ..., δ_{bj}} and Set_mal = {δ_{m1}, δ_{m2}, ..., δ_{mk}}, where b denotes benign applications, m denotes malicious applications, j is the number of benign applications and k is the number of malicious applications.

Start:

Variable declaration: distType: the distance type; 1 denotes Euclidean distance and 2 denotes cosine similarity. Below, n is the number of components of a feature vector, t is the position of a feature vector in its set, DisToBen[t] is the distance between the feature vector under test and the t-th benign feature vector, and DisToMal[t] is the distance between the feature vector under test and the t-th malicious feature vector;

switch (distType):

Case 1 (Euclidean distance):

For δ = (x_1, x_2, ..., x_n) and each δ_{bt} = (y_1, y_2, ..., y_n) ∈ Set_ben (1 ≤ t ≤ j), compute:

DisToBen[t] = Math.sqrt(\sum_{i=1}^{n} (x_i - y_i)^2);

For δ = (x_1, x_2, ..., x_n) and each δ_{mt} = (y_1, y_2, ..., y_n) ∈ Set_mal (1 ≤ t ≤ k), compute:

DisToMal[t] = Math.sqrt(\sum_{i=1}^{n} (x_i - y_i)^2);

break;

Case 2 (cosine similarity): with δ, δ_{bt} and δ_{mt} written as in Case 1, compute:

DisToBen[t] = 1 - (\sum_{i=1}^{n} x_i y_i) / (Math.sqrt(\sum_{i=1}^{n} x_i^2) * Math.sqrt(\sum_{i=1}^{n} y_i^2)), for each δ_{bt} ∈ Set_ben;

DisToMal[t] = 1 - (\sum_{i=1}^{n} x_i y_i) / (Math.sqrt(\sum_{i=1}^{n} x_i^2) * Math.sqrt(\sum_{i=1}^{n} y_i^2)), for each δ_{mt} ∈ Set_mal;

break;

MaxLen[0] = maxDist(DisToBen[], j), AvgLen[0] = avgDist(DisToBen[], j),

MinLen[0] = minDist(DisToBen[], j);

MaxLen[1] = maxDist(DisToMal[], k), AvgLen[1] = avgDist(DisToMal[], k),

MinLen[1] = minDist(DisToMal[], k);

Output the computed distances;

End of algorithm.
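The distance detection algorithm, together with the decision rule that picks the class with the smaller value of the chosen index, can be sketched in Python as below. The function names are hypothetical; the defaults of `classify` (cosine similarity and mean distance) reflect the best-performing configuration reported in the experiments, not a requirement of the algorithm.

```python
import math


def euclidean(x, y):
    # Formula (5-2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_dist(x, y):
    # Formula (5-3); an all-zero vector is treated as completely dissimilar (assumption).
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 if nx == 0 or ny == 0 else 1 - dot / (nx * ny)


def distance_summary(delta, set_ben, set_mal, dist_type=1):
    """Return MaxLen/AvgLen/MinLen as {"max": (ben, mal), "avg": ..., "min": ...}.

    Element 0 of each pair is the benign group, element 1 the malicious group.
    """
    dist = euclidean if dist_type == 1 else cosine_dist
    dis_to_ben = [dist(delta, d) for d in set_ben]   # DisToBen[t], t = 1..j
    dis_to_mal = [dist(delta, d) for d in set_mal]   # DisToMal[t], t = 1..k

    def mean(xs):
        return sum(xs) / len(xs)

    return {
        "max": (max(dis_to_ben), max(dis_to_mal)),
        "avg": (mean(dis_to_ben), mean(dis_to_mal)),
        "min": (min(dis_to_ben), min(dis_to_mal)),
    }


def classify(delta, set_ben, set_mal, dist_type=2, index="avg"):
    """Benign if the chosen index is smaller for the benign group, e.g. MaxLen[0] < MaxLen[1]."""
    ben, mal = distance_summary(delta, set_ben, set_mal, dist_type)[index]
    return "benign" if ben < mal else "malicious"
```

With a hypothetical control group `set_ben = [[1, 0, 0], [2, 1, 0]]` and `set_mal = [[0, 0, 3], [0, 1, 4]]`, the vector `[3, 1, 0]` is classified as benign and `[0, 1, 5]` as malicious.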

V. Evaluation measures

Researchers have proposed many measures for evaluating detection accuracy. The scheme described below is used here to evaluate the test method.

First, the following definitions are introduced:

n_{ben→ben}: the number of benign applications judged benign; n_{ben→mal}: the number of benign applications judged malicious; n_{mal→ben}: the number of malicious applications judged benign; n_{mal→mal}: the number of malicious applications judged malicious. Accuracy and error rate are then defined as follows:

Acc = \frac{n_{ben \rightarrow ben} + n_{mal \rightarrow mal}}{n_{ben \rightarrow ben} + n_{mal \rightarrow mal} + n_{ben \rightarrow mal} + n_{mal \rightarrow ben}}

Formula (5-4)

Err = \frac{n_{ben \rightarrow mal} + n_{mal \rightarrow ben}}{n_{ben \rightarrow ben} + n_{mal \rightarrow mal} + n_{ben \rightarrow mal} + n_{mal \rightarrow ben}}

Formula (5-5)

Similarly, the false positive rate (FPR) and true positive rate (TPR) are defined as follows:

FPR = \frac{n_{ben \rightarrow mal}}{n_{ben \rightarrow ben} + n_{ben \rightarrow mal}}

Formula (5-6)

TPR = \frac{n_{mal \rightarrow mal}}{n_{mal \rightarrow ben} + n_{mal \rightarrow mal}}

Formula (5-7)

These measures are used here to evaluate the accuracy of the detection results. They are simple, effective and easy to apply, and are commonly used in experimental evaluation.
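Formulas (5-4) to (5-7) translate directly into code. The sketch below takes the four counts in the order n_{ben→ben}, n_{ben→mal}, n_{mal→ben}, n_{mal→mal}; the function name and the counts in the example are hypothetical and are not taken from the experiments.

```python
def evaluate(n_bb, n_bm, n_mb, n_mm):
    """Formulas (5-4)..(5-7); n_xy = number of class-x apps judged as class y."""
    total = n_bb + n_mm + n_bm + n_mb
    return {
        "Acc": (n_bb + n_mm) / total,   # (5-4)
        "Err": (n_bm + n_mb) / total,   # (5-5)
        "FPR": n_bm / (n_bb + n_bm),    # (5-6): benign apps wrongly flagged
        "TPR": n_mm / (n_mb + n_mm),    # (5-7): malicious apps caught
    }


print(evaluate(90, 10, 5, 45))  # hypothetical counts
```

With these hypothetical counts, Acc and TPR are 0.9 and Err and FPR are 0.1; by construction Acc + Err = 1.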

Experimental environment

The experiments of this patent were run in the following environment: Ubuntu 13.04, the Android 2.3 emulator, DroidBox 2.3, Python 2.7 and Java 1.7. Automated testing tools such as monkeyrunner and monkey were also used during the experiments.

VI. Data set

To verify the effectiveness of the detection method, the experiments use 982 samples from the Google Play market, third-party application markets and the Android Malware Genome Project as the data set. To determine the numbers of benign and malicious samples in the data set, all samples were scanned with F-Secure, Avast, LBE and Kingsoft before the experiments. If two or more of these security products reported a sample as malicious, the sample was labeled malicious; otherwise it was labeled benign. This makes the probability of a malicious sample being mislabeled as benign extremely low. The sources of the experimental data and their security statistics after scanning are shown in Table 1.
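The labeling rule can be expressed as a simple vote. This sketch reads the rule as "at least two of the four products report malicious"; if the stricter reading of more than two is intended, `min_votes` should be 3. The function name and the verdict dictionary are illustrative.

```python
def label_sample(scanner_verdicts, min_votes=2):
    """Ground-truth label: malicious if at least `min_votes` scanners flag the sample.

    scanner_verdicts maps a product name to True (reported malicious) or False.
    """
    votes = sum(1 for flagged in scanner_verdicts.values() if flagged)
    return "malicious" if votes >= min_votes else "benign"


verdicts = {"F-Secure": True, "Avast": True, "LBE": False, "Kingsoft": False}
print(label_sample(verdicts))  # two votes -> "malicious"
```

Requiring agreement between several products, rather than trusting a single scanner, is what keeps the chance of a single false verdict deciding a label low.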

Experimental verification and results

(1) Permission filtering results

A preliminary detection is first performed on the samples according to the permissions they request. If all permissions requested by an application belong to the Normal group, or it requests a SignatureOrSystem permission, the application is directly judged benign or malicious respectively and needs no further detection. All other applications are classified as suspicious and are the objects of further detection. The statistics of the three classes of applications in the data set after permission detection are shown in Table 2.

Some malicious behaviors are very common among malicious applications, such as accessing the network, contacts and messages, and completing these behaviors requires the corresponding sensitive permissions. To understand how the samples in the data set request these categories of sensitive permissions, their requests were detected and counted; the results are shown in Table 3.

To present the statistics more intuitively, Fig. 2 gives a histogram of requests for common sensitive permissions in the data set. It shows the requests for these six categories of sensitive permissions: the most frequently requested category is access to phone state, and the least requested is access to the SD card.

(2) Analysis of the permission filtering results

After filtering, the security class of 20.17% of the samples in the experimental data set can be determined: 15.89% are benign and 4.28% are malicious. The security class of the remaining applications cannot be determined, so they are classified as suspicious. This result is consistent with practice, because the security of many samples cannot be judged from static features alone. For example, a normal application may request network permission without sending private information or performing malicious downloads.

Statistics on requests for common sensitive permissions show that many samples in the data set request permissions to access phone state, the network, contacts and messages, and some samples request sensitive permissions from more than two categories. These suspicious samples need further behavior detection. Moreover, as application development diversifies, more and more applications request sensitive permissions from more than two categories, and their security can ultimately be determined only through behavior detection.

(3) Behavior detection results

Cross-validation is used to detect the suspicious applications. The suspicious applications remaining after permission detection are divided equally into 3 groups. In each round one group is selected in turn as the control group, and its samples are labeled benign or malicious using the scan results of the security products. The control-group samples are installed and run on the emulator, and their function-call information during execution is collected; the samples are then vectorized and characterized with the method of the application vectorization step described above, so that each sample is expressed as a vector in the space. This yields the feature-vector set of the benign samples and that of the malicious samples. These sets serve as the basis for classification: the security of the samples in the other groups is determined by computing their distances to the two classes of samples (benign and malicious) in the control group.

Based on the feature vectors of the control group, the 10, 15 and 20 most representative behavioral features are selected in turn as the classification criterion, and Euclidean distance and cosine similarity are used in turn as the classification basis. After each group has served as the control group, the TPR and FPR of that round are recorded, and the TPR and FPR of the three rounds are averaged to give the experimental result. The results are shown in Table 4 and Table 5.
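The rotation of control groups can be sketched as follows. Splitting by position (every third sample) is an assumption; the text only states that the suspicious samples are divided equally into 3 groups. The averaging helper mirrors how the per-round TPR and FPR are said to be combined; both function names are hypothetical.

```python
def rotate_control_groups(samples, k=3):
    """Split samples into k groups; yield (control, test) with each group used once as control."""
    groups = [samples[g::k] for g in range(k)]
    for g in range(k):
        control = groups[g]
        # All other groups are classified against this round's control group.
        test = [s for h in range(k) if h != g for s in groups[h]]
        yield control, test


def average_rates(per_round):
    """Average a list of (TPR, FPR) pairs, one pair per round."""
    n = len(per_round)
    return (sum(t for t, _ in per_round) / n, sum(f for _, f in per_round) / n)
```

For nine samples numbered 0 to 8, the first round uses `[0, 3, 6]` as the control group and classifies the remaining six; three rounds of (TPR, FPR) results are then averaged into one reported pair.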

(4) behavioral value results contrast

From above-mentioned experimental result, the result that the testing result using cosine similarity to obtain can obtain than Euclidean distance is on the whole good, and accuracy rate is high.But this situation is not absolute, can see from table when adopting Euclidean distance and mean distance can be better than the testing result adopting cosine similarity and ultimate range to obtain as index as the result obtained when measurement index.This illustrates that the effect that the evaluation criterion chosen when application program detects plays at some time can be more important than detection method.

Experimental result shows, is better than the result of Euclidean distance calculating with cosine similarity as the result that distance calculating method obtains; Be better than as the result that measurement index obtains the result that minimum and maximum distance obtains with mean distance.The best effects of test method can reach the TPR of 91.2%, controls 2.1% by FPR simultaneously, and total precision reaches 95.8%, and Detection results is more satisfactory.In order to weigh experimental result, with the work of the people such as Suleiman Y.Yerima be called HDPA to such as showing 6:(context of methods, the method for Suleiman Y.Yerima is called MDBC)

To show the comparison more intuitively, Fig. 2 plots the detection performance of the two methods. The figure shows that the method presented here already surpasses MDBC once more than 15 features are selected, and that its FPR is also clearly lower than that of the compared method, which demonstrates the effectiveness of the proposed method.

(5) Analysis of behavior detection results

The experimental comparison shows that cosine similarity combined with the average distance as the classification basis gives the best results. This combination keeps the FPR very low while maintaining a high TPR, meeting the experiment's design goal of high accuracy with a low false-alarm rate.

Analysis shows that cosine similarity, used as the distance criterion, better reflects the similarity between samples of the same class. Malicious samples of the same type exhibit similar behavior, but their method-call counts are not necessarily similar; this makes the spatial (Euclidean) distance between same-class samples relatively large and leads to misjudgment.
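A small worked example, using hypothetical call-count vectors over three sensitive methods, makes the effect concrete. Let $a = (1, 2, 0)$, let $b = 10a = (10, 20, 0)$ be a sample with the same behavior but ten times as many calls, and let $c = (2, 1, 0)$ be a sample with different behavior but similar call counts:

$$
\|a - b\| = \sqrt{9^2 + 18^2} = 9\sqrt{5} \approx 20.12, \qquad \cos(a, b) = \frac{10 + 40}{\sqrt{5}\,\sqrt{500}} = 1
$$

$$
\|a - c\| = \sqrt{1^2 + 1^2} = \sqrt{2} \approx 1.41, \qquad \cos(a, c) = \frac{2 + 2}{\sqrt{5}\,\sqrt{5}} = 0.8
$$

Euclidean distance places $a$ much closer to $c$ than to $b$, whereas cosine similarity identifies $b$ as behaviorally identical to $a$, since it depends only on the direction of the vectors and not on their magnitude.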

Comparing the results of the two detection approaches, Euclidean distance and cosine similarity, shows that Euclidean distance as the distance criterion is not necessarily worse than cosine similarity, because the type of distance used as the index (maximum, average or minimum) also affects detection accuracy to some extent. The best result of the present invention is obtained by taking both factors into account.

Table 1. Statistics of the experimental samples

Table 2. Statistics of the permission detection results

Table 3. Statistics of requests for common sensitive permissions

Table 4. Experimental results using Euclidean distance as the calculation method

Table 5. Experimental results using cosine similarity as the calculation method

Table 6. Comparison of these experimental results with MDBC

Claims

1. A permission-based Android malware hybrid detection method, characterized in that the method comprises the following steps:

2. The permission-based Android malware hybrid detection method according to claim 1, characterized in that the rules of permission detection in step 2 are expressed as:

If $AppPer \cap perSet_{13} = AppPer$, the application is judged to be benign;

If $AppPer \cap perSet_{14} \neq \varnothing$, the application is judged to be malicious;

If $AppPer \cap perSet_i \neq \varnothing$ for some $1 \le i \le 12$, the application has requested sensitive permissions of class $i$ and may pose the potential threat associated with that class.

Here, $AppPer = \{per_1, per_2, \ldots, per_n\}$ denotes the set of permissions requested by the application, and $per_i$ ($1 \le i \le n$) denotes a permission requested by the application; $AndPer = \{perSet_1, perSet_2, \ldots, perSet_{14}\}$ denotes the set of all Android permissions, where $perSet_i$ ($1 \le i \le 12$) denotes the set of sensitive permissions in category $i$, $perSet_{13}$ denotes the set of Normal-group permissions, and $perSet_{14}$ denotes the set of SignatureOrSystem-group permissions.
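The three rules can be sketched in Java as follows. The sketch assumes the 14 permission sets are supplied as a list indexed 0 to 13 (standing for $perSet_1$ to $perSet_{14}$), that the rules are checked in the order listed, and that an application matching neither the benign nor the malicious rule goes to the suspicious set for the dynamic stage. `PermissionCheck` and its methods are hypothetical names, and any permission strings placed in the sets are illustrative, not the patent's actual category lists.

```java
import java.util.*;

// Hypothetical sketch of the claim-2 rules. perSets.get(0..11) are the 12
// sensitive categories, get(12) is the Normal group, get(13) is SignatureOrSystem.
class PermissionCheck {
    enum Verdict { BENIGN, MALICIOUS, SUSPICIOUS }

    static Verdict classify(Set<String> appPer, List<Set<String>> perSets) {
        // AppPer ∩ perSet_13 = AppPer: every requested permission is Normal.
        if (perSets.get(12).containsAll(appPer)) return Verdict.BENIGN;
        // AppPer ∩ perSet_14 ≠ ∅: some SignatureOrSystem permission is requested.
        if (!Collections.disjoint(appPer, perSets.get(13))) return Verdict.MALICIOUS;
        // Assumption: everything else is passed on to dynamic behavior detection.
        return Verdict.SUSPICIOUS;
    }

    // Returns the sensitive categories i (1..12) with AppPer ∩ perSet_i ≠ ∅.
    static List<Integer> sensitiveCategories(Set<String> appPer, List<Set<String>> perSets) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            if (!Collections.disjoint(appPer, perSets.get(i))) hits.add(i + 1);
        }
        return hits;
    }
}
```

`sensitiveCategories` reports which threat classes a suspicious application touches, corresponding to the third rule.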

3. The permission-based Android malware hybrid detection method according to claim 1, characterized in that step 3 specifically comprises the following process:

The application is installed and run in an emulator, and monkeyrunner is used to exercise every component of the application once. The testing tool sends streams of event sequences to the application, so that the application's behavioral features are captured as it receives the various events.

After the application's runtime interface information is obtained, shell commands are used to display the captured behavioral information or to save it to a text file for further analysis. Next, the captured interface information is filtered, retaining only the interfaces that correspond to sensitive permissions. Interfaces are then matched with permission information, and, based on this correspondence, the security-related interface calls made while the application runs are counted.
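The filtering and counting step might look like the following Java sketch. It assumes the captured log has already been parsed into one API signature per call and that a hypothetical `apiToPermission` table (interface to required permission) is available; neither the log format nor the table is specified in the text, and `ApiCallCounter` is an illustrative name.

```java
import java.util.*;

// Hypothetical sketch: keep only calls to interfaces that map to a sensitive
// permission, and count how often each such interface is called.
class ApiCallCounter {
    // callLog: one API signature per captured call (parsing is assumed done).
    // apiToPermission: assumed table from interface to its required permission.
    static Map<String, Integer> countSensitiveCalls(List<String> callLog,
                                                    Map<String, String> apiToPermission) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String api : callLog) {
            if (apiToPermission.containsKey(api)) {   // filter: sensitive interfaces only
                counts.merge(api, 1, Integer::sum);   // count each call
            }
        }
        return counts;
    }
}
```

Per-interface counts of this kind are the raw values $w_{i,j}$ that the vectorization in claim 4 turns into weights.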

4. The permission-based Android malware hybrid detection method according to claim 1, characterized in that step 3 specifically comprises the following process:

Input: the vector representation $\varepsilon_j$ of the application to be detected, and the set $C$ of vector representations of a group of applications;

Start:

Variable declarations:

sum: the sum of the entries of $\varepsilon_j$;

numOfApps: the number of elements in set $C$;

$sum = w_{1,j} + w_{2,j} + \cdots + w_{n,j}$;

$numOfApps \leftarrow |C|$;

For each method $f_i$ in $\varepsilon_j$, calculate $weight(i,j)$ as follows:

$tf_{i,j} = \dfrac{w_{i,j}}{sum}$

$weight(i,j) = tf_{i,j} \cdot idf_i$

Finally, the feature vector of $\varepsilon_j$ is obtained;

Output: the feature vector of the application to be detected.
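A Java sketch of this weighting follows. The claim uses $idf_i$ without defining it; the sketch assumes the common form $idf_i = \ln(numOfApps / df_i)$, where $df_i$ is the number of applications in $C$ that call $f_i$, and assigns weight 0 when $df_i = 0$. These choices, and the name `TfIdfVectorizer`, are assumptions rather than the patent's definition.

```java
import java.util.*;

// Hypothetical sketch of the claim-4 tf-idf weighting over raw call counts.
class TfIdfVectorizer {
    // eps: counts w_{i,j} for the application under test.
    // c: count vectors of the reference applications (same length as eps).
    static double[] featureVector(double[] eps, List<double[]> c) {
        double sum = 0;
        for (double w : eps) sum += w;               // sum = w_1j + ... + w_nj
        int numOfApps = c.size();                     // numOfApps <- |C|
        double[] weight = new double[eps.length];
        if (sum == 0) return weight;                  // no sensitive calls: zero vector
        for (int i = 0; i < eps.length; i++) {
            double tf = eps[i] / sum;                 // tf_{i,j}
            int df = 0;                               // apps in C that call f_i
            for (double[] app : c) if (app[i] > 0) df++;
            // ASSUMED idf: ln(numOfApps / df); methods unseen in C get idf 0.
            double idf = df == 0 ? 0 : Math.log((double) numOfApps / df);
            weight[i] = tf * idf;                     // weight(i,j) = tf_{i,j} * idf_i
        }
        return weight;
    }
}
```

Under this idf, a method called by every application in $C$ gets weight 0, so only calls that distinguish applications contribute to the feature vector.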

5. The permission-based Android malware hybrid detection method according to claim 1, characterized in that the security detection in step 4 is performed by calculating the distances between the feature vector of the application's behavior and the benign and malicious classes of applications, and specifically comprises the following algorithm:

Input: the vector $\delta$ of the application to be detected and the feature vector set of the control group $Set = \{\delta_1, \delta_2, \ldots, \delta_m\}$;

Preprocessing: the feature vector set of the control group is divided into benign and malicious applications: $Set_{ben} = \{\delta_{b1}, \delta_{b2}, \ldots, \delta_{bj}\}$ and $Set_{mal} = \{\delta_{m1}, \delta_{m2}, \ldots, \delta_{mk}\}$, where b denotes benign applications, m denotes malicious applications, j is the number of benign applications, and k is the number of malicious applications;

Variable declarations: distType denotes the distance type used for detection: 1 denotes Euclidean distance, and 2 denotes cosine similarity;

Case 1, Euclidean distance (distType = 1), where n is the number of elements in a feature vector, i is the position of a feature vector in its set, j is the number of benign applications, k is the number of malicious applications, DisToBen[i] is the distance between the feature vector under test and the i-th benign feature vector in the set, and DisToMal[i] is the distance between the feature vector under test and the i-th malicious feature vector in the set; in the formulas, $x = \delta$ is the feature vector under test, $y$ is the i-th benign ($\delta_{bi}$) or malicious ($\delta_{mi}$) feature vector, and $l$ indexes the vector elements:

$DisToBen[i] = \sqrt{\sum_{l=1}^{n} (x_l - y_l)^2}$, with $y = \delta_{bi}$;

$DisToMal[i] = \sqrt{\sum_{l=1}^{n} (x_l - y_l)^2}$, with $y = \delta_{mi}$;

Case 2, cosine similarity (distType = 2), using the cosine distance (one minus the cosine similarity):

$DisToBen[i] = 1 - \dfrac{\sum_{l=1}^{n} x_l y_l}{\sqrt{\sum_{l=1}^{n} x_l^2}\,\sqrt{\sum_{l=1}^{n} y_l^2}}$, with $y = \delta_{bi}$;

$DisToMal[i] = 1 - \dfrac{\sum_{l=1}^{n} x_l y_l}{\sqrt{\sum_{l=1}^{n} x_l^2}\,\sqrt{\sum_{l=1}^{n} y_l^2}}$, with $y = \delta_{mi}$;

MaxLen[0] = maxDist(DisToBen[], j), AvgLen[0] = avgDist(DisToBen[], j),

MinLen[0] = minDist(DisToBen[], j);

MaxLen[1] = maxDist(DisToMal[], k), AvgLen[1] = avgDist(DisToMal[], k),

MinLen[1] = minDist(DisToMal[], k);

Output: the three calculated distances (maximum, average and minimum) from the application to be detected to the benign and the malicious control groups, respectively.
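The distance computation can be sketched in Java as below, using the cosine distance $1 - \cos\theta$ for distType 2. The claim stops at outputting the maximum, average and minimum distances; the `isMalicious` decision rule (compare the average distances, since the experiments found the average to be the best index) is a hypothetical addition for illustration, as is the class name `ControlGroupDistance`.

```java
import java.util.*;

// Hypothetical sketch of the claim-5 algorithm.
// distType: 1 = Euclidean distance, 2 = cosine distance (1 - cosine similarity).
class ControlGroupDistance {
    static double dist(double[] x, double[] y, int distType) {
        double sq = 0, dot = 0, nx = 0, ny = 0;
        for (int l = 0; l < x.length; l++) {
            sq += (x[l] - y[l]) * (x[l] - y[l]);
            dot += x[l] * y[l];
            nx += x[l] * x[l];
            ny += y[l] * y[l];
        }
        if (distType == 1) return Math.sqrt(sq);
        // Note: undefined (NaN) if either vector is all zeros.
        if (distType == 2) return 1 - dot / (Math.sqrt(nx) * Math.sqrt(ny));
        throw new IllegalArgumentException("distType must be 1 or 2");
    }

    // Returns {MaxLen, AvgLen, MinLen} from delta to every vector in the set.
    static double[] maxAvgMin(double[] delta, List<double[]> set, int distType) {
        double max = Double.NEGATIVE_INFINITY, min = Double.POSITIVE_INFINITY, total = 0;
        for (double[] v : set) {
            double d = dist(delta, v, distType);
            max = Math.max(max, d);
            min = Math.min(min, d);
            total += d;
        }
        return new double[] { max, total / set.size(), min };
    }

    // HYPOTHETICAL decision rule: malicious if the average distance to Set_mal
    // is smaller than the average distance to Set_ben.
    static boolean isMalicious(double[] delta, List<double[]> setBen,
                               List<double[]> setMal, int distType) {
        double avgBen = maxAvgMin(delta, setBen, distType)[1]; // AvgLen[0]
        double avgMal = maxAvgMin(delta, setMal, distType)[1]; // AvgLen[1]
        return avgMal < avgBen;
    }
}
```

Swapping index 1 for 0 or 2 in `isMalicious` gives the maximum-distance or minimum-distance variants compared in the experiments.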