CN105426762A - Static detection method for malice of android application programs

Info

Publication number: CN105426762A
Application number: CN201510999378.1A
Authority: CN
Inventors: 尚凤军; 邓小林
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2015-12-28
Filing date: 2015-12-28
Publication date: 2016-03-23
Anticipated expiration: 2035-12-28
Also published as: CN105426762B

Abstract

The invention relates to a static method for detecting malicious Android applications and belongs to the technical field of application security detection on the Android platform. First, the method analyzes the correlation among the permission feature attributes of Android applications by calculating partial correlation coefficients, which serves as dimensionality-reduction preprocessing of the permission feature set. Second, the reduced permission feature set is clustered by correlation to remove redundancy, using mutual information together with a Cartesian product method; a threshold is set to avoid overfitting, and a set Xnew of new classification permission feature sets is obtained, so that the permission feature sets after clustering are almost mutually independent. Finally, a naive Bayes classifier is built on the clustered permissions and improved, so that the classification decisions for applications are highly relevant and the reliability of malice detection for Android applications is further improved.

Description

A static detection method for malicious Android applications

Technical field

The invention belongs to the technical field of application security detection on the Android platform and relates to a static detection method for malicious Android applications.

Background art

The fast pace of modern life and work has raised people's expectations for obtaining real-time information and services from the network, and the mobile Internet has emerged in response. Security problems on the mobile Internet directly affect users' use of and trust in it, bear on the release of its productive capacity and the realization of its positive value, and ultimately concern the national security industry and the information security of the whole population. In this information age it is therefore necessary to keep track of new developments in mobile Internet security, to understand its current state in detail, and to deal promptly with the conflicts that security problems cause. The overall security architecture of the mobile Internet, and the deployments made to prevent security problems, must be improved continuously, and malicious traffic attacks and the spread of unhealthy or unscientific information must be analyzed and monitored as they appear. Security is maintained through continuous technical innovation, improved security design, and security deployment, through real-time monitoring by dedicated personnel, and through technical means such as content filtering, so as to give the mobile Internet a clean and healthy environment in which to develop. With the development of the mobile Internet, tasks that once required a computer can now be done on a mobile phone, which has greatly increased demand for smartphones. Within mobile Internet security, the security of Android is drawing increasing attention. In November 2007, Google released Android, an open-source smart mobile operating system based on the Linux kernel. The system has a huge user base and application market: according to Gartner statistics, global smartphone sales in the third quarter of 2013 exceeded 250 million units, of which Android accounted for 81.9%; and as of January 8, 2014, the number of applications on Google Play, the official Android application market, had reached 1.03 million.

Data show that the proportion of people using smartphones was still low in 2011 and had reached 46% by 2012. According to HIS statistics, the market share of smartphones was estimated to reach 55% in 2013. These figures show that smartphones are changing people's daily lives and have become a capable assistant in many people's life and work.

Smartphone functionality keeps improving and brings much convenience to daily life, but smartphones have also become the main target of mobile viruses and malware. As smartphones have developed rapidly, the number of viruses targeting them has grown at a comparable rate. Cabir, the first smartphone virus, first appeared in the Nokia camp, and within a few years thousands of viruses targeting smart terminals had appeared. The current mainstream smartphone operating systems are Symbian OS, Apple's iOS, Microsoft's Windows Phone, and Google's Android. Each system has its own set of security measures. Because people attach importance to the security of private information on their phones, analyzing the security mechanisms of existing smartphone operating systems and improving their defenses against virus behavior has become a research focus.

Summary of the invention

In view of this, the object of the present invention is to provide a static detection method for malicious Android applications. First, the method performs correlation analysis on the permission feature attributes of Android applications by calculating partial correlation coefficients, thereby achieving dimensionality-reduction preprocessing of the permission feature set. Next, it uses mutual information and a Cartesian product method to perform correlation clustering and redundancy removal on the reduced permission feature set, with a threshold set to avoid overfitting, and thereby obtains the set X_{new} of new classification permission feature sets, so that the permission feature sets after clustering are almost mutually independent. Finally, a naive Bayes classifier is built on the clustered permissions and improved, so that the classification decisions for applications are highly relevant and the reliability of malice detection for Android applications is improved.

To achieve the above object, the invention provides the following technical scheme:

A static detection method for malicious Android applications, in which a selected sample program is decompiled to obtain its AndroidManifest.xml file; the permission features of this file are extracted and subjected to dimensionality-reduction preprocessing; the reduced permission feature set is then clustered to remove redundancy using mutual information and a Cartesian product method; and finally a naive Bayes classification model is built on this basis and the detected malicious applications are divided into malice grades.

Further, the method specifically comprises the following steps:

Step 1: collect and create a sample library of malicious and non-malicious applications, decompile each APK sample to obtain its AndroidManifest.xml file, and extract the permission features of this file to obtain the permission feature set;

Step 2: correlations exist among the Android permission feature attribute variables, and the correlation between any two variables may appear only because of the presence of a third variable; a method of correlation analysis based on partial correlation coefficients is therefore applied to the permission feature attributes to perform dimensionality-reduction preprocessing on the permission feature set;

Step 3: using mutual information theory and a Cartesian product method, apply the improved naive Bayes classification model method based on mutual information and the Cartesian product to cluster the dimensionality-reduced permission feature set and remove redundancy;

Step 4: build a naive Bayes classifier on the set X_{new} of classification attribute sets; obtain the prior probabilities from training samples, then use the test-set samples to judge, by computing posterior probabilities, whether a detected Android application is malicious; grade the malicious Android applications by a probabilistic method.

Further, in step 2, the method of performing correlation analysis on the permission feature attributes based on partial correlation coefficients specifically comprises:

The method first computes the simple correlation coefficient between each pair of permission feature attribute variables,

r (x_{i}, x_{j}) = \frac{Cov (x_{i}, x_{j})}{\sqrt{D (x_{i}) D (x_{j})}}

where Cov (x_{i}, x_{j}) is the covariance between x_{i} and x_{j}, and \sqrt{D (x_{i})} and \sqrt{D (x_{j})} are the standard deviations of x_{i} and x_{j}. The simple correlation coefficients are assembled into the correlation matrix R. The algebraic cofactors A^{ii}, A^{ij}, A^{jj} of the elements r_{ii}, r_{ij}, r_{jj} in the determinant |R| are then computed and substituted into the formula for the partial correlation coefficient between permission feature attribute variables:

ρ (x_{i}, x_{j} | x_{1}, . . ., x_{i - 1}, x_{i + 1}, . . ., x_{j - 1}, x_{j + 1}, . . ., x_{n}) = \frac{- A^{ij}}{\sqrt{A^{ii}} \sqrt{A^{jj}}}

The magnitude |ρ| of the resulting partial correlation coefficient is used to judge the strength of the correlation between permission feature attributes; attributes with low correlation are removed, giving the dimensionality-reduced permission feature set.
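The cofactor-based calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (`pearson`, `det`, `cofactor`, `partial_corr`) are hypothetical, each permission is assumed to be a 0/1 column with non-zero variance, and the determinant is computed by Laplace expansion, which is only practical for a small number of permissions.

```python
from math import sqrt

def pearson(x, y):
    """Simple correlation r = Cov(x, y) / sqrt(D(x) * D(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    dx = sum((a - mx) ** 2 for a in x) / n
    dy = sum((b - my) ** 2 for b in y) / n
    return cov / sqrt(dx * dy)  # assumes neither column is constant

def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def cofactor(m, i, j):
    """A^{ij} = (-1)^{i+j} M^{ij}, where M^{ij} drops row i and column j."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

def partial_corr(columns, i, j):
    """Partial correlation of columns i and j given all other columns."""
    n = len(columns)
    R = [[pearson(columns[a], columns[b]) for b in range(n)] for a in range(n)]
    return -cofactor(R, i, j) / sqrt(cofactor(R, i, i) * cofactor(R, j, j))
```

With only two columns the partial correlation reduces to the simple correlation, which is a quick sanity check on the sign convention. How small |ρ| must be before an attribute is dropped is not stated in the text, so any cut-off used with this sketch would be a tuning choice.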

Further, in step 3, using mutual information theory and a Cartesian product method, the improved naive Bayes classification model method based on mutual information and the Cartesian product is applied to cluster the dimensionality-reduced permission feature set and remove redundancy. The clustering model is as follows:

Cor (X_{i}, C) = \sum_{X_{i}, C} P (X_{i}, C) \log \frac{P (X_{i}, C)}{P (X_{i}) P (C)}

Cor (X_{i}, X_{j}) = \sum_{X_{i}, X_{j}} P (X_{i}, X_{j}) \log \frac{P (X_{i}, X_{j})}{P (X_{i}) P (X_{j})}

where Cor (X_{i}, C) denotes the degree of correlation between the permission feature attribute variable X_{i} and the class attribute variable C, and Cor (X_{i}, X_{j}) denotes the degree of correlation between the permission feature attribute variables X_{i} and X_{j}. The calculation proceeds as follows:

1) Compute the degree of correlation Cor (X_{i}, C) between each preprocessed permission feature attribute variable X_{i} and the class variable C, and arrange the variables in descending order to form the original attribute set X-ori;

2) Compute the degree of correlation Cor (X-ori (1), X_{j}) between the first attribute variable X-ori (1) in X-ori and every other attribute variable;

3) For each other variable X_{j} in X-ori except X-ori (1), if Cor (X-ori (1), X_{j}) > Cor (X_{j}, C), the variable is considered highly correlated with X-ori (1) and is added to the related set of X-ori (1);

4) The Cartesian product X_{new1} of the first m variables of X-ori (1) and its related set is added to X_{new} as a new attribute set, and at the same time all variables of X-ori (1) and its related set are deleted from X-ori;

5) Repeat steps 2) to 4) until the stopping condition is reached.
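Steps 1) to 5) can be sketched as follows. This is a minimal sketch rather than the patented implementation, and it rests on several assumptions: permissions are 0/1 columns held in a dict, the loop stops when X-ori is empty (the stopping condition is not legible in the text), and "the first m variables" is read as X-ori (1) plus at most m members of its related set. The names `mutual_info`, `cluster_features`, and `joint_values` are hypothetical.

```python
from collections import Counter
from math import log

def mutual_info(xs, ys):
    """Cor(X, Y) = sum over observed value pairs of P(x,y) * log(P(x,y) / (P(x) P(y)))."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def cluster_features(features, labels, m):
    """features: dict name -> list of values. Returns X_new as lists of names."""
    # Step 1: rank attributes by relevance to the class, descending (X-ori).
    rel = {f: mutual_info(v, labels) for f, v in features.items()}
    x_ori = sorted(features, key=lambda f: rel[f], reverse=True)
    x_new = []
    while x_ori:  # assumed stopping condition: X-ori is empty
        head, rest = x_ori[0], x_ori[1:]
        # Steps 2-3: X_j joins the related set if Cor(head, X_j) > Cor(X_j, C).
        related = [f for f in rest
                   if mutual_info(features[head], features[f]) > rel[f]]
        # Step 4: keep head plus at most m related attributes as one new set,
        # then delete head and its whole related set from X-ori.
        x_new.append([head] + related[:m])
        x_ori = [f for f in rest if f not in related]
    return x_new

def joint_values(group, features):
    """Value of the Cartesian-product attribute for each sample, as a tuple."""
    return list(zip(*(features[f] for f in group)))
```

Capping each group at m members keeps the number of joint values, and hence the number of probabilities to estimate, bounded, which is one plausible reading of how the threshold limits overfitting.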

Further, in step 4, a naive Bayes classifier is built on the set X_{new} of classification attribute sets; the prior probabilities are obtained from training samples, and the test-set samples are then used to judge, by computing posterior probabilities, whether a detected Android application is malicious. The naive Bayes model built from the set X_{new} of permission classification attribute sets and the class C is as follows:

P (C_{i} | X_{new}) = \frac{P (X_{new} | C_{i}) P (C_{i})^{α}}{P (X_{new})}

where count (X_{k} | C_{i}) denotes the number of times the permission feature attribute X_{k} occurs in samples of class C_{i}; count (X_{k}) denotes the number of times X_{k} occurs in the samples; count (X) denotes the number of permission feature sets in the classification permission set X_{new}; α represents the degree of influence of the different permission feature attributes on the classification and quantifies the relation between a permission feature attribute and its class attribute; X_{new} is the set of permission feature attribute sets of an Android application; and C_{i} is the class of the Android application, that is, one of the two classes non-malicious and malicious. Because P (X_{new}) is constant for all classes, comparing posterior probabilities only requires finding the class that maximizes P (X_{new} | C_{i}) P (C_{i})^{α}, which determines whether the application is malicious;
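The decision rule, choosing the class that maximizes P (X_{new} | C_{i}) P (C_{i})^{α}, can be illustrated with a short sketch. Several things here are assumptions rather than part of the described method: α is passed in as a plain number because its defining formula is not reproduced in the text, add-one smoothing is used to avoid zero probabilities, scores are computed in log space, and the names `train`, `log_score`, and `classify` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log

def train(samples, labels):
    """samples: tuples with one value per attribute set in X_new."""
    class_count = Counter(labels)
    value_count = defaultdict(Counter)  # (class, position) -> value counts
    values = defaultdict(set)           # position -> values seen in training
    for x, c in zip(samples, labels):
        for k, v in enumerate(x):
            value_count[(c, k)][v] += 1
            values[k].add(v)
    return class_count, value_count, values

def log_score(x, c, model, alpha=1.0):
    """log( P(X_new | C_i) * P(C_i)^alpha ), assuming independent attribute sets."""
    class_count, value_count, values = model
    total = sum(class_count.values())
    score = alpha * log(class_count[c] / total)
    for k, v in enumerate(x):
        # Add-one smoothing is an assumption of this sketch, not of the source.
        num = value_count[(c, k)][v] + 1
        den = class_count[c] + len(values[k])
        score += log(num / den)
    return score

def classify(x, model, alpha=1.0):
    """Return the class with the largest score; P(X_new) is the same for all classes."""
    return max(model[0], key=lambda c: log_score(x, c, model, alpha))
```

Setting `alpha=1.0` gives the ordinary naive Bayes rule; other values raise or lower the weight of the class prior relative to the likelihood term.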

The Android applications found to be malicious are then graded by malice level on the basis of their permission feature sets. The malice grade is calculated as follows:

T = Σ \frac{P_{v}}{P_{m}}

P_{v} = Π_{i = 1}^{n} P_{v} (X_{i})

P_{m} = Π_{i = 1}^{n} P_{n} (X_{i})

where P_{v} is the probability that the application under test occurs among malicious programs; P_{m} is the probability that it occurs among non-malicious programs; P_{v} (X_{i}) is the probability that the i-th permission feature set occurs in malicious programs; and P_{n} (X_{i}) is the probability that the i-th permission feature set occurs in non-malicious programs.
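As a rough illustration of the ratio P_{v} / P_{m}, the sketch below multiplies the per-set probabilities in log space to avoid underflow. The text does not say what the summation in T ranges over, nor how T maps to a grade, so the sketch computes the ratio for a single sample only, and the cut-off values in `grade` are purely hypothetical placeholders. Both function names are also hypothetical.

```python
from math import exp, log

def malice_ratio(p_v, p_n):
    """P_v / P_m with P_v = prod(p_v) and P_m = prod(p_n); all values must be > 0."""
    log_ratio = sum(log(a) - log(b) for a, b in zip(p_v, p_n))
    return exp(log_ratio)

def grade(ratio, cutoffs=(1.0, 10.0, 100.0)):
    """Map a ratio to a level 0..len(cutoffs); the cut-offs are hypothetical."""
    return sum(ratio >= c for c in cutoffs)
```

A ratio above 1 means the observed permission sets are more typical of malicious samples than of non-malicious ones; larger ratios would correspond to higher grades under whatever cut-offs are chosen.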

The beneficial effects of the present invention are as follows. The invention decompiles Android application samples to obtain the permissions they use, as the basis for the naive Bayes model built later. Partial correlation coefficients are used to analyze the correlation among the permission feature attributes and to reduce their dimensionality. Mutual information and a Cartesian product method are then used to cluster the reduced permission feature set by correlation and remove redundancy, yielding new classification permission feature sets. Because the correlation between the permission sets after clustering is very low, they are almost mutually independent and therefore satisfy the naive Bayes condition that attributes are independent of one another. A naive Bayes classifier built on this basis makes the classification decisions for applications highly relevant, and the improvement made to naive Bayes further raises the detection rate for malicious Android applications. Setting a threshold during clustering also avoids overfitting. Grading the malice improves the security of applications at installation time in practice. The invention can be used for security checks before Android applications are installed, reminding the user whether an application is malicious and, if so, of the intensity and grade of its malice, which is of considerable significance for research on the safe use of applications.

Brief description of the drawings

To make the object, technical scheme, and beneficial effects of the present invention clearer, the following drawings are provided:

Fig. 1 is a schematic flow chart of the method of the invention;

Fig. 2 is a schematic diagram of the dimensionality-reduction preprocessing of the permission features;

Fig. 3 is a schematic diagram of the clustering and redundancy removal applied to the dimensionality-reduced permission features.

Detailed description of the embodiments

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic flow chart of the method of the invention. As shown in the figure, the static detection method for malicious Android applications mainly comprises the following four steps.

Step 1: collect and create a sample library of malicious and non-malicious applications, decompile each APK sample to obtain its AndroidManifest.xml file, and extract the permission features of this file to obtain the permission feature set.

Step 2: correlations exist among the Android permission feature attribute variables, and the correlation between any two variables may appear only because of the presence of a third variable. A method of correlation analysis based on partial correlation coefficients is therefore proposed to perform dimensionality-reduction preprocessing on the permission feature set. The method first computes the simple correlation coefficient r (x_{i}, x_{j}) = \frac{Cov (x_{i}, x_{j})}{\sqrt{D (x_{i}) D (x_{j})}} between each pair of permission feature attribute variables, where Cov (x_{i}, x_{j}) is the covariance between x_{i} and x_{j}, and \sqrt{D (x_{i})} and \sqrt{D (x_{j})} are their standard deviations. The simple correlation coefficients are assembled into the correlation matrix R, the algebraic cofactors A^{ii}, A^{ij}, A^{jj} of the elements r_{ii}, r_{ij}, r_{jj} in the determinant |R| are computed, and these are substituted into the formula for the partial correlation coefficient between permission feature attribute variables:

ρ (x_{i}, x_{j} | x_{1}, . . ., x_{i - 1}, x_{i + 1}, . . ., x_{j - 1}, x_{j + 1}, . . ., x_{n}) = \frac{- A^{ij}}{\sqrt{A^{ii}} \sqrt{A^{jj}}}

The magnitude |ρ| of the resulting partial correlation coefficient is used to judge the strength of the correlation between permission feature attributes; attributes with low correlation are removed, giving the dimensionality-reduced permission feature set.

Step 3: using mutual information theory and a Cartesian product method, the proposed improved naive Bayes classification model method based on mutual information and the Cartesian product is applied to cluster the dimensionality-reduced permission feature set and remove redundancy: (1) compute the degree of correlation Cor (X_{i}, C) between each preprocessed permission feature attribute variable X_{i} and the class variable C, and arrange the variables in descending order to form the original attribute set X-ori; (2) compute the degree of correlation Cor (X-ori (1), X_{j}) between the first attribute variable X-ori (1) in X-ori and every other attribute variable; (3) for each other variable X_{j} in X-ori except X-ori (1), if Cor (X-ori (1), X_{j}) > Cor (X_{j}, C), the variable is considered highly correlated with X-ori (1) and is added to the related set of X-ori (1); (4) the Cartesian product X_{new1} of the first m variables of X-ori (1) and its related set is added to X_{new} as a new attribute set, and at the same time all variables of X-ori (1) and its related set are deleted from X-ori; (5) repeat (2) to (4) until the stopping condition is reached.

Step 4: build a naive Bayes classifier on the set X_{new} of classification attribute sets; obtain the prior probabilities from training samples, then use the test-set samples to judge, by computing posterior probabilities, whether a detected Android application is malicious; grade the malicious Android applications by a probabilistic method.

In step 1, the samples in the collected and created library of malicious and non-malicious applications are each decompiled to obtain the AndroidManifest.xml file, from which the permission features are extracted to obtain the permission feature set;
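Permission extraction from a decoded manifest can be sketched with Python's standard XML parser. The sketch assumes the manifest has already been decoded to text XML by a decompiling tool (inside the APK it is stored in a binary form), and it reads only `uses-permission` elements; the function names `extract_permissions` and `to_vector` and the idea of a fixed permission vocabulary are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace used by the android: attribute prefix in manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

def extract_permissions(manifest_xml):
    """Return the sorted permission names requested in a decoded manifest string."""
    root = ET.fromstring(manifest_xml)
    names = {
        elem.get(f"{{{ANDROID_NS}}}name")
        for elem in root.iter("uses-permission")
    }
    names.discard(None)  # skip elements without an android:name attribute
    return sorted(names)

def to_vector(permissions, vocabulary):
    """Encode a sample as a 0/1 feature vector over a fixed permission list."""
    present = set(permissions)
    return [1 if p in present else 0 for p in vocabulary]
```

Each application then becomes one row of 0/1 values, and each permission one column, which is the form the correlation analysis in step 2 operates on.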

Fig. 2 is a schematic diagram of the dimensionality-reduction preprocessing of the permission features. In step 2, the method analyzes the correlation among the permission feature attribute variables using partial correlation coefficients and performs dimensionality-reduction preprocessing on the permission feature attributes. The model used to analyze the correlation between permission feature attributes is as follows:

r (x_{i}, x_{j}) = \frac{Cov (x_{i}, x_{j})}{\sqrt{D (x_{i}) D (x_{j})}}

ρ (x_{i}, x_{j} | x_{1}, . . ., x_{i - 1}, x_{i + 1}, . . ., x_{j - 1}, x_{j + 1}, . . ., x_{n}) = \frac{- A^{ij}}{\sqrt{A^{ii}} \sqrt{A^{jj}}}

A^{ij} = (-1)^{i + j} M^{ij}

where r (x_{i}, x_{j}) is the simple correlation coefficient; Cov (x_{i}, x_{j}) is the covariance between x_{i} and x_{j}; \sqrt{D (x_{i})} and \sqrt{D (x_{j})} are the standard deviations of x_{i} and x_{j}; A^{ii}, A^{ij}, A^{jj} are the algebraic cofactors of the elements r_{ii}, r_{ij}, r_{jj} in the determinant |R| of the matrix R formed from the simple correlation coefficients; and M^{ij} is the minor of the n-th order determinant |R|, that is, the (n-1)-th order determinant that remains after the i-th row and j-th column of |R| are removed. The simple correlation coefficients between pairs of permission feature attribute variables are computed and assembled into the correlation matrix R, the algebraic cofactors A^{ii}, A^{ij}, A^{jj} are computed, and these are substituted into the partial correlation formula for ρ given above.
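As a worked check of the cofactor formula, consider the special case of three variables. The shorthand r_{12}, r_{13}, r_{23} for the pairwise simple correlation coefficients and ρ_{12·3} for the partial correlation of x_{1} and x_{2} given x_{3} is introduced here for illustration only; applying A^{ij} = (-1)^{i+j} M^{ij} to the 3 × 3 correlation matrix recovers the familiar three-variable expression.

```latex
R = \begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix}

A^{11} = \begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix} = 1 - r_{23}^{2}, \qquad
A^{22} = \begin{vmatrix} 1 & r_{13} \\ r_{13} & 1 \end{vmatrix} = 1 - r_{13}^{2}

A^{12} = (-1)^{1+2} \begin{vmatrix} r_{12} & r_{23} \\ r_{13} & 1 \end{vmatrix} = -(r_{12} - r_{13} r_{23})

\rho_{12 \cdot 3} = \frac{-A^{12}}{\sqrt{A^{11}} \sqrt{A^{22}}}
                  = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}}
```

If x_{1} and x_{2} are correlated only through x_{3}, so that r_{12} = r_{13} r_{23}, the numerator vanishes and the partial correlation is zero, which is the situation the text describes when it says a correlation may appear only because of a third variable.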

Fig. 3 is a schematic diagram of the clustering and redundancy removal applied to the dimensionality-reduced permission features. In step 3, using mutual information theory and a Cartesian product method, the proposed improved naive Bayes classification model method based on mutual information and the Cartesian product is applied to cluster the dimensionality-reduced permission feature set and remove redundancy. The clustering model is as follows:

Cor (X_{i}, C) = \sum_{X_{i}, C} P (X_{i}, C) \log \frac{P (X_{i}, C)}{P (X_{i}) P (C)}

Cor (X_{i}, X_{j}) = \sum_{X_{i}, X_{j}} P (X_{i}, X_{j}) \log \frac{P (X_{i}, X_{j})}{P (X_{i}) P (X_{j})}

1) Compute the degree of correlation Cor (X_{i}, C) between each permission feature attribute variable X_{i} and the class variable C, and arrange the variables in descending order to form the original attribute set X-ori;

5) Repeat (2) to (4) until the stopping condition is reached.

In step 4, a naive Bayes classifier is built on the set X_{new} of classification attribute sets; the prior probabilities are obtained from training samples, and the test-set samples are then used to judge, by computing posterior probabilities, whether a detected Android application is malicious. The naive Bayes model built from the set X_{new} of classification attribute sets and the class C is as follows:

P (C_{i} | X_{new}) = \frac{P (X_{new} | C_{i}) P (C_{i})^{α}}{P (X_{new})}

where count (X_{k} | C_{i}) denotes the number of times the permission feature attribute X_{k} occurs in samples of class C_{i}; count (X_{k}) denotes the number of times X_{k} occurs in the samples; count (X) denotes the number of permission feature sets in the classification permission set X_{new}; α represents the degree of influence of the different permission feature attributes on the classification and quantifies the relation between a permission feature attribute and its class attribute; X_{new} is the set of permission feature attribute sets of an Android application; and C_{i} is the class of the Android application, that is, one of the two classes non-malicious and malicious. Because P (X_{new}) is constant for all classes, comparing posterior probabilities only requires finding the class that maximizes P (X_{new} | C_{i}) P (C_{i})^{α}, which determines whether the application is malicious.

T = Σ \frac{P_{v}}{P_{m}}

P_{v} = Π_{i = 1}^{n} P_{v} (X_{i})

P_{m} = Π_{i = 1}^{n} P_{n} (X_{i})

Finally, it should be noted that the above preferred embodiments are intended only to illustrate the technical scheme of the present invention and not to limit it. Although the invention has been described in detail through the above preferred embodiments, those skilled in the art should understand that various changes in form and detail may be made without departing from the scope defined by the claims of the present invention.

Claims

1. A static detection method for malicious Android applications, characterized in that: a selected sample program is decompiled to obtain its AndroidManifest.xml file; the permission features of this file are extracted and subjected to dimensionality-reduction preprocessing; the reduced permission feature set is then clustered to remove redundancy using mutual information and a Cartesian product method; and finally a naive Bayes classification model is built on this basis and the detected malicious applications are divided into malice grades.

2. The static detection method for malicious Android applications according to claim 1, characterized in that the method specifically comprises the following steps:

Step 4: build a naive Bayes classifier on the set X_{new} of classification attribute sets; obtain the prior probabilities from training samples, then use the test-set samples to judge, by computing posterior probabilities, whether a detected Android application is malicious; grade the malicious Android applications by a probabilistic method.

3. The static detection method for malicious Android applications according to claim 2, characterized in that, in step 2, the method of performing correlation analysis on the permission feature attributes based on partial correlation coefficients specifically comprises:

The method first computes the simple correlation coefficient r (x_{i}, x_{j}) = \frac{Cov (x_{i}, x_{j})}{\sqrt{D (x_{i}) D (x_{j})}} between each pair of permission feature attribute variables, where Cov (x_{i}, x_{j}) is the covariance between x_{i} and x_{j}, and \sqrt{D (x_{i})} and \sqrt{D (x_{j})} are the standard deviations of x_{i} and x_{j}. The simple correlation coefficients are assembled into the correlation matrix R, the algebraic cofactors A^{ii}, A^{ij}, A^{jj} of the elements r_{ii}, r_{ij}, r_{jj} in the determinant |R| are computed, and these are substituted into the formula for the partial correlation coefficient between permission feature attribute variables:

ρ (x_{i}, x_{j} | x_{1}, . . ., x_{i - 1}, x_{i + 1}, . . ., x_{j - 1}, x_{j + 1}, . . ., x_{n}) = \frac{- A^{ij}}{\sqrt{A^{ii}} \sqrt{A^{jj}}}

4. The static detection method for malicious Android applications according to claim 2, characterized in that, in step 3, using mutual information theory and a Cartesian product method, the improved naive Bayes classification model method based on mutual information and the Cartesian product is applied to cluster the dimensionality-reduced permission feature set and remove redundancy, the clustering model being as follows:

Cor (X_{i}, C) = \sum_{X_{i}, C} P (X_{i}, C) \log \frac{P (X_{i}, C)}{P (X_{i}) P (C)}

Cor (X_{i}, X_{j}) = \sum_{X_{i}, X_{j}} P (X_{i}, X_{j}) \log \frac{P (X_{i}, X_{j})}{P (X_{i}) P (X_{j})}

4) The Cartesian product X_{new1} of the first m variables of X-ori (1) and its related set is added to X_{new} as a new attribute set, and at the same time all variables of X-ori (1) and its related set are deleted from X-ori;

5) Repeat steps 2) to 4) until the stopping condition is reached.

5. The static detection method for malicious Android applications according to claim 2, characterized in that, in step 4, a naive Bayes classifier is built on the set X_{new} of classification attribute sets; the prior probabilities are obtained from training samples, and the test-set samples are then used to judge, by computing posterior probabilities, whether a detected Android application is malicious; the naive Bayes model built from the set X_{new} of permission classification attribute sets and the class C is as follows:

P (C_{i} | X_{new}) = \frac{P (X_{new} | C_{i}) P (C_{i})^{α}}{P (X_{new})}

T = Σ \frac{P_{v}}{P_{m}}

P_{v} = Π_{i = 1}^{n} P_{v} (X_{i})

P_{m} = Π_{i = 1}^{n} P_{n} (X_{i})