CN109145605A - An Android malware family clustering method based on the SinglePass algorithm
- Publication number
- CN109145605A (application number CN201810963865.6A)
- Authority
- CN
- China
- Prior art keywords
- family
- feature
- malware
- software
- singlepass
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an Android malware family clustering method based on the SinglePass algorithm and belongs to the technical field of computer and information science. The method first extracts behavioral features from Android malware: the sensitive permissions, system Actions, system Categories, and sensitive system API calls of the software are obtained and used as its behavioral features. A filter method is then applied to the constructed features to select the more important ones for the next stage of analysis. Finally, the similarity between the software and each existing malware family is computed and used as the basis for family attribution. A similarity threshold is set in advance; if the highest similarity exceeds the threshold, the software is assigned to the existing family with which it is most similar, and otherwise it is placed in a new malware family. Compared with dynamic analysis methods, the invention improves analysis efficiency and can discover new malware families, and therefore has considerable practical value.
Description
Technical field
The present invention relates to an Android malware family clustering method based on the SinglePass algorithm and belongs to the technical field of computer and information science.
Background
With the mobile Internet increasingly widespread, mobile terminals have become an indispensable part of everyday life. Thanks to its openness and flexible ecosystem, Android has become the mobile platform with the largest market share, but this has also led to an endless stream of malicious applications on the platform, with new ones breaking out every year. Faced with a massive volume of new malware samples, classifying samples and assigning them to families quickly and efficiently has become a highly challenging task. In addition, as Android malware develops and evolves rapidly, the number of malware family categories keeps growing. Finding a suitable algorithm for discovering new categories of malicious samples has become an urgent problem.
Current research on malware family analysis mainly applies classification algorithms to malware features to determine the family to which a sample belongs. However, malware keeps evolving and the number of family types keeps increasing. Classification can only assign software to malware families that already exist and cannot discover new ones. Research based on clustering is comparatively scarce. Existing clustering work mainly customizes a ROM, runs the software on it, collects its system behavior, and clusters on that behavior to complete the family attribution analysis. This approach has obvious shortcomings. On the one hand, dynamic execution is costly and cannot deliver analysis results in a short time. On the other hand, malware generally hides its malicious behavior, and because that behavior is difficult to trigger, it is hard to collect effective behavioral data during dynamic execution.
In summary, the present method proposes an Android malware family clustering method based on static behavior analysis. Static behavior analysis has high analysis efficiency and can give concrete classification results in a short time. In addition, the SinglePass clustering method either assigns the current data to an existing class according to how well it matches the existing classes or creates a new class, which realizes incremental clustering of streaming data and thereby makes it possible to discover new malware families.
Summary of the invention
The purpose of the present invention is to solve the problems that traditional analysis methods, when facing a growing number of Android malware families, are inefficient and have difficulty discovering new families. To this end, an Android malware family clustering method based on the SinglePass algorithm is proposed. The method automatically assigns malware to families and discovers new Android malware families.
The design principle of the invention is as follows. First, behavioral features are extracted from Android malware: by decompression, decompilation, and similar means, the sensitive permissions, system Actions, system Categories, and sensitive system API calls of the software are obtained and used as its behavioral features. Next, feature selection is applied to the constructed features; a filter method is used here to select the more important features for the next stage of analysis. Finally, the similarity between the software and each existing malware family is computed and used as the basis for family attribution. A similarity threshold is set in advance; if the similarity exceeds the threshold, the existing family with the highest similarity is chosen as the family of the malware. If it is below the threshold, the software is placed in a new malware family.
The technical solution of the present invention is achieved by the following steps:
Step 1: Preprocess the Android malware and extract features.
Step 1.1: Decompress the Android application file to obtain the AndroidManifest.xml file and the behavior-related features it contains.
Step 1.2: Decompile the classes.dex file obtained by the decompression in step 1.1 to obtain the relevant sensitive system API calls.
Step 1.3: Convert the feature data obtained in steps 1.1 and 1.2 to numerical values.
Step 2: Apply feature selection and normalization to the feature data constructed in step 1.
Step 2.1: Use a filter method to screen out the unimportant features and delete the corresponding feature data.
Step 2.2: Normalize the features that remain after screening, so that every feature is represented by 0 or 1 for the subsequent cluster analysis.
Step 3: Perform family clustering of the Android malware with the SinglePass clustering method.
Step 3.1: Compute the similarity between the current software and each existing malware family from the numerical feature values of the software.
Step 3.2: Sort the computed similarities in descending order and obtain the maximum similarity value.
Step 3.3: Compare the maximum similarity value with the preset threshold. If the value is below the threshold, place the software in a new family; otherwise assign it to the malware family with which it is most similar.
Beneficial effects
Compared with malware family analysis methods based on dynamic analysis, the present invention uses static behavioral data analysis, which offers higher code coverage and lower analysis cost.
Compared with traditional malware family classification algorithms, this method can cope with emerging malware families and discover new malware families in a timely manner.
Brief description of the drawings
Fig. 1 is a schematic diagram of the malware family clustering principle based on the SinglePass algorithm according to the present invention.
Fig. 2 is a flow chart of the SinglePass-based clustering in the specific embodiment.
Detailed description of the embodiments
To better illustrate the objects and advantages of the present invention, an embodiment of the method is described in further detail below with reference to an example.
The detailed process is as follows:
Step 1: Preprocess the Android malware file and extract features.
Step 1.1: Decompress the APK file and extract the Android application's AndroidManifest.xml file, res files, classes.dex file, and other files. Then extract from the AndroidManifest.xml file the features that characterize behavior: system Actions, system Categories, and sensitive permissions.
Step 1.2: Decompile the classes.dex file extracted in step 1.1 and extract the sensitive system API calls it contains. The specific types of API collected are shown in Table 1.
Table 1. Types and numbers of sensitive APIs collected
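As a rough illustration of step 1.2, the sketch below counts calls to a watch list of APIs in decompiled code. It assumes the decompiler emits smali text, which the source does not state, and the two entries in the watch list are hypothetical stand-ins because the contents of Table 1 are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical watch list; the API list actually used (Table 1) is not available.
SENSITIVE_APIS = {
    "Landroid/telephony/SmsManager;->sendTextMessage",
    "Landroid/telephony/TelephonyManager;->getDeviceId",
}

# Captures the class and method targeted by a smali invoke instruction.
INVOKE_RE = re.compile(r"invoke-[\w/-]+\s+\{[^}]*\},\s+(L[^;]+;->[^(\s]+)\(")


def count_sensitive_calls(smali_text: str) -> Counter:
    """Count how often each watched API is invoked in the given smali text."""
    counts = Counter()
    for match in INVOKE_RE.finditer(smali_text):
        target = match.group(1)
        if target in SENSITIVE_APIS:
            counts[target] += 1
    return counts


# Hypothetical smali fragment used only to exercise the function.
SAMPLE_SMALI = """
    invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
    invoke-virtual/range {v1 .. v6}, Landroid/telephony/SmsManager;->sendTextMessage(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Landroid/app/PendingIntent;Landroid/app/PendingIntent;)V
    invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
    invoke-static {}, Ljava/lang/System;->currentTimeMillis()J
"""

api_counts = count_sensitive_calls(SAMPLE_SMALI)
```

Counting occurrences rather than recording mere presence matches the later numerical step, where the number of times an API appears is used as its feature value.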
Step 1.3: Convert the features described in steps 1.1 and 1.2 to numerical values. Each feature from step 1.1 is represented by 1 if it is present and by 0 if it is absent. For each API feature from step 1.2, the number of times it occurs is used as the feature value. For APK file k, a feature vector $F_k$ is obtained:
$$F_k = [a_{k0}, a_{k1}, a_{k2}, \ldots, a_{kn}] \qquad (1)$$
where $a_{kn}$ denotes the value of the n-th feature of sample k and n denotes the number of feature items; here n is 639.
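The construction of the feature vector in formula (1) can be sketched as follows. The vocabularies are hypothetical three-item and two-item lists chosen for readability; the embodiment uses 639 feature items in total.

```python
def build_feature_vector(present, api_counts, manifest_vocab, api_vocab):
    """Concatenate 0/1 manifest features with API call counts, as in formula (1)."""
    binary_part = [1 if name in present else 0 for name in manifest_vocab]
    count_part = [api_counts.get(name, 0) for name in api_vocab]
    return binary_part + count_part


# Hypothetical vocabularies, far smaller than the real feature list.
MANIFEST_VOCAB = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.intent.action.BOOT_COMPLETED",
]
API_VOCAB = ["SmsManager.sendTextMessage", "TelephonyManager.getDeviceId"]

vector = build_feature_vector(
    present={"android.permission.SEND_SMS", "android.intent.action.BOOT_COMPLETED"},
    api_counts={"TelephonyManager.getDeviceId": 4},
    manifest_vocab=MANIFEST_VOCAB,
    api_vocab=API_VOCAB,
)
```

Fixing the order of the vocabularies matters more than their content: every sample must place the same feature at the same index so that vectors are comparable.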
Step 2: Apply feature selection and normalization to the feature data constructed in step 1.
Step 2.1: Apply feature selection to the feature vectors obtained in step 1 using a filter method. The variance of each feature is used as the scoring criterion. If the values of a feature differ little across samples, the feature is generally considered to contribute little to distinguishing samples, so features whose variance is below a threshold are removed during feature construction. If the number of existing samples is k, the variance $D_n$ of the n-th feature is given by formula (3):
$$E_n = \frac{1}{k}\sum_{i=1}^{k} a_{in} \qquad (2)$$
$$D_n = \frac{1}{k}\sum_{i=1}^{k} \left(a_{in} - E_n\right)^2 \qquad (3)$$
where $E_n$ in formula (2) is the mean of the n-th feature and $a_{in}$ is the value of the n-th feature of sample i.
If $D_n$ is less than the threshold p (0.1 is chosen here), the feature is removed. The resulting 333-dimensional feature vector serves as the basis for the subsequent analysis.
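The variance filter of step 2.1 can be sketched with the standard library. The sketch assumes the population variance (division by k), which is one common reading of the definition; the sample matrix is hypothetical.

```python
from statistics import pvariance


def select_by_variance(samples, threshold=0.1):
    """Keep only the feature columns whose population variance is at least threshold."""
    n_features = len(samples[0])
    variances = [pvariance([row[j] for row in samples]) for j in range(n_features)]
    kept = [j for j, d in enumerate(variances) if d >= threshold]
    reduced = [[row[j] for j in kept] for row in samples]
    return kept, reduced


# Hypothetical data: column 0 is constant, so its variance is 0 and it is dropped.
SAMPLES = [
    [1, 0, 3],
    [1, 1, 0],
    [1, 0, 5],
    [1, 1, 0],
]

kept_columns, reduced_samples = select_by_variance(SAMPLES, threshold=0.1)
```

Returning the kept column indices alongside the reduced data lets a later, newly arrived sample be projected onto the same features without recomputing variances.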
Step 2.2: Normalize the data obtained in step 2.1. Because the Jaccard distance is used as the basis for classification in the cluster analysis, the data must be normalized: every feature value greater than 0 is set to 1, and all other values are set to 0.
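The normalization in step 2.2 amounts to a one-line mapping. The sketch below also shows an optional conversion to a set of active feature indices, which is an assumed convenience for set-based Jaccard computation and is not described in the source.

```python
def binarize(vector):
    """Map every value greater than 0 to 1 and everything else to 0."""
    return [1 if value > 0 else 0 for value in vector]


def active_indices(binary_vector):
    """Return the indices of the features set to 1 (assumed helper for set operations)."""
    return {index for index, value in enumerate(binary_vector) if value == 1}


binary = binarize([0, 3, 1, 0, 12])
active = active_indices(binary)
```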
Step 3: Perform family clustering of the Android malware with the SinglePass clustering method. The specific clustering process is shown in Fig. 2.
Step 3.1: Compute the similarity between the current software and existing malware from the numerical feature values of the software. The similarity measure is the key to a clustering algorithm. Common similarity measures include Euclidean distance, cosine similarity, the Jaccard similarity coefficient, the Pearson correlation coefficient, Minkowski distance, Chebyshev distance, and Mahalanobis distance. Because much of the input data is categorical, the Jaccard index is used here to compute the similarity between two sets of behaviors: the number of behavioral features in the intersection of behavioral feature set A and behavioral feature set B is divided by the number of behavioral features in their union. The formula is as follows:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Based on the Jaccard similarity coefficient, the Jaccard distance between two sets is defined as:
$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
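The two quantities can be sketched directly on Python sets, where each set holds the features that are present for one sample. Treating two empty sets as identical is an assumption made here to avoid division by zero; the source does not address that case.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 1.0  # assumption: two empty feature sets count as identical
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """One minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical feature index sets: 2 shared features out of 4 distinct ones.
A = {0, 2, 5}
B = {2, 5, 7}
similarity = jaccard_similarity(A, B)
distance = jaccard_distance(A, B)
```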
Step 3.2: For each new piece of malware, compute its Jaccard distance to every piece of software already in each family, and take the average as the distance between the software and that family. Compute the distance between the software and each malware family and sort the results.
Step 3.3: From the values computed in step 3.2 for the software and each family, select the maximum and compare it with the preset threshold (0.9 was chosen here after repeated experiments). If the value is below the threshold, the software is placed in a new family; otherwise it is assigned to the malware family with which it has the highest similarity, and its data is added to that family's sequence. This completes the classification of the newly arrived software.
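Steps 3.2 and 3.3 can be sketched as a single pass over the samples. The sketch scores each family by the average Jaccard similarity (one minus the distance) to its members, an assumption made so that "select the maximum" and "below the threshold means a new family" are consistent with each other. The sample sets and the lowered threshold of 0.6 are hypothetical and chosen only to keep the example small.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def single_pass(samples, threshold=0.9):
    """Assign each sample to the most similar family or open a new family."""
    families = []  # each family is a list of member feature sets
    labels = []    # family index assigned to each sample, in input order
    for features in samples:
        best_index, best_score = None, -1.0
        for index, members in enumerate(families):
            # Average similarity between the sample and all members of the family.
            score = sum(jaccard_similarity(features, m) for m in members) / len(members)
            if score > best_score:
                best_index, best_score = index, score
        if best_index is None or best_score < threshold:
            families.append([features])            # start a new family
            labels.append(len(families) - 1)
        else:
            families[best_index].append(features)  # join the most similar family
            labels.append(best_index)
    return labels


# Hypothetical binarized samples, expressed as sets of active feature indices.
SAMPLES = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8}, {7, 8, 9}, {1, 2, 3}]
labels = single_pass(SAMPLES, threshold=0.6)
```

Because every sample is compared with all members of all existing families, the cost grows with the number of samples already clustered; the result also depends on the order in which samples arrive, which is inherent to single-pass clustering.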
Test results: The malware library at http://amd.arguslab.org/behaviors was chosen as the test set, comprising 23511 malware samples in 135 malware families. A single SinglePass clustering run was performed over all samples, and the proportion of samples assigned to the correct family was used as the evaluation metric. In the experiment, 17210 samples were assigned to the correct family, and the reported accuracy reached 83.2%. The experiment shows that the method achieves a good classification result and has practical value.
The specific description above further explains the purpose, technical solution, and beneficial effects of the invention. It should be understood that the above is only a specific embodiment of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
1. An Android malware family clustering method based on the SinglePass algorithm, characterized in that the method comprises the following steps:
Step 1: preprocessing the Android malware file and extracting features, wherein features related to software behavior are obtained by decompressing and decompiling the file, and the feature values are processed;
Step 2: to improve clustering accuracy, performing feature selection on the features constructed in step 1, wherein a filter method is used according to the needs of clustering, the variance of each feature is computed as the basis for feature selection, behavioral features with low relevance are removed, and each remaining feature is then normalized;
Step 3: performing cluster analysis of the Android malware with the SinglePass clustering method, wherein for each newly arrived piece of malware the distance between it and each malware family is computed; the highest similarity value is then selected and compared with a preset threshold; if it is below the threshold the software is classified as a new malware family, and if it is above the threshold the software is assigned to the malware family with which it has the highest similarity, thereby completing the final family classification process.
2. The Android malware family clustering method based on the SinglePass algorithm according to claim 1, characterized in that the features extracted during feature extraction in step 1 include system Actions, system Categories, sensitive permissions, and self-selected sensitive system APIs.
3. The Android malware family clustering method based on the SinglePass algorithm according to claim 1, characterized in that the feature selection in step 2 is performed with a filter method, which is independent of the subsequent learner, to screen out the unimportant features among the original features.
4. The Android malware family clustering method based on the SinglePass algorithm according to claim 1, characterized in that the family analysis of malware in step 3 uses a Single-Pass clustering algorithm; the Single-Pass algorithm performs incremental clustering based on a greedy rule, which ensures that new malware can be assigned to a reasonable family while new malware families can also be generated.
5. The Android malware family clustering method based on the SinglePass algorithm according to claim 1, characterized in that the SinglePass clustering algorithm used for the family analysis of malware in step 3 uses the Jaccard similarity coefficient as the measure of similarity between pieces of software; the measure is computed by dividing the number of behavioral features in the intersection of software behavioral feature vector A and behavioral feature vector B by the number of behavioral features in their union, as shown in the following formula:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Based on the Jaccard similarity coefficient, the Jaccard distance between two sets is defined as:
$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810963865.6A CN109145605A (en) | 2018-08-23 | 2018-08-23 | A kind of Android malware family clustering method based on SinglePass algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109145605A true CN109145605A (en) | 2019-01-04 |
Family
ID=64791238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810963865.6A Pending CN109145605A (en) | 2018-08-23 | 2018-08-23 | A kind of Android malware family clustering method based on SinglePass algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145605A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512555A (en) * | 2014-12-12 | 2016-04-20 | 哈尔滨安天科技股份有限公司 | Homologous family dividing and mutation method and system based on file string cluster |
CN105335655A (en) * | 2015-09-22 | 2016-02-17 | 南京大学 | Android application safety analysis method based on sensitive behavior identification |
CN107958154A (en) * | 2016-10-17 | 2018-04-24 | 中国科学院深圳先进技术研究院 | A kind of malware detection device and method |
CN106951780A (en) * | 2017-02-08 | 2017-07-14 | 中国科学院信息工程研究所 | Beat again the static detection method and device of bag malicious application |
CN106845240A (en) * | 2017-03-10 | 2017-06-13 | 西京学院 | A kind of Android malware static detection method based on random forest |
Non-Patent Citations (3)
Title |
---|
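Pan Shouhui et al.: "A tracking model for agricultural product quality and safety emergencies on the Web based on incremental clustering", 《情报杂志》 * |
Wang Meihui: "Research on a design method for a new virus family clustering system", 《科技经济市场》 * |
Xiao Yunchang et al.: "A behavior-based Android malware family clustering method", 《武汉大学学报(理学版)》 * |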
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021027831A1 (en) * | 2019-08-15 | 2021-02-18 | 中兴通讯股份有限公司 | Malicious file detection method and apparatus, electronic device and storage medium |
CN111027069A (en) * | 2019-11-29 | 2020-04-17 | 暨南大学 | Malicious software family detection method, storage medium and computing device |
CN111027069B (en) * | 2019-11-29 | 2022-04-08 | 暨南大学 | Malicious software family detection method, storage medium and computing device |
CN111538839A (en) * | 2020-05-25 | 2020-08-14 | 武汉烽火普天信息技术有限公司 | Real-time text clustering method based on Jacobsard distance |
CN112214770A (en) * | 2020-10-30 | 2021-01-12 | 奇安信科技集团股份有限公司 | Malicious sample identification method and device, computing equipment and medium |
CN112214770B (en) * | 2020-10-30 | 2023-11-10 | 奇安信科技集团股份有限公司 | Malicious sample identification method, device, computing equipment and medium |
CN112364349A (en) * | 2020-11-30 | 2021-02-12 | 江苏极鼎网络科技有限公司 | Cell-phone APP intellectual detection system equipment |
CN113987502A (en) * | 2021-12-29 | 2022-01-28 | 阿里云计算有限公司 | Object program detection method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190104 |