CN109145605A - Android malware family clustering method based on the SinglePass algorithm - Google Patents

Info

Publication number
CN109145605A
Authority
CN
China
Prior art keywords
family
feature
malware
software
singlepass
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810963865.6A
Other languages
Chinese (zh)
Inventor
罗森林
张寒青
潘丽敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201810963865.6A
Publication of CN109145605A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an Android malware family clustering method based on the SinglePass algorithm and belongs to the technical field of computer and information science. The method first extracts behavioral features from Android malware: the sensitive permissions, system Actions, system Categories and sensitive system API calls of the software are obtained and used as its behavioral features. A filter-type method then performs feature selection on the constructed features, retaining the relatively important ones for the subsequent analysis. Finally, the similarity between the software and each existing malware family is computed and used as the basis for deciding family membership. A similarity threshold is set for this decision: if the similarity exceeds the threshold, the software is assigned to the existing family with the highest similarity; otherwise it is placed in a new malware family. Compared with dynamic analysis methods, the invention improves analysis efficiency and can discover new malware families, which gives it considerable practical value.

Description

An Android malware family clustering method based on the SinglePass algorithm
Technical field
The present invention relates to an Android malware family clustering method based on the SinglePass algorithm and belongs to the technical field of computer and information science.
Background
With the mobile Internet growing ever more widespread, mobile terminals have become an indispensable part of everyday life. Thanks to its openness and flexible ecosystem, Android has become the mobile platform with the largest market share, but this has also led to a steady stream of malicious applications on the platform, with new ones breaking out every year. Faced with a massive volume of new malware samples, classifying samples and dividing them into families quickly and efficiently has become highly challenging work. In addition, as Android malicious applications develop and evolve rapidly, the number of malware family categories keeps growing. Finding a suitable algorithm for discovering new categories of malicious samples has become an urgent problem.
Current research on malware family analysis mainly applies classification algorithms to malware features to determine the family each sample belongs to. However, malware keeps evolving and the number of family types keeps increasing. Classification can only assign software to malware families that already exist and cannot discover new ones. Clustering-based research is comparatively scarce. Existing clustering work mainly runs the software on a customized ROM, collects its system behavior, and clusters on that behavior to complete the family attribution analysis. This approach has obvious shortcomings. On the one hand, dynamic execution is costly and cannot deliver analysis results in a short time. On the other hand, malware generally hides its malicious behavior, and because that behavior is difficult to trigger, it is hard to collect useful behavioral data during dynamic execution.
In summary, the present invention proposes an Android malware family clustering method based on static behavior analysis. Static behavior analysis offers high analysis efficiency and can deliver concrete classification results in a short time. The SinglePass clustering method, in turn, either assigns a data item to an existing class or creates a new class according to how well the item matches the existing classes. This realizes incremental clustering of streaming data and thereby makes it possible to discover new malware families.
Summary of the invention
The object of the present invention is to solve the problems that traditional analysis methods, when faced with a growing number of Android malware families, have low analysis efficiency and struggle to find new families. To this end, an Android malware family clustering method based on the SinglePass algorithm is proposed. The method automatically divides malware into families and discovers new Android malware families.
The design principle of the invention is as follows. First, behavioral features are extracted from the Android malware: the sensitive permissions, system Actions, system Categories and sensitive system API calls of the software are obtained through decompression, decompilation and similar methods and are used as its behavioral features. Feature selection is then applied to the constructed features; a filter-type method is chosen here, and the relatively important features it retains are used in the subsequent analysis. Finally, the similarity between the software and the existing malware families is computed and used as the basis for deciding family membership. A similarity threshold is set for the decision: if the similarity exceeds the threshold, the existing family with the highest similarity is chosen as the family of the malware; if it is below the threshold, the software is placed in a new malware family.
The technical solution of the present invention is realized through the following steps:
Step 1, preprocess the Android malware and extract features.
Step 1.1, decompress the Android application file and extract the behavior-related features from the AndroidManifest.xml file.
Step 1.2, decompile the classes.dex file obtained by the decompression in Step 1.1 to obtain the relevant sensitive system API calls.
Step 1.3, convert the feature data obtained in Steps 1.1 and 1.2 into numerical values.
Step 2, perform feature selection and normalization on the feature data constructed in Step 1.
Step 2.1, use a filter-type method to screen out the unimportant features and delete the corresponding feature data.
Step 2.2, normalize the features that remain after screening, that is, represent every feature as 0 or 1, for the subsequent cluster analysis.
Step 3, perform family cluster analysis on the Android malware with the SinglePass clustering method.
Step 3.1, compute the similarity between the current software and the existing malware families from the software's feature values.
Step 3.2, sort the computed similarities in descending order to obtain the maximum similarity value.
Step 3.3, compare the maximum similarity value with the preset threshold. If the value is below the threshold, the software is placed in a new family type; otherwise it is placed in the malware family with which it has the highest similarity.
Beneficial effects
Compared with malware family analysis methods based on dynamic analysis, the present invention uses static behavioral data analysis, which offers higher code coverage and lower analysis cost.
Compared with traditional malware family classification algorithms, the method can cope with emerging malware families and discover new ones in time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the malware family clustering principle based on the SinglePass algorithm according to the present invention.
Fig. 2 is the flowchart of SinglePass-based clustering in the specific embodiment.
Specific embodiment
To better illustrate the objects and advantages of the present invention, the method is described in further detail below with reference to an example embodiment.
The detailed process is as follows:
Step 1, preprocess the Android malware file and extract features.
Step 1.1, decompress the APK file and extract files from the Android application such as AndroidManifest.xml, the res resources and classes.dex. Then extract from AndroidManifest.xml the system Actions, system Categories and sensitive permissions, which are the features that characterize behavior.
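The manifest features named in Step 1.1 could be collected along the following lines. This is a minimal sketch rather than the patent's implementation: it assumes the APK has already been unpacked and AndroidManifest.xml decoded to plain-text XML (inside an APK the manifest is stored in a binary format, so a decoding step would be needed first). The function name `extract_manifest_features` and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute names in a decoded manifest carry the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_manifest_features(manifest_xml: str) -> dict:
    """Collect permissions, intent actions and intent categories.

    Assumes `manifest_xml` is the text of an already-decoded
    AndroidManifest.xml (hypothetical input for this sketch).
    """
    root = ET.fromstring(manifest_xml)

    def names(tag: str) -> set:
        # Every element with the given tag contributes its android:name value.
        return {
            el.get(ANDROID_NS + "name")
            for el in root.iter(tag)
            if el.get(ANDROID_NS + "name")
        }

    return {
        "permissions": names("uses-permission"),
        "actions": names("action"),
        "categories": names("category"),
    }


SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
        <category android:name="android.intent.category.DEFAULT"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

features = extract_manifest_features(SAMPLE_MANIFEST)
```

Returning sets keeps the later presence/absence encoding simple, since only whether a permission, Action or Category occurs matters for these features.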
Step 1.2, decompile the classes.dex file extracted in Step 1.1 and extract the sensitive system API calls it contains. The specific API types collected are listed in Table 1.
Table 1. Types and numbers of sensitive APIs collected
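As an illustration of Step 1.2, the sketch below counts calls to watched APIs in disassembled code. It assumes the classes.dex file has already been disassembled to smali text by an external tool, and the two entries in `SENSITIVE_APIS` are placeholders chosen for the example; the list actually used by the method is the one in Table 1, which is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical watch list; the patent's Table 1 defines the real one.
SENSITIVE_APIS = {
    "Landroid/telephony/TelephonyManager;->getDeviceId",
    "Landroid/telephony/SmsManager;->sendTextMessage",
}

# Captures the "Lpkg/Class;->method" part of a smali invoke instruction.
INVOKE_RE = re.compile(r"invoke-[\w/-]+ \{[^}]*\}, (L[^;]+;->[^(]+)\(")


def count_sensitive_calls(smali_text: str) -> Counter:
    """Count how often each watched API is invoked in disassembled code."""
    counts = Counter()
    for match in INVOKE_RE.finditer(smali_text):
        target = match.group(1)
        if target in SENSITIVE_APIS:
            counts[target] += 1
    return counts


SAMPLE_SMALI = """
invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
invoke-virtual {v1, v2}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
"""

api_counts = count_sensitive_calls(SAMPLE_SMALI)
```

The counts, rather than mere presence, are kept because Step 1.3 uses the number of occurrences of each API as its feature value.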
Step 1.3, convert the features described in Steps 1.1 and 1.2 into numerical values. A feature from Step 1.1 is represented by 1 if it is present and by 0 if it is absent. For an API feature from Step 1.2, the number of times the API occurs is used as the feature value. For an APK file k, a feature vector $F_k$ is obtained:
$$F_k = [a_{k0}, a_{k1}, a_{k2}, \ldots, a_{kn}] \tag{1}$$
where $a_{kn}$ denotes the value of the n-th feature of sample k and n denotes the number of feature items, which is 639 here.
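The encoding in Step 1.3 can be sketched as follows. The vocabularies below are tiny hypothetical stand-ins (the vector described in the text has 639 entries), and `build_feature_vector` is an illustrative name, not one taken from the patent.

```python
def build_feature_vector(present, api_counts, manifest_vocab, api_vocab):
    """Manifest features become 0/1 flags, API features become call counts."""
    flags = [1 if name in present else 0 for name in manifest_vocab]
    counts = [api_counts.get(name, 0) for name in api_vocab]
    return flags + counts


# Tiny hypothetical vocabularies, used only to show the layout of F_k.
manifest_vocab = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.intent.action.BOOT_COMPLETED",
]
api_vocab = ["getDeviceId", "sendTextMessage"]

vector = build_feature_vector(
    present={"android.permission.SEND_SMS", "android.intent.action.BOOT_COMPLETED"},
    api_counts={"getDeviceId": 3},
    manifest_vocab=manifest_vocab,
    api_vocab=api_vocab,
)
```

Fixing the order of both vocabularies in advance ensures that the same position in every sample's vector refers to the same feature, which the per-feature variance in Step 2.1 relies on.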
Step 2, perform feature selection and normalization on the feature data constructed in Step 1.
Step 2.1, apply a filter method to select features from the feature vectors obtained in Step 1. The feature variance is used as the scoring criterion. If the values of a feature differ little across samples, the feature is generally considered to contribute little to distinguishing samples, so features whose variance is below a threshold are removed when the features are constructed. If the number of existing samples is k, the mean $E_n$ and the variance $D_n$ of the n-th feature are given by formulas (2) and (3):
$$E_n = \frac{1}{k}\sum_{i=1}^{k} a_{in} \tag{2}$$
$$D_n = \frac{1}{k}\sum_{i=1}^{k} \left(a_{in} - E_n\right)^2 \tag{3}$$
where $E_n$ is the mean of the n-th feature and $a_{in}$ is the n-th feature value of sample i.
If $D_n$ is less than the threshold p (0.1 here), the feature is removed. The resulting 333-dimensional feature vector serves as the basis for the subsequent analysis.
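The variance filter of Step 2.1 might look like the sketch below. It assumes the population variance (division by the sample count k), which is an assumption about the exact formula, and it uses a four-sample toy matrix rather than real data; the function names are hypothetical.

```python
def feature_variances(samples):
    """Population variance of each feature column across all samples."""
    k = len(samples)
    n = len(samples[0])
    variances = []
    for j in range(n):
        column = [row[j] for row in samples]
        mean = sum(column) / k
        variances.append(sum((x - mean) ** 2 for x in column) / k)
    return variances


def select_features(samples, threshold=0.1):
    """Keep only the columns whose variance is at least `threshold`."""
    variances = feature_variances(samples)
    kept = [j for j, d in enumerate(variances) if d >= threshold]
    reduced = [[row[j] for j in kept] for row in samples]
    return kept, reduced


# Toy data: column 0 is constant, so its variance is 0 and it is dropped.
samples = [
    [1, 0, 4],
    [1, 1, 0],
    [1, 0, 2],
    [1, 1, 0],
]
kept, reduced = select_features(samples, threshold=0.1)
```

Returning the kept column indices alongside the reduced matrix lets later samples be projected onto the same feature subset.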
Step 2.2, normalize the data obtained in Step 2.1. Because the cluster analysis uses the Jaccard distance as its classification basis, the data must be normalized: feature items greater than 0 are set to 1 and all other values are set to 0.
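A short sketch of the normalization in Step 2.2 follows. The helper `to_feature_set` is an addition made for illustration: it turns the 0/1 vector into the set of present feature indices, which is a convenient form for the set-based Jaccard computation, though the text itself does not prescribe this representation.

```python
def binarize(vector):
    """Map every value greater than 0 to 1 and everything else to 0."""
    return [1 if value > 0 else 0 for value in vector]


def to_feature_set(vector):
    """Indices of the features that are present, ready for set operations."""
    return {index for index, value in enumerate(binarize(vector)) if value}


binary = binarize([0, 3, 1, 0, 7])
present = to_feature_set([0, 3, 1, 0, 7])
```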
Step 3, perform family cluster analysis on the Android malware with the SinglePass clustering method. The detailed clustering process is shown in Fig. 2.
Step 3.1, compute the similarity between the current software and existing malware from the software's feature values. The similarity measure is the key to a clustering algorithm. Common measures include Euclidean distance, cosine similarity, the Jaccard similarity coefficient, the Pearson correlation coefficient, Minkowski distance, Chebyshev distance and Mahalanobis distance. Because most of the input data used here is categorical, the Jaccard index is used to compute the similarity between two sets of behaviors: the number of behavioral features in the intersection of behavioral feature sets A and B is divided by the number of behavioral features in their union. The formula is:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Based on the Jaccard similarity coefficient, the Jaccard distance between two sets is defined as:
$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
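The two definitions above translate directly into set operations. The sketch below is illustrative only; treating two empty feature sets as identical is an assumption made here to avoid division by zero, not something the text specifies, and the feature names are made up.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical (assumption)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance, defined as one minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


a = {"SEND_SMS", "READ_CONTACTS", "getDeviceId"}
b = {"SEND_SMS", "getDeviceId", "BOOT_COMPLETED"}
similarity = jaccard_similarity(a, b)  # 2 shared features out of 4 in total
distance = jaccard_distance(a, b)
```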
Step 3.2, for each new malware sample, compute its Jaccard distance to every existing software sample in each family and take the average as the distance between the sample and that family. Compute the distance between the sample and every malware family and sort the results.
Step 3.3, from the values obtained in Step 3.2 for the software and each family, select the maximum (the family with the highest similarity) and compare it with the preset threshold (0.9 here, chosen through repeated experiments). If the value is below the threshold, the software is placed in a new family type; otherwise it is placed in the malware family with which it has the highest similarity, and its data is added to that family's sequence. This completes the classification of the newly arrived software.
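Steps 3.1 to 3.3 can be summarized in a compact single-pass loop. The sketch below is one possible reading of the procedure, not the patent's code: it scores each family by the average Jaccard similarity between the new sample and the family's members, and opens a new family when the best score falls below the threshold. The toy stream and the threshold of 0.6 are chosen for the example only.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|, with two empty sets treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def single_pass_cluster(samples, threshold):
    """Assign each sample to the most similar family or open a new one.

    `samples` is an iterable of feature sets processed once, in order.
    Returns the families (each a list of feature sets) and the family
    index chosen for every sample.
    """
    families = []
    labels = []
    for sample in samples:
        best_index, best_score = None, -1.0
        for index, members in enumerate(families):
            # Average similarity between the sample and all family members.
            score = sum(jaccard_similarity(sample, m) for m in members) / len(members)
            if score > best_score:
                best_index, best_score = index, score
        if best_index is None or best_score < threshold:
            families.append([sample])  # start a new family
            labels.append(len(families) - 1)
        else:
            families[best_index].append(sample)  # join the closest family
            labels.append(best_index)
    return families, labels


stream = [
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"x", "y"},
    {"x", "y", "z"},
    {"a", "b", "c"},
]
# A threshold of 0.6 suits this toy data; the embodiment reports using 0.9.
families, labels = single_pass_cluster(stream, threshold=0.6)
```

Because every sample is compared only with families that already exist when it arrives, the result depends on the order of the stream, which is the usual trade-off of single-pass clustering in exchange for incremental operation.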
Test results: the malware library at http://amd.arguslab.org/behaviors was chosen as the test set, comprising 23511 malware samples from 135 malware families. A single SinglePass clustering run was performed over all samples, and the proportion of samples assigned to the correct family was used as the evaluation metric. In the experiment, 17210 samples were assigned to the correct family, and the reported accuracy is 83.2%. The experiment shows that the method achieves good classification performance and has practical value.
The specific description above further details the objects, technical solutions and beneficial effects of the invention. It should be understood that the above is only a specific embodiment of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. An Android malware family clustering method based on the SinglePass algorithm, characterized in that the method comprises the following steps:
Step 1, preprocess the Android malware file and extract features: decompress and decompile the file to obtain the features related to software behavior, and process the feature values;
Step 2, to improve clustering accuracy, perform feature selection on the features constructed in Step 1; a filter method is chosen here according to the needs of clustering, the variance of each feature is computed as the basis for feature selection, behavioral features of low relevance are removed, and each remaining feature is then normalized;
Step 3, perform cluster analysis on the Android malware with the SinglePass clustering method: for each newly arrived malware sample, compute its distance to each malware family; then take the highest similarity value and compare it with a preset threshold; if it is below the threshold, the software is classified as a new malware family, and if it is above the threshold, the software is assigned to the malware family with which it has the highest similarity, which completes the final family classification process.
2. The Android malware family clustering method based on the SinglePass algorithm according to claim 1, characterized in that: the features extracted during feature extraction in Step 1 include system Actions, system Categories, sensitive permissions and self-selected sensitive system APIs.
3. The Android malware family clustering method based on the SinglePass algorithm according to claim 1, characterized in that: the feature selection in Step 2 uses a filter method that is independent of the subsequent learner to screen out the unimportant features among the original features.
4. The Android malware family clustering method based on the SinglePass algorithm according to claim 1, characterized in that: the family analysis of malware in Step 3 uses a Single-Pass clustering algorithm; the Single-Pass algorithm performs incremental clustering based on a greedy rule, which ensures that new malware can be assigned to a reasonable family while new malware families can also be created.
5. The Android malware family clustering method based on the SinglePass algorithm according to claim 1, characterized in that: the SinglePass clustering algorithm used for the family analysis of malware in Step 3 uses the Jaccard similarity coefficient as the measure of similarity between software samples; it is computed by dividing the number of behavioral features in the intersection of software behavioral feature vectors A and B by the number of behavioral features in their union, as shown in the following formula:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Based on the Jaccard similarity coefficient, the Jaccard distance between two sets is defined as:
$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
CN201810963865.6A 2018-08-23 2018-08-23 A kind of Android malware family clustering method based on SinglePass algorithm Pending CN109145605A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810963865.6A CN109145605A (en) 2018-08-23 2018-08-23 A kind of Android malware family clustering method based on SinglePass algorithm

Publications (1)

Publication Number Publication Date
CN109145605A true CN109145605A (en) 2019-01-04

Family

ID=64791238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810963865.6A Pending CN109145605A (en) 2018-08-23 2018-08-23 A kind of Android malware family clustering method based on SinglePass algorithm

Country Status (1)

Country Link
CN (1) CN109145605A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190104
