CN113378163A - Android malicious software family classification method based on DEX file partition characteristics - Google Patents

Publication number
CN113378163A
Authority
CN
China
Prior art keywords
dex file
features
dex
text
classification method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010162791.3A
Other languages
Chinese (zh)
Inventor
张磊
刘亮
高杨晨
岳子巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202310412232.7A (published as CN116383819A)
Priority to CN202010162791.3A (published as CN113378163A)
Publication of CN113378163A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/54 Extraction of image or video features relating to texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an Android malware family classification method based on DEX file partition characteristics. The method automatically extracts the DEX file of an Android malware sample, converts the DEX file into an RGB image and a text, and classifies the Android malware using the RGB image features and the text features. The method mainly comprises the following steps: (1) extracting the DEX file of the Android malware; (2) converting the DEX file into an RGB image; (3) converting the DEX file into a plain text file; (4) extracting texture features of the RGB image; (5) extracting color features of the RGB image; (6) extracting text features of the plain text file; (7) fusing the texture features, color features and text features by multiple kernel learning to classify the Android malware family.

Description

Android malicious software family classification method based on DEX file partition characteristics
Technical Field
The invention provides an Android malware family classification method based on DEX file partition characteristics. The DEX file is visualized and textualized according to the characteristics of its sections: it is converted into an RGB image and a plain text respectively, and the RGB image features and text features are then extracted as the features of the Android malware sample. Finally, the Android malware is classified into families by a multi-feature fusion algorithm based on multiple kernel learning.
Background
Owing to its open-source nature, the Android system holds more than 85% of the mobile phone market. However, rapid iteration of the Android system and the severe fragmentation caused by its open-source nature have led Android malware to produce a large number of variants on top of the many existing malicious families, which poses no small challenge to the classification of Android malware families. Traditional static analysis methods are susceptible to obfuscation and hardening (packing), while dynamic analysis methods are costly in time and space. Recent visualization methods do not take the characteristics of Android malware into account, which causes serious feature loss.
Many visualization and image processing methods have been proposed for malware family classification, but most of them do not target Android malware families. Because Android platform files have their own characteristics compared with files of other platforms, many of these methods are unsuitable for Android malware family classification and lose features of the Android malware. In addition, many methods that do target the Android platform have low classification accuracy because of shortcomings in their visualization and image processing methods. To solve these problems, the invention provides a more accurate Android malware family classification method. The method fully analyzes and exploits the characteristics of the DEX file, the Android executable file: it converts the DEX file into an RGB image and a text by means of the section characteristics of the DEX file, and then extracts image features and text features respectively to classify the Android malware. Compared with dynamic and static analysis methods, the method has higher analysis efficiency and resistance to interference. Compared with a grayscale image, an RGB image has color features in addition to texture features and can represent Android application software in more dimensions. Furthermore, text features are added alongside the image features, and combining the two makes the Android malware family classification more accurate without affecting classification efficiency.
Disclosure of Invention
In the method, the executable code file (the DEX file) is extracted by decompressing the Android installation package, and the header of the DEX file is then parsed to obtain sections with different functions. The DEX file is visualized and textualized using the characteristics within each section and between sections, converting it into a more intuitive RGB image and text. Image features and text features are then extracted as the features of the DEX file. The advantages of the method are that it operates directly on the DEX file bytecode, which reduces the influence of obfuscation and hardening on the analysis, and that it combines image features and text features to classify Android malware families, which improves classification efficiency while maintaining accuracy.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a framework diagram of the Android malware family classification method based on DEX file section characteristics provided by the invention.
FIG. 2 is a flow chart of the DEX file visualization process provided by the invention.
FIG. 3 is a framework diagram of the feature fusion algorithm based on multiple kernel learning provided by the invention.
Detailed Description
In order to make the purpose, technical solution and advantages of the invention clearer, the invention is briefly described below with reference to the accompanying drawings. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The method for classifying Android malware families based on DEX file characteristics comprises four main stages: DEX file extraction, DEX file processing, feature extraction, and learning-based classification. DEX file extraction decompresses the Android installation package (APK file) and extracts the file with the .dex suffix from the decompressed folder. DEX file processing comprises visualization and textualization of the DEX file: once the DEX file is obtained, it is divided into sections by parsing its header area, and the characteristics of the divided sections are used to convert the DEX file into an RGB image and a plain text file. Feature extraction comprises extracting image features from the RGB image and text features from the plain text. Learning-based classification fuses the obtained image features and text features with a feature fusion algorithm based on multiple kernel learning and then classifies. The specific implementation comprises the following seven steps.
Step one, extracting the DEX file: decompress the Android installation package (APK file), extract the file with the .dex suffix from the decompressed folder, and then parse the header of the DEX file to obtain the 8 sections of the DEX file.
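As an illustration of step one, the following is a minimal sketch and not the inventors' implementation. It relies on two facts of the public formats: an APK is a ZIP archive, and the DEX header is 0x70 bytes long with seven (size, offset) pairs starting at offset 0x38. The text does not name the 8 sections; the sketch assumes they are the header, the six ID tables and the data section. The function names `extract_dex` and `parse_sections` are hypothetical.

```python
import io
import struct
import zipfile

# Size in bytes of one entry in each fixed-width ID table of the DEX format.
ITEM_SIZES = {"string_ids": 4, "type_ids": 4, "proto_ids": 12,
              "field_ids": 8, "method_ids": 8, "class_defs": 32}


def extract_dex(apk_bytes):
    """Return {member name: contents} for every *.dex member of an APK."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return {n: apk.read(n) for n in apk.namelist() if n.endswith(".dex")}


def parse_sections(dex):
    """Map section name -> (offset, length in bytes) using the DEX header."""
    if dex[:4] != b"dex\n":
        raise ValueError("not a DEX file")
    # Seven little-endian (size, offset) pairs: string_ids ... class_defs, data.
    fields = struct.unpack_from("<14I", dex, 0x38)
    sections = {"header": (0, 0x70)}
    for i, name in enumerate(list(ITEM_SIZES) + ["data"]):
        count, offset = fields[2 * i], fields[2 * i + 1]
        # Table sizes are entry counts; data_size is already given in bytes.
        sections[name] = (offset, count * ITEM_SIZES.get(name, 1))
    return sections
```

An APK may contain several DEX files (classes.dex, classes2.dex, and so on); how the method treats multi-DEX packages is not stated in the text, so the sketch simply returns all of them.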
Step two, visualizing the DEX file: the DEX file bytecode is selected as one channel. Because the strings, variables, methods and classes differ from file to file, the length and content of each section differ, so the ratio and the entropy of each section are selected as the other two channels. The three channels are combined into a three-channel vector whose length corresponds to the size of the DEX file. To introduce the file size as a feature, the matrix width is determined by the file size when the vector is reshaped into a matrix. Finally, the matrix is converted into an RGB image.
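The channel construction in step two can be sketched as follows. Several details are assumptions, since the text does not fix them: the "ratio" is taken to be a section's share of the file length, the entropy is Shannon entropy of the section's bytes, both are scaled to 0..255, and the thresholds in `pick_width` are purely hypothetical. The `sections` argument is a mapping of section name to (offset, length).

```python
import math
from collections import Counter


def shannon_entropy(chunk):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum(c / n * math.log2(c / n) for c in Counter(chunk).values())


def pick_width(file_size):
    """Hypothetical size-to-width table; the source does not list its thresholds."""
    for limit, width in ((10_000, 32), (100_000, 128), (1_000_000, 512)):
        if file_size < limit:
            return width
    return 1024


def dex_to_rgb(dex, sections):
    """Return (width, rows); rows is a list of rows of (R, G, B) tuples."""
    total = len(dex)
    green = [0] * total   # section ratio channel
    blue = [0] * total    # section entropy channel
    for offset, length in sections.values():
        chunk = dex[offset:offset + length]
        ratio = round(255 * len(chunk) / total)
        entropy = round(255 * shannon_entropy(chunk) / 8)
        for i in range(offset, offset + len(chunk)):
            green[i], blue[i] = ratio, entropy
    pixels = list(zip(dex, green, blue))            # R = raw byte value
    width = pick_width(total)
    pixels += [(0, 0, 0)] * (-len(pixels) % width)  # pad the last row
    return width, [pixels[i:i + width] for i in range(0, len(pixels), width)]
```

Every pixel inside one section shares the same green and blue values, so section boundaries show up as color bands while the red channel keeps the byte-level texture.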
Step three, converting the DEX file to text: the data section stores all strings related to the DEX file, including variable names, class names, method names and the like. The DEX file uses LEB128 encoding, in which each encoded value consists of 1 to 5 bytes that together represent one value. Every byte has its most significant bit set, except the last byte in the sequence, whose most significant bit is clear; the remaining 7 bits of each byte are payload. These 7 payload bits correspond exactly to the ASCII code table, so text information can be generated by extracting the low seven bits of each byte. During this process, the separators in the DEX file format produce a large number of irrelevant symbols that would affect the subsequent text feature extraction. A text filter is therefore built according to the encoding range of the ASCII code table; it filters out the irrelevant symbols and leaves only the plain text information related to the strings.
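The two operations described in step three, decoding an LEB128 value and masking each byte to its low seven bits before filtering, might look like the sketch below. The printable range 0x21 to 0x7E used as the text filter and the `min_len` cutoff are assumptions; the text only says that the filter follows the ASCII encoding range.

```python
def decode_uleb128(buf, pos=0):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits are payload
        if not byte & 0x80:                # a clear top bit marks the last byte
            return result, pos
        shift += 7


def dex_to_text(data, min_len=3):
    """Keep the low 7 bits of every byte and return the printable ASCII words."""
    chars = []
    for byte in data:
        code = byte & 0x7F                 # drop the most significant bit
        # Text filter: printable, non-space ASCII stays; the rest separates words.
        chars.append(chr(code) if 0x21 <= code <= 0x7E else " ")
    words = "".join(chars).split()
    return " ".join(w for w in words if len(w) >= min_len)
```

Dropping very short runs (`min_len`) is one simple way to suppress the stray symbols produced by separators and length prefixes; the actual filter rules of the method are not given in the text.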
Step four, extracting texture features of the RGB image: texture features are extracted with the GIST algorithm. Texture features are global features that describe the surface properties of the scene corresponding to an image or image region. As statistical features, they often have rotation invariance and are resistant to noise.
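The sketch below is a simplified, GIST-style descriptor and not the reference GIST implementation: it convolves a grayscale image with a small bank of real Gabor kernels at one scale and averages the absolute responses over a 4 x 4 grid. The real algorithm uses several scales and frequency-domain filtering. The kernel size, sigma and wavelength are illustrative assumptions, as is the choice to operate on a single grayscale channel.

```python
import math


def gabor_kernel(theta, size=5, sigma=2.0, wavelength=4.0):
    """Real (cosine) Gabor kernel with its mean removed, as a list of rows."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]  # flat regions give 0


def gist_like(gray, orientations=4, grid=4):
    """gray: list of rows of intensities. Returns orientations * grid * grid values."""
    h, w = len(gray), len(gray[0])
    feats = []
    for o in range(orientations):
        k = gabor_kernel(math.pi * o / orientations)
        size, half = len(k), len(k) // 2
        cells = [[0.0] * grid for _ in range(grid)]
        counts = [[0] * grid for _ in range(grid)]
        for y in range(half, h - half):
            for x in range(half, w - half):
                resp = sum(k[j][i] * gray[y + j - half][x + i - half]
                           for j in range(size) for i in range(size))
                gy, gx = y * grid // h, x * grid // w
                cells[gy][gx] += abs(resp)   # accumulate filter energy per cell
                counts[gy][gx] += 1
        feats += [cells[a][b] / counts[a][b] if counts[a][b] else 0.0
                  for a in range(grid) for b in range(grid)]
    return feats
```

The output is ordered orientation by orientation, 16 grid cells each, so an image with vertical stripes yields large values in the first 16 entries (horizontal variation) and small ones for the orientation at 90 degrees.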
Step five, extracting color features of the RGB image: color moments are used to extract the color features of the RGB image. Color features describe the surface properties of the image or image region and are unaffected by rotation and translation of the image.
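A common formulation of color moments uses the first three moments of each channel (mean, standard deviation and the cube root of the third central moment), giving nine values per image. The text does not say which moments are used, so the choice below is an assumption.

```python
import math


def color_moments(pixels):
    """pixels: iterable of (R, G, B). Returns [mean, std, skew] for R, G, B."""
    pixels = list(pixels)
    n = len(pixels)
    feats = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root, so negative skew stays negative.
        skew = math.copysign(abs(third) ** (1 / 3), third)
        feats += [mean, math.sqrt(variance), skew]
    return feats
```

Because the moments are computed over all pixels regardless of position, rearranging the pixels (for example by rotating or shifting the image) leaves the nine values unchanged, which matches the invariance noted in the step.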
Step six, extracting text features: the text is segmented into words and a weight is calculated for each word; all keywords are sorted by weight, a certain number of keywords is selected, and the hash value of each keyword is calculated with the MD5 hash algorithm. Following the flow of the SimHash algorithm, each keyword's weight is applied to its hash value, with the sign of each position set according to the corresponding bit. Finally, the arrays of all keywords are accumulated to obtain the text features.
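A minimal sketch of the SimHash-style accumulation in step six follows. The weighting scheme is not specified in the text; term frequency is assumed here, and the tokenizer regular expression and `top_k` value are likewise illustrative. The accumulated 128-element array is returned as the feature, with an optional helper that collapses it into a classic SimHash fingerprint.

```python
import hashlib
import re
from collections import Counter

BITS = 128  # an MD5 digest is 128 bits long


def simhash_vector(text, top_k=50):
    """Accumulate +weight / -weight per MD5 bit over the top_k keywords."""
    words = re.findall(r"[a-z_][a-z0-9_]+", text.lower())
    weights = Counter(words)                 # assumption: weight = term frequency
    acc = [0] * BITS
    for word, weight in weights.most_common(top_k):
        digest = hashlib.md5(word.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            acc[i] += weight if (h >> i) & 1 else -weight
    return acc


def fingerprint(acc):
    """Collapse the accumulated array into a SimHash fingerprint (positive -> 1)."""
    return sum(1 << i for i, v in enumerate(acc) if v > 0)
```

Texts that share most of their heavily weighted keywords produce arrays that agree in sign at most positions, which is why the accumulated array preserves similarity between samples.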
Step seven, classifying with the multiple kernel learning feature fusion algorithm: an optimal kernel function is selected for each of the texture features, color features and text features, and the kernel functions are then linearly combined. The kernel weights are updated through repeated iteration; the weights under the optimal condition are used to construct the kernel matrix and thereby the classifier, which is finally used for classification.
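The linear combination of kernels in step seven can be illustrated as below. This is a deliberately simplified stand-in, not the iterative multiple kernel learning solver the step describes: the weights are set in one shot from kernel-target alignment instead of being optimized iteratively, the RBF kernel is an assumed choice, and the classifier is a plain nearest-class rule on the combined kernel. All function names are hypothetical.

```python
import math


def rbf_kernel(X, gamma):
    """Gram matrix of the RBF kernel over the rows of X."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))
             for z in X] for x in X]


def alignment(K, labels):
    """Kernel-target alignment with target T[i][j] = +1 (same family) or -1."""
    n = len(labels)
    num = sum(K[i][j] * (1 if labels[i] == labels[j] else -1)
              for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    return num / (norm_k * n)   # ||T||_F = n because every entry is +1 or -1


def combine(kernels, labels):
    """Return (weights, combined Gram matrix); weights are >= 0 and sum to 1."""
    a = [max(alignment(K, labels), 0.0) for K in kernels]
    total = sum(a) or 1.0
    beta = [v / total for v in a]
    n = len(labels)
    K = [[sum(b * Km[i][j] for b, Km in zip(beta, kernels)) for j in range(n)]
         for i in range(n)]
    return beta, K


def nearest_class(k_row, labels):
    """Pick the family whose training samples have the highest mean kernel value."""
    sums, counts = {}, {}
    for k, lab in zip(k_row, labels):
        sums[lab] = sums.get(lab, 0.0) + k
        counts[lab] = counts.get(lab, 0) + 1
    return max(sums, key=lambda lab: sums[lab] / counts[lab])
```

In use, one Gram matrix would be built per feature type (texture, color, text), and a feature type whose kernel agrees poorly with the family labels receives a small weight in the combination.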

Claims (8)

1. A DEX file partition characteristic-based Android malicious software family classification method is characterized by comprising the following steps:
Step one: extracting the DEX file of the Android malware
Step two: converting DEX files into RGB images
Step three: converting DEX files into plain text files
Step four: extracting texture features of RGB images
Step five: extracting color features of RGB images
Step six: extracting text features of plain text files
Step seven: method for classifying Android malware families by fusing texture features, color features and text features through multi-core learning
The method for classifying the Android malicious software family based on the DEX file partition characteristics is characterized in that the Android malicious software is converted into an RGB image and a text file by utilizing the idea of subject blending, so that the requirements in the field are met by utilizing the advanced technology in the related field.
2. The DEX file partition characteristic-based Android malware family classification method as claimed in claim 1, wherein in step two and step three, the DEX file is visualized and textualized by operating on it directly, so that the APIs of the application software need not be extracted and the influence of hardening and obfuscation on the analysis is reduced.
3. The DEX file partition characteristic-based Android malware family classification method according to claim 1, characterized in that when the DEX file is visualized in step two, the characteristics of the DEX sections are fully utilized to convert the DEX file into an RGB image, which carries more features than a conventional grayscale image and helps to improve classification accuracy.
4. The DEX file partition characteristic-based Android malware family classification method of claim 1, characterized in that the DEX file is converted into a plain text file in step three.
5. The DEX file partition characteristic-based Android malware family classification method as claimed in claim 1, wherein when the texture features of the RGB images are extracted in the fourth step, the texture features are global features, describe surface properties of scenes corresponding to the images or image regions, and as statistical features, the texture features are always rotation invariant and have strong resistance to noise.
6. The DEX file partition characteristic-based Android malware family classification method as claimed in claim 1, wherein when RGB image color features are extracted in step five, the color features describe surface properties of images or image regions, and are not affected by image rotation and translation changes.
7. The DEX file partition characteristic-based Android malware family classification method according to claim 1, wherein when text features are extracted in step six, the improved SimHash algorithm preserves data similarity and can be used for similarity calculation over massive amounts of text.
8. The DEX file partition characteristic-based Android malware family classification method as claimed in claim 1, wherein in the classification of step seven, the classification accuracy can be improved by using the multiple kernel learning classification method.
CN202010162791.3A 2020-03-10 2020-03-10 Android malicious software family classification method based on DEX file partition characteristics Pending CN113378163A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310412232.7A CN116383819A (en) 2020-03-10 2020-03-10 Android malicious software family classification method
CN202010162791.3A CN113378163A (en) 2020-03-10 2020-03-10 Android malicious software family classification method based on DEX file partition characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010162791.3A CN113378163A (en) 2020-03-10 2020-03-10 Android malicious software family classification method based on DEX file partition characteristics

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310412232.7A Division CN116383819A (en) 2020-03-10 2020-03-10 Android malicious software family classification method

Publications (1)

Publication Number Publication Date
CN113378163A true CN113378163A (en) 2021-09-10

Family

ID=77568841

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310412232.7A Pending CN116383819A (en) 2020-03-10 2020-03-10 Android malicious software family classification method
CN202010162791.3A Pending CN113378163A (en) 2020-03-10 2020-03-10 Android malicious software family classification method based on DEX file partition characteristics

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202310412232.7A Pending CN116383819A (en) 2020-03-10 2020-03-10 Android malicious software family classification method

Country Status (1)

Country Link
CN (2) CN116383819A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114329472A (en) * 2021-12-31 2022-04-12 淮阴工学院 BIOS (basic input output System) malicious program detection method and device based on double embedding and model pruning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015101096A1 (en) * 2013-12-30 2015-07-09 北京奇虎科技有限公司 Method and device for detecting malicious code in smart terminal
CN106096411A (en) * 2016-06-08 2016-11-09 浙江工业大学 A kind of Android malicious code family classification method based on bytecode image clustering
CN107103235A (en) * 2017-02-27 2017-08-29 广东工业大学 A kind of Android malware detection method based on convolutional neural networks
CN108280348A (en) * 2018-01-09 2018-07-13 上海大学 Android Malware recognition methods based on RGB image mapping
CN108280350A (en) * 2018-02-05 2018-07-13 南京航空航天大学 A kind of mobile network's terminal Malware multiple features detection method towards Android
CN108710608A (en) * 2018-04-28 2018-10-26 四川大学 A kind of malice domain name language material library generating method based on context semanteme
CN109190371A (en) * 2018-07-09 2019-01-11 四川大学 A kind of the Android malware detection method and technology of Behavior-based control figure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YONG FANG 等: "Android Malware Familial Classification Based on DEX File Section Features", 《IEEE ACCESS》, vol. 8, 10 January 2020 (2020-01-10), pages 10614 - 10627, XP011767708, DOI: 10.1109/ACCESS.2020.2965646 *

Also Published As

Publication number Publication date
CN116383819A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN101710334B (en) Large-scale image library retrieving method based on image Hash
CN104661037B (en) The detection method and system that compression image quantization table is distorted
CN109241741B (en) Malicious code classification method based on image texture fingerprints
Roussev et al. File fragment encoding classification—An empirical approach
CN113221115B (en) Visual malicious software detection method based on collaborative learning
CN115511890B (en) Analysis system for large-flow data of special-shaped network interface
JP6235414B2 (en) Feature quantity computing device, feature quantity computing method, and feature quantity computing program
CN101794378B (en) Rubbish image filtering method based on image encoding
CN111639185B (en) Relation information extraction method, device, electronic equipment and readable storage medium
CN113962199B (en) Text recognition method, text recognition device, text recognition equipment, storage medium and program product
US20220215679A1 (en) Method of determining a density of cells in a cell image, electronic device, and storage medium
CN113378163A (en) Android malicious software family classification method based on DEX file partition characteristics
JP2006351001A (en) Content characteristic quantity extraction method and device, and content identity determination method and device
CN116975864A (en) Malicious code detection method and device, electronic equipment and storage medium
KR102185831B1 (en) Dhash-based Malicious Code Analysis Apparatus and method thereof
CN111552965A (en) Malicious software classification method based on PE (provider edge) header visualization
CN116595525A (en) Threshold mechanism malicious software detection method and system based on software map
CN107491423B (en) Chinese document gene quantization and characterization method based on numerical value-character string mixed coding
CN115292702A (en) Malicious code family identification method, device, equipment and storage medium
Ting et al. Faster classification using compression analytics
CN108805132B (en) Rubbish text filtering method based on deep learning
CN110826063A (en) Malicious code detection method based on API fragment
Nguyen et al. Decision tree algorithms for image data type identification
Singh et al. Bytefreq: Malware clustering using byte frequency
KR20190111643A (en) Data processing method for decoding text data and data processing apparatus thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210910
