CN104376260A - Malicious code visualized analyzing method based on Shannon information entropy - Google Patents

Malicious code visualized analyzing method based on Shannon information entropy

Info

Publication number
CN104376260A
Authority
CN
China
Prior art keywords
entropy
pixel
value
local
malicious code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410668073.8A
Other languages
Chinese (zh)
Other versions
CN104376260B (en)
Inventor
任卓君
孔德凤
刘同洋
乔国娟
冯琪
陈�光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
National Dong Hwa University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN201410668073.8A priority Critical patent/CN104376260B/en
Publication of CN104376260A publication Critical patent/CN104376260A/en
Application granted granted Critical
Publication of CN104376260B publication Critical patent/CN104376260B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561 Virus type analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a malicious code visual analysis method based on Shannon information entropy. The method comprises the following steps. First, the binary bytes of a malicious file are converted into yellow brightness values of the pixels of a pixel map, and the points whose pixel value is 0x20-0x7E are marked with the green channel value 0x50. Second, based on the pixel values of the pixel map, the local entropy of the pixel values in each 256-byte block is calculated with the Shannon information entropy formula $\mathrm{Entropy} = -\sum_{i=0}^{255} p_i \times \log_2 p_i$, where $p_i$ is the probability that byte (pixel) value i occurs, i ranges from 0x00 to 0xFF, and Entropy is the local entropy. The value f(Entropy) of the local entropy is then calculated with the formula $f(\mathrm{Entropy}) = 2^{\mathrm{Entropy}} - 1$, and the entropy diagram is generated from the results of f(Entropy). Third, the results of f(Entropy) are normalized to generate an entropy normalization diagram. The method effectively distinguishes samples of different families, and when malicious code of the same family is analyzed, potential differences are easier to find, which provides a basis for understanding how the variants of that family evolve.

Description

Malicious code visual analysis method based on Shannon information entropy
Technical field
The present invention relates to a malicious code visual analysis method based on Shannon information entropy.
Background
Malware (malicious software) is software used to damage a computer operating system, steal sensitive information, or gain unauthorized access to private systems. It usually appears as code, scripts, dynamic text, or other software forms. Because traditional malware analysis is often complex and time-consuming, even experienced security analysts have difficulty finding potential attack patterns. To reduce cognitive load and improve interactivity, information visualization techniques have been introduced into malicious code analysis. This area, malware security visualization, has been a frontier topic in network security research in recent years.
In 2008, Gregory Conti et al. of the United States Military Academy at West Point first proposed the idea of gray-scale images in the visual analysis system they designed (Fig. 1), which identifies files quickly and dissects unknown file formats from an analysis perspective that is independent of text. As shown in Fig. 1, regions d and g of the user interface show the ASCII strings and the hexadecimal representation of the analyzed file, respectively. The pixel values in region c (Byteview) correspond to the bytes shown in region g and present the internal characteristics of the file as a gray-scale image. Region b (Byte Presence) marks, for each row of region c, which extended ASCII values (0-255) are present in that row; this mapping helps the user grasp regularities in the file and spot anomalies. Region f (Dot Plot) compares files by the similarity between the byte-sequence matrices of the files themselves, which gives the user a basis for judgment when classifying. The remaining regions a, e and h integrate several auxiliary functions for user interaction.
However, when file similarity is analyzed for classification, the method of Gregory Conti, Erik Dean, Matthew Sinda and Benjamin Sangster (Visual Reverse Engineering of Binary and Data Files [C], VizSec 2008 Symposium on Visualization for Cyber Security (VizSEC 2008)) makes the amount of computation proportional to the file size, so the degree of automation of the analysis is limited by computer hardware performance. In addition, when presenting the internal characteristics of a file, it displays the pixel values corresponding to the bytes (region c) separately from the information on which ASCII values are present (region b), which hinders a complete understanding of the characteristics of the analyzed file.
Summary of the invention
The object of the present invention is to provide a method for analyzing malicious code more comprehensively.
To achieve the above object, the present invention provides a malicious code visual analysis method based on Shannon information entropy, characterized by comprising:
First step: convert the binary bytes of the malicious file into yellow brightness values of the pixels in a "pixel map", and use the green channel value 0x50 (the displayed result may differ slightly between hardware devices) to mark the points whose pixel value is 0x20-0x7E (that is, the printable ASCII characters);
Second step: based on the pixel values of the "pixel map", calculate the local entropy of the pixel values in each 256-byte block of the "pixel map". The local entropy is calculated with the following Shannon information entropy formula:
$$\mathrm{Entropy} = -\sum_{i=0}^{255} p_i \times \log_2 p_i$$
where $p_i$ is the probability that byte (pixel) value i occurs, i ranges from 0x00 to 0xFF, and Entropy is the local entropy;
Calculate the value f(Entropy) of the local entropy Entropy with the formula:
$$f(\mathrm{Entropy}) = 2^{\mathrm{Entropy}} - 1;$$
Generate the "entropy diagram" from the results of f(Entropy);
Third step: normalize the results of f(Entropy) to generate the "entropy normalization figure".
Preferably, the malicious code visual analysis method based on Shannon information entropy further comprises:
Fourth step: normalize the "pixel map" of the first step to generate the "pixel normalization figure".
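As a quick check of the two formulas in the second step, the following worked values use three idealized 256-byte blocks. The blocks are illustrative assumptions, not samples from the patent, and terms with $p_i = 0$ are taken to contribute 0 to the sum, as is conventional.

```latex
\begin{aligned}
\text{one value only } (p_k = 1):\quad
  & \mathrm{Entropy} = -1 \times \log_2 1 = 0,
  & f(\mathrm{Entropy}) &= 2^{0} - 1 = 0 \\
\text{two values, } p = \tfrac{1}{2} \text{ each}:\quad
  & \mathrm{Entropy} = -2 \times \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1,
  & f(\mathrm{Entropy}) &= 2^{1} - 1 = 1 \\
\text{all 256 values, } p_i = \tfrac{1}{256}:\quad
  & \mathrm{Entropy} = -256 \times \tfrac{1}{256} \log_2 \tfrac{1}{256} = 8,
  & f(\mathrm{Entropy}) &= 2^{8} - 1 = 255
\end{aligned}
```

The exponential mapping therefore sends the entropy range [0, 8] onto the brightness range [0, 255] and spreads out the high-entropy end of the scale.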
The present invention draws on the idea of gray-scale images, combines it with the definition of Shannon information entropy, and uses the K-Nearest Neighbor (KNN) classification algorithm to provide a new visualization method for studying the hierarchical classification of malicious code. The technical scheme is as follows. First, the binary file to be analyzed is converted into a "pixel map" with yellow and green color channels. Next, the local entropy of the "pixel map" is calculated to generate a blue-green "entropy diagram". The "entropy diagram" is then normalized to form a green dot-matrix distribution of varying brightness, the "entropy normalization figure". By using local entropy calculation and a sliding-window normalization mechanism, the method significantly reduces the computation required for similarity analysis of large volumes of files and improves the visualization of malicious code family classification.
The present invention is implemented in Python and runs under both Windows and Linux.
Compared with existing security visualization techniques, the beneficial effects of the present invention are:
1. Visually, the various malicious code families can be distinguished effectively;
2. When malicious code of the same family is analyzed, potential differences are easier to find, which provides a basis for understanding how the variants of that family evolve;
3. Combining the three visualizations ("entropy diagram", "entropy normalization figure" and "pixel normalization figure") allows the overall and local features of malicious code to be analyzed more comprehensively;
4. By establishing a visual channel between people and data, the invention conveys more information per unit of time, which improves the efficiency of network security analysts and lowers both the difficulty of analysis and the demands on the analyst's skill and experience;
5. The invention is simple to implement and can be automated. Because it uses a dimensionality-reducing mapping and a fast similarity comparison algorithm, the time cost of image generation is small and similarity comparison is efficient.
Brief description of the drawings
Fig. 1 is a schematic diagram of the visualization system designed by Gregory Conti;
Fig. 2 is an example of a binary file converted to a "pixel map";
Fig. 3 is an example of a "pixel map" converted to an "entropy diagram" by local entropy calculation;
Fig. 4a is an example of an "entropy diagram" converted to an "entropy normalization figure" by normalization;
Fig. 4b shows the mapping between local entropy and pixel value;
Fig. 5 is an example of a "pixel map" converted to a "pixel normalization figure" by normalization;
Fig. 6 is " entropy diagram " of Email-Worm.joleee.av sample;
Fig. 7 is " entropy diagram " of Email-Worm.joleee.aw sample;
Fig. 8 is " entropy diagram " of Email-Worm.joleee.ba sample.
Detailed description of the embodiments
To make the present invention easier to understand, a preferred embodiment is described in detail below with reference to the accompanying drawings. (The 473 malicious samples from 59 families used to test the invention were all downloaded from the VX Heavens website, and all samples are named according to the Kaspersky naming convention.)
Embodiment 1
A malicious code visual analysis method based on Shannon information entropy, specifically:
Step 1: Convert the binary bytes of the malicious file (Trojan.Regrun.rk) into yellow brightness values of the pixels in a "pixel map", and use the green channel value 0x50 (the displayed result may differ slightly between hardware devices) to mark the points whose pixel value is 0x20-0x7E (that is, the printable ASCII characters). The result is shown in Fig. 2, where the black parts are the background color, that is, bytes whose value is 0.
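Step 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the text does not say exactly how the yellow brightness and the green marker are combined, so the sketch assumes that yellow means equal red and green components with no blue, and that printable bytes are drawn as a fixed green pixel with value 0x50. The function names and the `width` parameter are hypothetical.

```python
PRINTABLE_LOW, PRINTABLE_HIGH = 0x20, 0x7E  # printable ASCII range from Step 1
GREEN_MARK = 0x50                           # green channel value from Step 1


def byte_to_rgb(b):
    """Map one byte value to an (R, G, B) tuple."""
    if PRINTABLE_LOW <= b <= PRINTABLE_HIGH:
        # Assumption: printable characters become a fixed green marker.
        return (0, GREEN_MARK, 0)
    # Assumption: yellow is R == G with B == 0; brightness equals the byte.
    return (b, b, 0)


def pixel_map(data, width=256):
    """Return the file as rows of RGB tuples, `width` pixels per row.

    The last row is padded with black, which matches the background
    colour used for zero bytes.
    """
    pixels = [byte_to_rgb(b) for b in data]
    pixels.extend([(0, 0, 0)] * ((-len(pixels)) % width))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```

For example, `pixel_map(open(path, "rb").read())` would yield the rows for a file at a hypothetical `path`; zero bytes come out black, as described for Fig. 2.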
Step 2: Based on the pixel values of the "pixel map", calculate the local entropy of the pixel values in each 256-byte block of the "pixel map". The local entropy is calculated with the following Shannon information entropy formula:
$$\mathrm{Entropy} = -\sum_{i=0}^{255} p_i \times \log_2 p_i$$
where $p_i$ is the probability that byte (pixel) value i occurs, i ranges from 0x00 to 0xFF, and Entropy is the local entropy.
In general, the local entropy Entropy lies in [0, 8]. If it were output directly as a brightness, the image would be too dark and the visual effect very poor. Therefore, to form an image that maps onto the pixel brightness range [0, 255], the local entropy is output after applying the function $f(\mathrm{Entropy}) = 2^{\mathrm{Entropy}} - 1$. This processing also makes the brightness of high-entropy regions more pronounced. The results of f(Entropy) generate the "entropy diagram" shown in Fig. 3; the invention displays the "entropy diagram" of the analyzed file in a blue-green color scheme. As before, the black parts are the background color, that is, an entropy of 0.
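The local entropy and brightness mapping of Step 2 can be sketched as below. The sketch works directly on the file bytes and assumes that a final block shorter than 256 bytes is scored over the bytes it actually contains; the patent does not state how such a tail is handled. All function names are hypothetical.

```python
import math
from collections import Counter

BLOCK_SIZE = 256  # block length from Step 2


def local_entropy(block):
    """Shannon entropy, in bits, of the byte values in one block."""
    n = len(block)
    if n == 0:
        return 0.0
    counts = Counter(block)
    # Byte values that never occur contribute nothing to the sum.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def brightness(entropy):
    """f(Entropy) = 2**Entropy - 1, mapping [0, 8] onto [0, 255]."""
    return 2.0 ** entropy - 1.0


def entropy_map(data, block_size=BLOCK_SIZE):
    """Return one f(Entropy) value per consecutive block of `data`."""
    return [
        brightness(local_entropy(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    ]
```

A block of identical bytes maps to 0 (black), while a block in which every byte value appears once maps to 255, the brightest value.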
Step 3: Similarity analysis requires a uniform standard of comparison, but the analyzed malicious code files differ in size. The results of f(Entropy) therefore need to be normalized to generate the "entropy normalization figure". The normalization algorithm proposed by the invention uses a sliding window with a window size of 2 bytes and a step of 1 byte. The first and second bytes in each window serve as the position coordinates (x, y) of a point in the "entropy normalization figure", and the brightness of that point is proportional to the number of times the combination (x, y) occurs. The "entropy normalization figure" is displayed as a square image of size 256*256 and rendered in pure green to distinguish it from the color scheme of the "entropy diagram", as shown in Fig. 4a. The mapping between local entropy and pixel value is shown in Fig. 4b.
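The sliding-window normalization of Step 3 can be sketched as follows. Two details are assumptions rather than statements from the patent: the f(Entropy) results are rounded and clamped to integers in 0..255 before being used as coordinates, and brightness is scaled so that the most frequent (x, y) pair gets the value 255. The names `quantize` and `normalization_map` are hypothetical.

```python
def quantize(f_values):
    """Round f(Entropy) results to integer coordinates in 0..255 (assumed)."""
    return [min(255, max(0, round(v))) for v in f_values]


def normalization_map(values, size=256):
    """Build a size x size brightness grid from consecutive value pairs.

    A window of 2 values slides over `values` with step 1; the first value
    is the x coordinate and the second is the y coordinate of a point.
    """
    counts = [[0] * size for _ in range(size)]
    for x, y in zip(values, values[1:]):
        counts[y][x] += 1  # row index is y, column index is x
    peak = max(max(row) for row in counts)
    if peak == 0:
        return counts  # fewer than two values: the image stays black
    # Assumption: scale linearly so the most frequent pair has brightness 255.
    return [[round(255 * c / peak) for c in row] for row in counts]
```

Because the output is always 256*256 regardless of input length, files of different sizes can be compared point by point. Passing the raw file bytes instead of the quantized entropy values gives the same kind of fixed-size grid for the "pixel map".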
Step 4: For the same reason, the invention also provides a normalized display of the "pixel map". Its normalization algorithm is the same as that used for the "entropy diagram", and the result is shown in Fig. 5. The "pixel normalization figure" is rendered in pure yellow to distinguish it from the color scheme of the "pixel map".
When unknown samples are classified with the KNN classification algorithm, the invention correctly distinguishes the 59 families in the downloaded samples. Combined with the prior statistics of the samples, only 28 of the 437 samples were assigned to the wrong family, giving an average correct classification rate of 93.59%.
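The patent names KNN but does not give the distance measure, the value of k, or the feature vector. The sketch below is therefore only one plausible arrangement: it assumes each sample is represented by a flattened brightness vector, uses Euclidean distance, and takes a majority vote among the k nearest training samples. The helper `accuracy` simply reproduces the arithmetic behind the reported rate. All names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict a family label for `query`.

    `train` is a list of (feature_vector, family_label) pairs. The k
    closest vectors vote, and the most common label wins.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(total, misclassified):
    """Fraction of samples assigned to the correct family."""
    return (total - misclassified) / total
```

With the figures quoted above, `accuracy(437, 28)` is 409/437, which rounds to 93.59%.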
In the present invention, the generation time of an "entropy normalization figure" is proportional to the size of the sample file, and the average generation time is 0.91 ms. The similarity comparison time of "entropy normalization figures" is proportional to the number of local entropy blocks, and the average comparison time is 0.56 ms. These times are averages calculated over 100 samplings. The results show that the classification comparison implemented by the invention is time-efficient and that the Python program structure is reasonably designed.
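Averaged timings of this kind could be collected with a small helper like the one below. It is a sketch under the assumption that "100 samplings" means 100 repeated runs of the same operation; the patent does not describe its measurement procedure, and `mean_runtime_ms` is a hypothetical name.

```python
import time


def mean_runtime_ms(func, *args, repeats=100):
    """Average wall-clock time of func(*args) in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / repeats
```

For instance, `mean_runtime_ms(sorted, list(range(1000)))` returns the mean time of one `sorted` call; the same helper could wrap whatever function generates or compares the normalization images.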
Embodiment 2
The "entropy diagrams" generated for the malicious samples Email-Worm.joleee.av, Email-Worm.joleee.aw and Email-Worm.joleee.ba with the malicious code visual analysis method based on Shannon information entropy described in Embodiment 1 are shown in Figs. 6-8. When malicious code of the same family is analyzed, the invention makes potential differences easier to find, which provides a basis for understanding how the variants of that family evolve.

Claims (2)

1. A malicious code visual analysis method based on Shannon information entropy, characterized by comprising:
First step: converting the binary bytes of a malicious file into yellow brightness values of the pixels in a "pixel map", and using the green channel value 0x50 to mark the points whose pixel value is 0x20-0x7E;
Second step: based on the pixel values of the "pixel map", calculating the local entropy of the pixel values in each 256-byte block of the "pixel map", the local entropy being calculated with the following Shannon information entropy formula:
$$\mathrm{Entropy} = -\sum_{i=0}^{255} p_i \times \log_2 p_i$$
where $p_i$ is the probability that byte value i occurs, i ranges from 0x00 to 0xFF, and Entropy is the local entropy;
calculating the value f(Entropy) of the local entropy Entropy with the formula:
$$f(\mathrm{Entropy}) = 2^{\mathrm{Entropy}} - 1;$$
generating the "entropy diagram" from the results of f(Entropy);
Third step: normalizing the results of f(Entropy) to generate the "entropy normalization figure".
2. The malicious code visual analysis method based on Shannon information entropy according to claim 1, characterized in that the method further comprises:
Fourth step: normalizing the "pixel map" of the first step to generate the "pixel normalization figure".
CN201410668073.8A 2014-11-20 2014-11-20 A kind of malicious code visual analysis method based on shannon entropy Expired - Fee Related CN104376260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410668073.8A CN104376260B (en) 2014-11-20 2014-11-20 A kind of malicious code visual analysis method based on shannon entropy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410668073.8A CN104376260B (en) 2014-11-20 2014-11-20 A kind of malicious code visual analysis method based on shannon entropy

Publications (2)

Publication Number Publication Date
CN104376260A true CN104376260A (en) 2015-02-25
CN104376260B CN104376260B (en) 2017-06-30

Family

ID=52555162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410668073.8A Expired - Fee Related CN104376260B (en) 2014-11-20 2014-11-20 A kind of malicious code visual analysis method based on shannon entropy

Country Status (1)

Country Link
CN (1) CN104376260B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399334A (en) * 2018-01-19 2018-08-14 东华大学 A kind of malicious code visual analysis method based on operation code frequency
CN108399335A (en) * 2018-01-30 2018-08-14 东华大学 A kind of malicious code visual analysis method based on local entropy
CN108446558A (en) * 2018-02-08 2018-08-24 东华大学 A kind of malicious code visual analysis method based on space filling curve
CN109726554A (en) * 2017-10-30 2019-05-07 武汉安天信息技术有限责任公司 A kind of detection method of rogue program, device and related application
CN110008363A (en) * 2019-03-19 2019-07-12 中国科学院国家空间科学中心 A kind of visualized data analysis method, system, equipment and the storage medium of Advanced Orbiting Systems
CN110210224A (en) * 2019-05-21 2019-09-06 暨南大学 A kind of mobile software similitude intelligent detecting method of big data based on description entropy
CN111091128A (en) * 2019-12-18 2020-05-01 北京数衍科技有限公司 Character and picture classification method and device and electronic equipment
CN113822839A (en) * 2020-06-18 2021-12-21 飞依诺科技(苏州)有限公司 Medical image processing method and device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080184367A1 (en) * 2007-01-25 2008-07-31 Mandiant, Inc. System and method for determining data entropy to identify malware
CN101706951A (en) * 2009-11-20 2010-05-12 上海电机学院 Method, device and system for objectively evaluating pneumatic optical image quality based on feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CONTI G ET AL: "《Visual reverse engineering of binary and data files》", 《VISUALIZATION FOR COMPUTER SECURITY》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726554A (en) * 2017-10-30 2019-05-07 武汉安天信息技术有限责任公司 A kind of detection method of rogue program, device and related application
CN108399334B (en) * 2018-01-19 2022-07-05 东华大学 Malicious code visual analysis method based on operation code frequency
CN108399334A (en) * 2018-01-19 2018-08-14 东华大学 A kind of malicious code visual analysis method based on operation code frequency
CN108399335B (en) * 2018-01-30 2022-05-06 东华大学 Malicious code visual analysis method based on local entropy
CN108399335A (en) * 2018-01-30 2018-08-14 东华大学 A kind of malicious code visual analysis method based on local entropy
CN108446558A (en) * 2018-02-08 2018-08-24 东华大学 A kind of malicious code visual analysis method based on space filling curve
CN108446558B (en) * 2018-02-08 2022-05-06 东华大学 Space filling curve-based malicious code visual analysis method
CN110008363A (en) * 2019-03-19 2019-07-12 中国科学院国家空间科学中心 A kind of visualized data analysis method, system, equipment and the storage medium of Advanced Orbiting Systems
CN110008363B (en) * 2019-03-19 2021-10-22 中国科学院国家空间科学中心 Visual data analysis method, system, equipment and storage medium of advanced on-orbit system
CN110210224A (en) * 2019-05-21 2019-09-06 暨南大学 A kind of mobile software similitude intelligent detecting method of big data based on description entropy
CN110210224B (en) * 2019-05-21 2023-01-31 暨南大学 Intelligent big data mobile software similarity detection method based on description entropy
CN111091128A (en) * 2019-12-18 2020-05-01 北京数衍科技有限公司 Character and picture classification method and device and electronic equipment
CN111091128B (en) * 2019-12-18 2023-09-22 北京数衍科技有限公司 Character picture classification method and device and electronic equipment
CN113822839A (en) * 2020-06-18 2021-12-21 飞依诺科技(苏州)有限公司 Medical image processing method and device, computer equipment and storage medium
CN113822839B (en) * 2020-06-18 2024-01-23 飞依诺科技股份有限公司 Medical image processing method, medical image processing device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN104376260B (en) 2017-06-30

Similar Documents

Publication Publication Date Title
CN104376260A (en) Malicious code visualized analyzing method based on Shannon information entropy
CN110765458B (en) Malicious software image format detection method and device based on deep learning
Han et al. Malware analysis method using visualization of binary files
Al-Afandy et al. High security data hiding using image cropping and LSB least significant bit steganography
Liu et al. A new learning approach to malware classification using discriminative feature extraction
US20200285893A1 (en) Exploit kit detection system based on the neural network using image
Qin et al. Perceptual image hashing with selective sampling for salient structure features
Tang et al. Robust image hash function using local color features
AlQadi et al. Window Averaging Method to Create a Feature Victor for RGB Color Image
CN108280348B (en) Android malicious software identification method based on RGB image mapping
US9478042B1 (en) Determining visibility of rendered content
CN104978565B (en) A kind of pictograph extracting method of universality
CN103886106B (en) Remote sensing image safe-retrieval method based on spectral feature protection
Kumar et al. Near lossless image compression using parallel fractal texture identification
Lai et al. An improved block-based matching algorithm of copy-move forgery detection
CN112261063A (en) Network malicious traffic detection method combined with deep hierarchical network
CN108399335B (en) Malicious code visual analysis method based on local entropy
CN110581856A (en) malicious code detection method and system
Vogt Quantifying landscape fragmentation
Senthamaraikannan et al. Real time color recognition
Wu et al. Robust Camera Model Identification over Online Social Network Shared Images via Multi-Scenario Learning
Li et al. Localisation of insulator strings' images based on colour filtering and texture matching
CN108446558B (en) Space filling curve-based malicious code visual analysis method
Li et al. Perceptual hashing for color images
CN114693955A (en) Method and device for comparing image similarity and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170630

Termination date: 20191120