CN104376260A - Malicious code visualized analyzing method based on Shannon information entropy - Google Patents
Malicious code visualized analyzing method based on Shannon information entropy
- Publication number
- CN104376260A (application number CN201410668073.8A)
- Authority
- CN
- China
- Prior art keywords
- entropy
- pixel
- value
- local
- malicious code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/561—Virus type analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Health & Medical Sciences (AREA)
- Virology (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a malicious code visual analysis method based on Shannon information entropy. The method comprises three steps. First, the binary bytes of a malicious file are converted into yellow brightness values of the pixels of a "pixel map", and the points whose pixel value lies in 0x20-0x7E are marked with green channel value 0x50. Second, the local entropy of the pixel values in each 256-byte block of the pixel map is calculated from the pixel values according to the Shannon information entropy formula $\mathrm{Entropy} = -\sum_{i=0}^{255} p_i \log_2 p_i$, where $p_i$ is the probability of occurrence of byte (pixel) value $i$, $i$ ranges from 0x00 to 0xFF, and Entropy is the local entropy. The value f(Entropy) is then calculated according to $f(\mathrm{Entropy}) = 2^{\mathrm{Entropy}} - 1$, and the "entropy diagram" is generated from the results. Third, the results of f(Entropy) are normalized to generate an "entropy normalization diagram". With this method, samples of different families can be effectively distinguished; when malicious code of the same family is analyzed, potential differences are easier to find, which provides a basis for understanding how the variants of that family evolve.
Description
Technical field
The present invention relates to a malicious code visual analysis method based on Shannon information entropy.
Background art
Malware (malicious software) is software designed to damage computer operating systems, steal sensitive information, or gain unauthorized access to private systems; it usually takes the form of code, scripts, active content, or other software. Because traditional malware analysis is often complex and time-consuming, even experienced security analysts find it difficult to spot potential attack patterns. To reduce cognitive load and improve interactivity, information visualization techniques have been introduced into malicious code analysis. This field, malware security visualization, has become a frontier topic in network security research in recent years.
In 2008, Gregory Conti et al. of the United States Military Academy at West Point first proposed the idea of gray-scale images in the visual analysis system they designed (Fig. 1), which identifies files quickly and dissects unknown file formats from an analytical perspective independent of text. As shown in Fig. 1, regions d and g of the system's user interface show the analyzed file as ASCII character strings and in hexadecimal format respectively; the pixel values in region c (Byteview) correspond to the bytes shown in region g and present the internal features of the file as a gray-scale image; region b (Byte Presence) marks, in the corresponding row of its own area, which extended ASCII values (0-255) are present in each row of region c, a mapping that helps the user grasp the regularities of the file and spot anomalies in it; region f (Dot Plot) compares the file's byte sequence with itself in matrix form to reveal similarity, giving the user a basis for classification; the remaining regions a, e, and h integrate a number of auxiliary functions for user interaction.
However, when file similarity is analyzed for classification, the method of Gregory Conti, Erik Dean, Matthew Sinda and Benjamin Sangster (Visual Reverse Engineering of Binary and Data Files [C]. VizSec 2008 Symposium on Visualization for Cyber Security (VizSEC 2008)) makes the amount of computation proportional to file size, so the degree of automation of the analysis is limited by computer hardware performance. In addition, when presenting the internal features of a file, it displays the byte-derived pixel values of region c separately from region b, which reflects the presence of ASCII values, and this hinders a complete understanding of the features of the analyzed file.
Summary of the invention
The object of the invention is to provide a method for analyzing malicious code more comprehensively.
To achieve this object, the invention provides a malicious code visual analysis method based on Shannon information entropy, characterized by comprising:
First step: convert the binary bytes of the malicious file into yellow brightness values of the pixels in a "pixel map", and mark with green channel value 0x50 (the displayed effect may differ slightly between hardware devices) the points whose pixel value is 0x20-0x7E (that is, the printable ASCII characters);
Second step: based on the pixel values of the "pixel map", calculate the local entropy of the pixel values in each 256-byte block of the "pixel map" according to the following Shannon information entropy formula:
$$\mathrm{Entropy} = -\sum_{i=0}^{255} p_i \log_2 p_i$$
where $p_i$ is the probability of occurrence of byte (pixel) value $i$, $i$ ranges over 0x00-0xFF, and Entropy is the local entropy;
then calculate the value f(Entropy) of the local entropy Entropy according to the formula:
$$f(\mathrm{Entropy}) = 2^{\mathrm{Entropy}} - 1;$$
and generate the "entropy diagram" from the results of f(Entropy);
Third step: normalize the results of f(Entropy) to generate the "entropy normalization diagram".
Preferably, the malicious code visual analysis method based on Shannon information entropy further comprises:
Fourth step: normalize the "pixel map" of the first step to generate a "pixel normalization diagram".
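The choice of the mapping f in the second step can be checked with a short worked calculation. This is only an illustrative derivation from the formulas stated above, not additional material from the patent. For a block of 256 bytes the entropy is largest when all 256 byte values are equally likely, so:

$$0 \le \mathrm{Entropy} \le -\sum_{i=0}^{255} \frac{1}{256} \log_2 \frac{1}{256} = \log_2 256 = 8$$

$$0 = 2^{0} - 1 \le f(\mathrm{Entropy}) = 2^{\mathrm{Entropy}} - 1 \le 2^{8} - 1 = 255$$

For example, a block in which 16 byte values each occur with probability 1/16 has Entropy = 4 and f(Entropy) = 15, while a block of one repeated byte has Entropy = 0 and f(Entropy) = 0. The mapping therefore fills the brightness range 0-255 exactly and spreads high-entropy blocks over most of that range.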
The invention draws on the idea of gray-scale images, combines it with the definition of Shannon information entropy, and uses the K-Nearest Neighbor (KNN) classification algorithm to provide a new visualization method for studying the hierarchical classification of malicious code. The technical scheme is as follows: the binary file under test is first converted into a "pixel map" with yellow and green color channels; on this basis, a blue-green "entropy diagram" is generated by calculating the local entropy of the "pixel map"; the "entropy diagram" is then normalized to form a lattice of green dots of varying brightness, namely the "entropy normalization diagram". By using local entropy calculation and a sliding-window normalization mechanism, the method not only significantly reduces the amount of computation needed for similarity analysis of large files, but also improves the visualization of malicious code family classification.
The invention is implemented in Python and runs under both Windows and Linux.
Compared with existing security visualization techniques, the beneficial effects of the invention are:
1. Visually, the various malicious code families can be effectively distinguished.
2. When malicious code of the same family is analyzed, potential differences are easier to find, which provides a basis for understanding how the variants of that family evolve.
3. Combining the three visualizations, "entropy diagram", "entropy normalization diagram", and "pixel normalization diagram", allows both the overall and the local features of malicious code to be analyzed more comprehensively.
4. By establishing visual communication between people and data, the invention conveys more information per unit of time, which not only improves the efficiency of network security analysts but also lowers the difficulty of analysis and the demands on the analyst's skill and experience.
5. The invention is simple to implement and can be automated; because it uses a dimension-reducing mapping and a fast similarity comparison algorithm, image generation time is short and similarity comparison is efficient.
Brief description of the drawings
Fig. 1 is a schematic diagram of the visualization system designed by Gregory Conti;
Fig. 2 is an example of the result of converting a binary file into a "pixel map";
Fig. 3 is an example of converting the "pixel map" into an "entropy diagram" through local entropy calculation;
Fig. 4a is an example of converting the "entropy diagram" into an "entropy normalization diagram" through normalization;
Fig. 4b shows the mapping between local entropy and pixel value;
Fig. 5 is an example of converting the "pixel map" into a "pixel normalization diagram" through normalization;
Fig. 6 is the "entropy diagram" of the sample Email-Worm.joleee.av;
Fig. 7 is the "entropy diagram" of the sample Email-Worm.joleee.aw;
Fig. 8 is the "entropy diagram" of the sample Email-Worm.joleee.ba.
Detailed description
To make the invention easier to understand, a preferred embodiment is described in detail below with reference to the accompanying drawings. (The 473 malicious samples from 59 families used to test the invention were all downloaded from the official VX Heavens website, and all samples are named according to the Kaspersky naming convention.)
Embodiment 1
A malicious code visual analysis method based on Shannon information entropy, specifically:
Step 1: Convert the binary bytes of the malicious file (Trojan.Regrun.rk) into yellow brightness values of the pixels in a "pixel map", and mark with green channel value 0x50 (the displayed effect may differ slightly between hardware devices) the points whose pixel value is 0x20-0x7E (that is, the printable ASCII characters). The result is shown in Fig. 2, where the black parts are the background color, that is, bytes whose value is 0.
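Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent does not give the exact color encoding or the image width, so the sketch assumes that yellow means equal red and green components, that printable bytes are drawn as pure green with value 0x50, and that rows are 256 pixels wide. The names `byte_to_pixel` and `pixel_map` are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

PRINTABLE_LOW, PRINTABLE_HIGH = 0x20, 0x7E  # printable ASCII range from the text
GREEN_MARK = 0x50                           # green-channel marker value from the text


def byte_to_pixel(b: int) -> RGB:
    """Map one byte to an RGB pixel (assumed color encoding)."""
    if PRINTABLE_LOW <= b <= PRINTABLE_HIGH:
        return (0, GREEN_MARK, 0)  # assumption: printable bytes become pure green 0x50
    return (b, b, 0)               # assumption: yellow = equal red and green, brightness b


def pixel_map(data: bytes, width: int = 256) -> List[List[RGB]]:
    """Lay the pixels out row by row; pad the last row with black (0, 0, 0)."""
    pixels = [byte_to_pixel(b) for b in data]
    pad = (-len(pixels)) % width   # number of pixels needed to complete the last row
    pixels.extend([(0, 0, 0)] * pad)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```

Under these assumptions a zero byte maps to (0, 0, 0), which agrees with the statement that black areas correspond to bytes of value 0.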
Step 2: Based on the pixel values of the "pixel map", calculate the local entropy of the pixel values in each 256-byte block of the "pixel map" according to the following Shannon information entropy formula:
$$\mathrm{Entropy} = -\sum_{i=0}^{255} p_i \log_2 p_i$$
where $p_i$ is the probability of occurrence of byte (pixel) value $i$, $i$ ranges over 0x00-0xFF, and Entropy is the local entropy.
In general, the local entropy Entropy lies in the interval [0, 8]; if it were displayed directly with this value, the brightness would be too low and the visual effect very poor. Therefore, to form an image that maps onto the pixel brightness range [0, 255], the local entropy is passed through the function
$$f(\mathrm{Entropy}) = 2^{\mathrm{Entropy}} - 1$$
before output; the purpose of this processing is to make high entropy values show up more clearly in brightness. The "entropy diagram" is generated from the results of f(Entropy), as shown in Fig. 3; the invention displays the "entropy diagram" of the analyzed file in a blue-green color scheme. As before, the black parts are the background color, that is, an entropy of 0.
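The local entropy calculation and the mapping f of Step 2 can be sketched as follows. This is an illustrative sketch rather than the patent's own code: it assumes consecutive, non-overlapping 256-byte blocks, computes probabilities over the actual length of a short final block, and rounds f(Entropy) to the nearest integer brightness, none of which the patent spells out. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

BLOCK_SIZE = 256  # block length in bytes, as stated in the text


def local_entropy(block: bytes) -> float:
    """Shannon entropy, in bits, of the byte values in one block."""
    if not block:
        return 0.0
    n = len(block)
    counts = Counter(block)  # occurrences of each byte value present in the block
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def entropy_brightness(entropy: float) -> int:
    """Apply f(Entropy) = 2**Entropy - 1 and clamp to a brightness in [0, 255]."""
    return min(255, max(0, round(2.0 ** entropy - 1.0)))  # rounding is an assumption


def entropy_map(data: bytes) -> List[int]:
    """One brightness value per consecutive block (assumed non-overlapping)."""
    return [entropy_brightness(local_entropy(data[i:i + BLOCK_SIZE]))
            for i in range(0, len(data), BLOCK_SIZE)]
```

For instance, `entropy_map(bytes(256) + bytes(range(256)))` covers the two extremes: a block of identical bytes, which has entropy 0, and a block containing every byte value exactly once, which has entropy 8.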
Step 3: Similarity analysis requires a uniform basis for comparison, but the analyzed malicious code files differ in size, so the results calculated with the function f(Entropy) above must be normalized to generate the "entropy normalization diagram". The normalization algorithm proposed by the invention uses a sliding window with a window size of 2 bytes and a step of 1 byte; the two consecutive bytes inside a window are used as the position coordinates (x, y) of a point in the "entropy normalization diagram". The brightness of a point in the "entropy normalization diagram" is proportional to the number of times its (x, y) combination occurs. The "entropy normalization diagram" is displayed as a square image of 256*256 pixels and is rendered in pure green to distinguish it from the color scheme of the "entropy diagram", as shown in Fig. 4a. The mapping between local entropy and pixel value is shown in Fig. 4b.
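The sliding-window normalization of Step 3 can be sketched as a count of adjacent value pairs. This is a sketch under assumptions: the patent states only that brightness is proportional to the pair count, so the linear scaling to a peak of 255 and the choice of x as column and y as row are illustrative, and the names `bigram_counts` and `normalize` are hypothetical.

```python
from typing import List, Sequence

SIZE = 256  # the normalized image is 256 x 256


def bigram_counts(values: Sequence[int]) -> List[List[int]]:
    """Count each pair (x, y) of adjacent values: window of 2, step of 1."""
    counts = [[0] * SIZE for _ in range(SIZE)]
    for x, y in zip(values, values[1:]):
        counts[y][x] += 1  # assumption: x selects the column, y selects the row
    return counts


def normalize(counts: List[List[int]]) -> List[List[int]]:
    """Scale counts linearly so the most frequent pair has brightness 255."""
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0] * SIZE for _ in range(SIZE)]
    return [[round(255 * c / peak) for c in row] for row in counts]
```

Because the output is always 256 by 256 regardless of how many input values there are, two files of different sizes yield images that can be compared point by point, which is the stated purpose of the normalization.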
Step 4: For the same reason, the invention also provides a normalized display of the "pixel map"; its normalization algorithm is identical to that used for the "entropy diagram", and the result is shown in Fig. 5. The "pixel normalization diagram" is rendered in pure yellow to distinguish it from the color scheme of the "pixel map".
When unknown samples are classified with the KNN classification algorithm, the invention correctly distinguishes the 59 families in the downloaded samples; combined with the prior statistics of the samples, only 28 of 437 samples were assigned to the wrong family, that is, the average correct classification rate is 93.59%.
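The classification step can be illustrated with a generic k-nearest-neighbor sketch. The patent does not state the distance measure, the value of k, or the feature representation, so the Euclidean distance, the default k = 3, and the use of flattened image brightness values as feature vectors are all assumptions made for illustration; `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, family label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training vectors nearest to query."""
    nearest = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Arithmetic check of the reported rate: 28 errors out of 437 samples.
accuracy = (437 - 28) / 437
```

The last line reproduces the figure quoted above: 409 correct out of 437 is approximately 0.9359, that is, 93.59%.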
In the invention, the generation time of an "entropy normalization diagram" is proportional to the size of the sample file, and the average generation time is 0.91 ms; the similarity comparison time of "entropy normalization diagrams" is proportional to the number of local entropy blocks, and the average comparison time is 0.56 ms. These times are mean values calculated over 100 measurements. The results show that the classification comparison implemented by the invention is time-efficient and that the structure of the Python program is reasonably designed.
Embodiment 2
The "entropy diagrams" generated by analyzing the malicious samples Email-Worm.joleee.av, Email-Worm.joleee.aw, and Email-Worm.joleee.ba with the malicious code visual analysis method based on Shannon information entropy described in Embodiment 1 are shown in Figs. 6-8. When analyzing malicious code of the same family, the invention makes potential differences easier to find, which provides a basis for understanding how the variants of that family evolve.
Claims (2)
1. A malicious code visual analysis method based on Shannon information entropy, characterized by comprising:
First step: converting the binary bytes of a malicious file into yellow brightness values of the pixels in a "pixel map", and marking with green channel value 0x50 the points whose pixel value is 0x20-0x7E;
Second step: based on the pixel values of the "pixel map", calculating the local entropy of the pixel values in each 256-byte block of the "pixel map" according to the following Shannon information entropy formula:
$$\mathrm{Entropy} = -\sum_{i=0}^{255} p_i \log_2 p_i$$
where $p_i$ is the probability of occurrence of byte value $i$, $i$ ranges over 0x00-0xFF, and Entropy is the local entropy;
calculating the value f(Entropy) of the local entropy Entropy according to the formula:
$$f(\mathrm{Entropy}) = 2^{\mathrm{Entropy}} - 1;$$
and generating the "entropy diagram" from the results of f(Entropy);
Third step: normalizing the results of f(Entropy) to generate the "entropy normalization diagram".
2. The malicious code visual analysis method based on Shannon information entropy according to claim 1, characterized in that the method further comprises:
Fourth step: normalizing the "pixel map" of the first step to generate a "pixel normalization diagram".
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410668073.8A CN104376260B (en) | 2014-11-20 | 2014-11-20 | A kind of malicious code visual analysis method based on shannon entropy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410668073.8A CN104376260B (en) | 2014-11-20 | 2014-11-20 | A kind of malicious code visual analysis method based on shannon entropy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104376260A true CN104376260A (en) | 2015-02-25 |
CN104376260B CN104376260B (en) | 2017-06-30 |
Family
ID=52555162
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410668073.8A Expired - Fee Related CN104376260B (en) | 2014-11-20 | 2014-11-20 | A kind of malicious code visual analysis method based on shannon entropy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104376260B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080184367A1 (en) * | 2007-01-25 | 2008-07-31 | Mandiant, Inc. | System and method for determining data entropy to identify malware |
CN101706951A (en) * | 2009-11-20 | 2010-05-12 | 上海电机学院 | Method, device and system for objectively evaluating pneumatic optical image quality based on feature fusion |
Non-Patent Citations (1)
Title |
---|
CONTI G ET AL: "《Visual reverse engineering of binary and data files》", 《VISUALIZATION FOR COMPUTER SECURITY》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109726554A (en) * | 2017-10-30 | 2019-05-07 | 武汉安天信息技术有限责任公司 | A kind of detection method of rogue program, device and related application |
CN108399334B (en) * | 2018-01-19 | 2022-07-05 | 东华大学 | Malicious code visual analysis method based on operation code frequency |
CN108399334A (en) * | 2018-01-19 | 2018-08-14 | 东华大学 | A kind of malicious code visual analysis method based on operation code frequency |
CN108399335B (en) * | 2018-01-30 | 2022-05-06 | 东华大学 | Malicious code visual analysis method based on local entropy |
CN108399335A (en) * | 2018-01-30 | 2018-08-14 | 东华大学 | A kind of malicious code visual analysis method based on local entropy |
CN108446558A (en) * | 2018-02-08 | 2018-08-24 | 东华大学 | A kind of malicious code visual analysis method based on space filling curve |
CN108446558B (en) * | 2018-02-08 | 2022-05-06 | 东华大学 | Space filling curve-based malicious code visual analysis method |
CN110008363A (en) * | 2019-03-19 | 2019-07-12 | 中国科学院国家空间科学中心 | A kind of visualized data analysis method, system, equipment and the storage medium of Advanced Orbiting Systems |
CN110008363B (en) * | 2019-03-19 | 2021-10-22 | 中国科学院国家空间科学中心 | Visual data analysis method, system, equipment and storage medium of advanced on-orbit system |
CN110210224A (en) * | 2019-05-21 | 2019-09-06 | 暨南大学 | A kind of mobile software similitude intelligent detecting method of big data based on description entropy |
CN110210224B (en) * | 2019-05-21 | 2023-01-31 | 暨南大学 | Intelligent big data mobile software similarity detection method based on description entropy |
CN111091128A (en) * | 2019-12-18 | 2020-05-01 | 北京数衍科技有限公司 | Character and picture classification method and device and electronic equipment |
CN111091128B (en) * | 2019-12-18 | 2023-09-22 | 北京数衍科技有限公司 | Character picture classification method and device and electronic equipment |
CN113822839A (en) * | 2020-06-18 | 2021-12-21 | 飞依诺科技(苏州)有限公司 | Medical image processing method and device, computer equipment and storage medium |
CN113822839B (en) * | 2020-06-18 | 2024-01-23 | 飞依诺科技股份有限公司 | Medical image processing method, medical image processing device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104376260B (en) | 2017-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104376260A (en) | Malicious code visualized analyzing method based on Shannon information entropy | |
CN110765458B (en) | Malicious software image format detection method and device based on deep learning | |
Han et al. | Malware analysis method using visualization of binary files | |
Al-Afandy et al. | High security data hiding using image cropping and LSB least significant bit steganography | |
Liu et al. | A new learning approach to malware classification using discriminative feature extraction | |
US20200285893A1 (en) | Exploit kit detection system based on the neural network using image | |
Qin et al. | Perceptual image hashing with selective sampling for salient structure features | |
Tang et al. | Robust image hash function using local color features | |
AlQadi et al. | Window Averaging Method to Create a Feature Victor for RGB Color Image | |
CN108280348B (en) | Android malicious software identification method based on RGB image mapping | |
US9478042B1 (en) | Determining visibility of rendered content | |
CN104978565B (en) | A kind of pictograph extracting method of universality | |
CN103886106B (en) | Remote sensing image safe-retrieval method based on spectral feature protection | |
Kumar et al. | Near lossless image compression using parallel fractal texture identification | |
Lai et al. | An improved block-based matching algorithm of copy-move forgery detection | |
CN112261063A (en) | Network malicious traffic detection method combined with deep hierarchical network | |
CN108399335B (en) | Malicious code visual analysis method based on local entropy | |
CN110581856A (en) | malicious code detection method and system | |
Vogt | Quantifying landscape fragmentation | |
Senthamaraikannan et al. | Real time color recognition | |
Wu et al. | Robust Camera Model Identification over Online Social Network Shared Images via Multi-Scenario Learning | |
Li et al. | Localisation of insulator strings' images based on colour filtering and texture matching | |
CN108446558B (en) | Space filling curve-based malicious code visual analysis method | |
Li et al. | Perceptual hashing for color images | |
CN114693955A (en) | Method and device for comparing image similarity and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20170630 Termination date: 20191120 |