CN109033834A - Malware detection method based on file association relationships - Google Patents
- Publication number: CN109033834A
- Application number: CN201810781731.2A
- Authority: CN (China)
- Prior art keywords
- file sample
- label
- sample
- node
- file
- Legal status: Pending (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
Abstract
The invention discloses a malware detection method based on label propagation. Co-existence is used as the association relationship between file samples, the Jaccard similarity measures the similarity between file samples, and the file-sample association graph is built by taking the k nearest neighbours of each file sample as its adjacent nodes. Label propagation is a graph-based semi-supervised learning algorithm that passes the label information of labeled nodes to unlabeled nodes. On the file association graph, the label propagation algorithm learns the labels of unlabeled file samples and thereby identifies malware samples.
Description
Technical field
The present invention relates to a malware detection method based on file association relationships, in particular for situations in which a file's own features are missing but association information is abundant.
Background art
The rapid development of malware programming techniques poses a serious challenge to computer and network security. Anti-virus vendors and the market therefore urgently need new, effective methods and frameworks to counter emerging virus threats and protect users. Existing intelligent malware detection techniques mainly treat each file sample as an isolated individual: they extract features of the file sample, such as API call sequences, instruction sequences and binary strings, and then apply data-mining algorithms such as naive Bayes or support vector machines to detect viruses. However, the associations between file samples carry a large amount of valuable information, and ignoring them limits the analysis and detection of malware. Starting from the relationships between file samples, the present invention studies the association relationships and relationship types among file samples and proposes a malware detection model based on label propagation.
Summary of the invention
The technical problem to be solved by the present invention is to provide a malware detection method based on file association relationships for situations in which a file's own features are missing but association information is abundant.
To solve this technical problem, the present invention adopts the following technical solution: an association graph of file samples is constructed from the co-existence relationships between file samples, and the labels of the file samples are predicted with a label propagation algorithm in order to detect malware samples.
To construct the association graph from the co-existence relationships, co-existence is used as the association relationship between file samples, the similarity between file samples is measured with the Jaccard similarity, and on this basis the first k nearest neighbours of each file sample are chosen as its adjacent nodes.
Preferably, the label propagation algorithm passes the label information of labeled nodes to unlabeled nodes in order to predict the labels of the unlabeled nodes.
In the present invention, the degree of co-existence between file samples is used as the measure of file-sample similarity, and the file-sample association graph is constructed from it. The file association graph is defined as $G = (V, E, W)$, where $V$ is the set of nodes representing file samples; $E$ is the set of edges between nodes, with $e(v_i, v_j) \in E$, $v_i, v_j \in V$ meaning that an edge connects nodes $v_i$ and $v_j$; and $W$ is the set of edge weights, each element being the similarity between the two corresponding nodes. Let $C_i$ be the set of terminals on which the file $f_i$ corresponding to node $v_i$ exists. The degree of co-existence of two files is measured with the Jaccard similarity:

$$\mathrm{sim}(f_i, f_j) = \frac{|C_i \cap C_j|}{|C_i \cup C_j|}$$
Here $|C|$ denotes the size of a set $C$, i.e., the total number of terminals on which the file exists. The degree of co-existence is therefore a number between 0 and 1. A value of 0 means the two files have no co-existence relationship, i.e., they have never been stored on the same terminal; conversely, a value of 1 means the two files fully co-exist, suggesting that they may depend on each other.
In practical applications, file samples and their co-existence information are collected from real user clients. Collecting every file sample and all co-existence information among them is unrealistic and infeasible, so only the co-existence relationships between suspicious file samples and labeled files are collected, which means that co-existence information between unlabeled file samples is missing. Here, the similarity between unlabeled files is therefore estimated from the co-existence of unlabeled files with labeled files. Let $M_i$ be the set of labeled samples that co-exist with the unlabeled file $f_i$; the similarity between unlabeled files $f_i$ and $f_j$ is then computed from $M_i$ and $M_j$.
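A minimal sketch of estimating similarity between two unlabeled files from their labeled co-existence sets, assuming the similarity takes the same Jaccard form over $M_i$ and $M_j$, i.e. $|M_i \cap M_j| / |M_i \cup M_j|$. This form is an assumption for illustration, not a formula confirmed by the method description. The example reuses the sample with ID 2 from the embodiment (which co-exists with labeled samples 3, 4 and 6) and pairs it with a hypothetical unlabeled file `x`.

```python
def unlabeled_similarity(m_i: set, m_j: set) -> float:
    """Similarity of two unlabeled files estimated from labeled co-existence.

    ASSUMPTION: Jaccard form |M_i ∩ M_j| / |M_i ∪ M_j| over the sets of labeled
    samples that co-exist with each unlabeled file (illustrative only).
    """
    union = m_i | m_j
    if not union:  # no labeled co-existence information for either file
        return 0.0
    return len(m_i & m_j) / len(union)


if __name__ == "__main__":
    m_2 = {3, 4, 6}  # sample 2 co-exists with labeled samples 3, 4 and 6
    m_x = {4, 6}     # hypothetical unlabeled file x
    print(round(unlabeled_similarity(m_2, m_x), 4))  # 2 / 3 -> 0.6667
```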
In the present invention, the file association graph is constructed with a k-nearest-neighbour graph strategy: if file sample $f_i$ is one of the k nearest neighbours of file $f_j$, the two are connected by an edge whose weight is their similarity.
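A minimal sketch of the k-nearest-neighbour construction, assuming the pairwise similarities are already held in a symmetric list-of-lists matrix. `knn_graph` is a hypothetical helper, and skipping zero-similarity neighbours is an added assumption of this sketch. Because an edge is added whenever either endpoint lists the other among its k nearest, the result is undirected.

```python
def knn_graph(sim, k):
    """Undirected weighted k-NN graph as {(i, j): weight} with i < j.

    sim: symmetric list-of-lists similarity matrix. Nodes i and j are joined
    if i is among the k most similar nodes of j (or vice versa); the edge
    weight is their similarity.
    """
    n = len(sim)
    edges = {}
    for j in range(n):
        # All other nodes ranked by similarity to j, highest first.
        ranked = sorted((i for i in range(n) if i != j),
                        key=lambda i: sim[i][j], reverse=True)
        for i in ranked[:k]:
            if sim[i][j] > 0:  # assumption: ignore zero-similarity "neighbours"
                edges[(min(i, j), max(i, j))] = sim[i][j]
    return edges


if __name__ == "__main__":
    # Hypothetical similarity matrix for four files.
    S = [[1.0, 0.5, 0.1, 0.0],
         [0.5, 1.0, 0.3, 0.2],
         [0.1, 0.3, 1.0, 0.4],
         [0.0, 0.2, 0.4, 1.0]]
    print(knn_graph(S, 1))  # {(0, 1): 0.5, (2, 3): 0.4}
```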
The label propagation algorithm (LPA) is a graph-based semi-supervised learning method, as shown in Figure 1. Its basic idea is to pass the label information of labeled nodes to unlabeled nodes in order to predict the labels of the unlabeled nodes, on the premise that highly similar data nodes tend to belong to the same category. During propagation, the label information of labeled nodes is passed, according to the similarity between nodes, to adjacent nodes and eventually across the whole graph, until all unlabeled nodes reach a stable label state.
The algorithm is specified as follows. Let $D_L = \{(x_1, y_1), \dots, (x_l, y_l)\}$ be the labeled data, where $\{y_1, \dots, y_l\}$ are the class labels. Assume the number of classes $|C|$ is known and that every class appears in the labeled data. Let $D_U = \{(x_{l+1}, y_{l+1}), \dots, (x_{l+u}, y_{l+u})\}$ be the unlabeled data, whose labels $\{y_{l+1}, \dots, y_{l+u}\}$ are not observed.
Define an $(l+u) \times |C|$ label matrix $Y$, where $Y_{ij}$ is the probability that node $x_i$ is labeled as class $j$. In other words, $Y$ gives the label probability distribution of every node in the graph. To measure how strongly a node passes label information to its neighbours, a probability transition matrix $T$ is defined:

$$T_{ij} = P(j \to i) = \frac{w_{ij}}{\sum_{k=1}^{l+u} w_{kj}}$$

where $T_{ij}$ is the probability of jumping from node $j$ to node $i$, i.e., the probability that node $j$ passes its label information to node $i$.
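A small sketch computing $T$ from a dense weight matrix, assuming the column normalization $T_{ij} = w_{ij} / \sum_k w_{kj}$ used in standard label propagation. Leaving isolated nodes (zero column sums) with an all-zero column is an assumption of this sketch; `transition_matrix` is a hypothetical name.

```python
def transition_matrix(w):
    """T[i][j] = w[i][j] / sum_k w[k][j]: probability that node j passes its label to node i.

    w: square list-of-lists of non-negative edge weights (0 where no edge).
    Columns of isolated nodes (zero total weight) are left as zeros.
    """
    n = len(w)
    col_sums = [sum(w[k][j] for k in range(n)) for j in range(n)]
    return [[w[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical weights of a three-node graph.
    W = [[0.0, 0.5, 0.1],
         [0.5, 0.0, 0.3],
         [0.1, 0.3, 0.0]]
    T = transition_matrix(W)
    for j in range(3):
        # Each column is a probability distribution over receiving nodes.
        print(round(sum(T[i][j] for i in range(3)), 6))  # 1.0
```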
The algorithm proceeds as follows:
1. Initialize the weight matrix $W$ from the similarities between data points, and compute the propagation probability from each node $j$ to node $i$ to obtain the label transition matrix $T$.
2. Initialize the label matrix $Y$ from the labeled data: if node $x_i$ belongs to class $C_j$, then $Y_{ij} = 1$; otherwise $Y_{ij} = 0$.
3. Each node passes its label information to its adjacent nodes, $Y \leftarrow \bar{T} Y$, where $\bar{T}$ is obtained by row-normalizing $T$.
4. Reset the labels of the labeled data to their initial values, so that the original label information continues to propagate correctly. Repeat step 3 until convergence.
5. Assign each unlabeled node the label $y_i = \arg\max_j Y_{ij}$.
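The five steps can be sketched in pure Python as follows. This is an illustrative implementation, not the patent's code. It assumes a dense symmetric weight matrix, the update $Y \leftarrow \bar{T} Y$ with $\bar{T}$ the row-normalized $T$, and a maximum-change tolerance as the convergence test. The names `label_propagation`, `max_iter` and `tol` are hypothetical. Classes are encoded as integers (here 0 = benign, 1 = malware) and unlabeled nodes as `None`.

```python
def label_propagation(w, labels, n_classes, max_iter=1000, tol=1e-6):
    """Clamped label propagation (steps 1-5); returns a class index per node.

    w: symmetric list-of-lists weight matrix, 0 where there is no edge.
    labels: class index for labeled nodes, None for unlabeled nodes.
    """
    n = len(w)
    # Step 1: T[i][j] = w[i][j] / sum_k w[k][j] (node j passes its label to i).
    col = [sum(w[k][j] for k in range(n)) for j in range(n)]
    t = [[w[i][j] / col[j] if col[j] else 0.0 for j in range(n)] for i in range(n)]
    # Row-normalise T to obtain T_bar for the update Y <- T_bar Y.
    tbar = []
    for row in t:
        s = sum(row)
        tbar.append([v / s if s else 0.0 for v in row])
    # Step 2: one-hot rows for labeled nodes, zero rows for unlabeled nodes.
    y0 = [[1.0 if lab == c else 0.0 for c in range(n_classes)] for lab in labels]
    y = [row[:] for row in y0]
    for _ in range(max_iter):
        # Step 3: each node takes a weighted average of its neighbours' labels.
        new = [[sum(tbar[i][j] * y[j][c] for j in range(n)) for c in range(n_classes)]
               for i in range(n)]
        # Step 4: clamp labeled nodes back to their initial labels.
        for i, lab in enumerate(labels):
            if lab is not None:
                new[i] = y0[i][:]
        delta = max(abs(new[i][c] - y[i][c]) for i in range(n) for c in range(n_classes))
        y = new
        if delta < tol:
            break
    # Step 5: y_i = argmax_j Y_ij (an all-zero row falls back to class 0).
    return [max(range(n_classes), key=lambda c: row[c]) for row in y]


if __name__ == "__main__":
    # Hypothetical chain 0-1-2-3-4; the weak edge 1-2 separates two clusters.
    W = [[0.0, 0.9, 0.0, 0.0, 0.0],
         [0.9, 0.0, 0.1, 0.0, 0.0],
         [0.0, 0.1, 0.0, 0.9, 0.0],
         [0.0, 0.0, 0.9, 0.0, 0.8],
         [0.0, 0.0, 0.0, 0.8, 0.0]]
    labels = [0, None, None, None, 1]  # 0 = benign, 1 = malware, None = unlabeled
    print(label_propagation(W, labels, n_classes=2))  # [0, 0, 1, 1, 1]
```

In the example, node 2 is only weakly tied to the benign side (weight 0.1) but strongly tied to node 3 (weight 0.9), so it ends up with the malware label. Nodes with no path to any labeled node keep an all-zero row and default to class 0 in this sketch.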
The label propagation algorithm needs only a small amount of labeled data as guidance: by exploiting the internal structure and distribution of the unlabeled data, it propagates the label information of the labeled data and predicts the labels of the unlabeled data. The algorithm is simple to implement and computationally inexpensive, which makes it suitable for mining and processing large-scale data.
Combining the construction method of the file association graph with the label propagation algorithm yields the proposed malware detection algorithm based on file association relationships. The specific procedure is as follows:
Brief description of the drawings
Fig. 1 is a schematic diagram of the label propagation algorithm.
Fig. 2 is an example of a file association graph.
Detailed description of the embodiments
Embodiment 1: in the malware detection method based on file association relationships of the present invention, an association graph of file samples is constructed from the co-existence relationships between file samples, and the labels of the file samples are predicted with a label propagation algorithm in order to detect malware samples.
To construct the association graph from the co-existence relationships, co-existence is used as the association relationship between file samples, the similarity between file samples is measured with the Jaccard similarity, and on this basis the first k nearest neighbours of each file sample are chosen as its adjacent nodes, as shown in Figure 2.
The label propagation algorithm passes the label information of labeled nodes to unlabeled nodes in order to predict the labels of the unlabeled nodes.
To illustrate the construction of the file association graph, a simple file-association dataset is used as an example, as shown in Table 1.
Table 1. Example file association database
The first column is the file ID. The second column is the file label: 0 denotes an unlabeled sample, 1 a benign file and -1 malware. The third column records the co-existence of the current file sample with malware samples, and the fourth column its co-existence with benign file samples. The fifth column is the total number of terminals on which the file sample exists. For example, the file sample with ID 2 is an unlabeled sample present on two terminals; it co-exists on one terminal with the malware sample with ID 4, and on two terminals each with the benign file samples with IDs 3 and 6. From this example data, the file association graph is built with k = 3 (each node is connected to its 3 nearest neighbours), as shown in Figure 2. It is an undirected weighted graph: the number on each node is the ID of the corresponding file sample, and the number on each edge is the edge weight, i.e., the similarity of the two nodes.
In summary, co-existence is used as the association relationship between file samples, the Jaccard similarity measures the similarity between file samples, and the file-sample association graph is built by choosing the k nearest neighbours of each file sample as its adjacent nodes. Label propagation, a graph-based semi-supervised learning algorithm, passes the label information of labeled nodes to unlabeled nodes; applied to the file association graph, it learns the labels of unlabeled file samples and thereby identifies malware samples.
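The pipeline summarized above (co-existence, Jaccard similarity, k-NN graph, label propagation) can be put together in one compact sketch. The six files, their terminal IDs and k = 2 are hypothetical and are not the data of Table 1; the labels follow the Table 1 convention (1 benign, -1 malware, 0 unlabeled). All function names are illustrative, and a fixed iteration count stands in for a convergence test for brevity.

```python
from typing import List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Co-existence degree |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    u = a | b
    return len(a & b) / len(u) if u else 0.0


def knn_weights(terms: List[Set[str]], k: int) -> List[List[float]]:
    """Symmetric weight matrix of a k-NN graph built on Jaccard co-existence."""
    n = len(terms)
    sim = [[jaccard(terms[i], terms[j]) for j in range(n)] for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        ranked = sorted((i for i in range(n) if i != j),
                        key=lambda i: sim[i][j], reverse=True)
        for i in ranked[:k]:
            w[i][j] = w[j][i] = sim[i][j]  # undirected edge; weight = similarity
    return w


def propagate(w: List[List[float]], labels: List[Optional[int]],
              n_classes: int, iters: int = 200) -> List[int]:
    """Clamped label propagation Y <- T_bar Y; returns the argmax class per node."""
    n = len(w)
    col = [sum(w[r][j] for r in range(n)) for j in range(n)]
    t = [[w[i][j] / col[j] if col[j] else 0.0 for j in range(n)] for i in range(n)]
    tbar = []
    for row in t:  # row-normalise T
        s = sum(row)
        tbar.append([v / s if s else 0.0 for v in row])
    y0 = [[1.0 if lab == c else 0.0 for c in range(n_classes)] for lab in labels]
    y = [r[:] for r in y0]
    for _ in range(iters):
        y = [[sum(tbar[i][j] * y[j][c] for j in range(n)) for c in range(n_classes)]
             for i in range(n)]
        y = [y0[i][:] if labels[i] is not None else y[i] for i in range(n)]  # clamp
    return [max(range(n_classes), key=lambda c: r[c]) for r in y]


# Table 1 label convention: 1 = benign, -1 = malware, 0 = unlabeled.
TO_CLASS = {1: 0, -1: 1, 0: None}
FROM_CLASS = {0: 1, 1: -1}

if __name__ == "__main__":
    # Hypothetical terminal sets for six files (not the data of Table 1).
    terminals = [{"t1", "t2", "t3"}, {"t1", "t2"},                        # benign
                 {"t7", "t8", "t9"}, {"t8", "t9"},                        # malware
                 {"t1", "t2", "t3", "t4"}, {"t7", "t8", "t9", "t10"}]     # unlabeled
    table_labels = [1, 1, -1, -1, 0, 0]
    w = knn_weights(terminals, k=2)
    pred = propagate(w, [TO_CLASS[v] for v in table_labels], n_classes=2)
    print([FROM_CLASS[c] for c in pred])  # [1, 1, -1, -1, 1, -1]
```

The unlabeled file that shares terminals with the benign files is labeled benign, and the one that shares terminals with the malware samples is flagged as malware, without either file's own content being inspected.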
Claims (3)
1. A malware detection method based on file association relationships, characterized in that an association graph of file samples is constructed from the co-existence relationships between file samples, and the labels of the file samples are predicted with a label propagation algorithm in order to detect malware samples.
2. The malware detection method based on file association relationships according to claim 1, characterized in that, in constructing the association graph of file samples from their co-existence relationships, co-existence is used as the association relationship between file samples, the similarity between file samples is measured with the Jaccard similarity, and on this basis the first k nearest neighbours of each file sample are chosen as its adjacent nodes to construct the association graph.
3. The malware detection method based on file association relationships according to claim 1, characterized in that the label propagation algorithm passes the label information of labeled nodes to unlabeled nodes in order to predict the labels of the unlabeled nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810781731.2A (CN109033834A) | 2018-07-17 | 2018-07-17 | Malware detection method based on file association relationships |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109033834A | 2018-12-18 |
Family ID: 64642870
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810781731.2A (CN109033834A, Pending) | 2018-07-17 | 2018-07-17 | Malware detection method based on file association relationships |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109033834A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101350822A (en) * | 2008-09-08 | 2009-01-21 | 南开大学 | Method for discovering and tracing Internet malevolence code |
CN102054149A (en) * | 2009-11-06 | 2011-05-11 | 中国科学院研究生院 | Method for extracting malicious code behavior characteristic |
CN104036051A (en) * | 2014-07-04 | 2014-09-10 | 南开大学 | Database mode abstract generation method based on label propagation |
CN105975852A (en) * | 2015-12-31 | 2016-09-28 | 武汉安天信息技术有限责任公司 | Method and system for detecting sample relevance based on label propagation |
CN106355506A (en) * | 2016-08-15 | 2017-01-25 | 中南大学 | Method for selecting the initial node with maximum influence in online social network |
CN107180190A (en) * | 2016-03-11 | 2017-09-19 | 深圳先进技术研究院 | A kind of Android malware detection method and system based on composite character |
Non-Patent Citations (1)
Title |
---|
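Zhang Junli (张俊丽) et al.: "A Survey of Research on Label Propagation Algorithm Theory and Its Applications" (《标签传播算法理论及其应用研究综述》), Application Research of Computers (《计算机应用研究》) * |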
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674498A (en) * | 2019-08-20 | 2020-01-10 | 中国科学院信息工程研究所 | Internal threat detection method and system based on multi-dimensional file activity |
CN110674498B (en) * | 2019-08-20 | 2022-06-03 | 中国科学院信息工程研究所 | Internal threat detection method and system based on multi-dimensional file activity |
CN112596856A (en) * | 2020-12-22 | 2021-04-02 | 电子科技大学 | Node security prediction method based on Docker container and graph calculation |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181218 |