CN109033834A - A kind of malware detection method based on file association relationship - Google Patents

Info

Publication number
CN109033834A
Authority
CN
China
Prior art keywords
file sample
label
sample
node
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810781731.2A
Other languages
Chinese (zh)
Inventor
倪震
倪铭
夏彬
刘晓迁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nupt Institute Of Big Data Research At Yancheng Co Ltd
Original Assignee
Nupt Institute Of Big Data Research At Yancheng Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nupt Institute Of Big Data Research At Yancheng Co Ltd
Priority to CN201810781731.2A
Publication of CN109033834A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Abstract

The invention discloses a malware detection method based on label propagation. The co-existence relationship is used as the association between file samples, and the Jaccard similarity is used to measure the similarity between file samples; an association graph of the file samples is built by selecting the k nearest neighbours of each file sample as its adjacent nodes. The label propagation algorithm is a graph-based semi-supervised learning algorithm that passes the label information of labeled nodes to unlabeled nodes. On the basis of the file association graph, the label propagation algorithm is used to learn the labels of the unlabeled file samples and thereby find malware samples.

Description

A kind of malware detection method based on file association relationship
Technical field
The present invention relates to a malware detection method based on file association relationships, intended in particular for situations in which a file's own features are missing but association information is abundant.
Background art
The rapid development of malware programming techniques poses a huge challenge to computer and network security. Anti-virus companies and the market therefore urgently need new, effective methods and frameworks to cope with emerging virus threats and protect users. Existing intelligent malware detection techniques mainly treat each file sample as an isolated individual: they analyse the file sample and extract its features, such as API call sequences, instruction sequences and binary strings, and then apply data mining algorithms, such as naive Bayes and support vector machines, to detect viruses. However, the association relationships between file samples carry a large amount of valuable information, and analysing and detecting malware while ignoring them has clear limitations. Starting from the relationships between file samples, the present invention studies the associations between file samples and their types, and proposes a malware detection model based on label propagation.
Summary of the invention
The technical problem to be solved by the present invention is to provide a malware detection method based on file association relationships for situations in which a file's own features are missing but association information is abundant.
To solve the above technical problem, the technical solution adopted by the present invention is to build an association graph of file samples from their co-existence relationships, and to predict the labels of the file samples with a label propagation algorithm in order to detect malware samples.
Here, building the association graph from the co-existence relationships means the following: the co-existence relationship is taken as the association between file samples, the Jaccard similarity is used to measure the similarity between file samples, and on this basis the graph is built by selecting the top k nearest neighbours of each file sample as its adjacent nodes.
Preferably, the label propagation algorithm passes the label information of labeled nodes to unlabeled nodes in order to predict the labels of the unlabeled nodes.
In the present invention, the degree of co-existence between file samples is used as the measure of file sample similarity, and the file sample association graph is built on it. The file association graph is defined as G = (V, E, W), where V is the set of nodes representing file samples; E is the set of relationships between file sample nodes, with e(v_i, v_j) ∈ E, v_i, v_j ∈ V indicating that an edge connects nodes v_i and v_j; and W is the set of edge weights, each element being the similarity between the two corresponding nodes. Let C_i be the set of terminals on which the file f_i corresponding to node v_i exists. The degree of co-existence of two files is measured with the Jaccard similarity:
$$w_{ij} = \frac{|C_i \cap C_j|}{|C_i \cup C_j|}$$
where |C| is the size of the set C, i.e. the total number of terminals on which the file exists. The degree of co-existence is therefore a number between 0 and 1. A value of 0 means that the two files have no co-existence relationship, i.e. they are never stored on the same terminal at the same time; conversely, a value of 1 means that the two files co-exist completely, in which case there may be a dependency between them.
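The degree of co-existence described above can be illustrated with a short Python sketch. The function name and the terminal identifiers are hypothetical, and returning 0 for two empty terminal sets is an assumption, since the text does not cover that case.

```python
def coexistence_degree(terminals_i, terminals_j):
    """Jaccard similarity between the terminal sets of two files."""
    c_i, c_j = set(terminals_i), set(terminals_j)
    union = c_i | c_j
    if not union:
        # Assumption: two files seen on no terminal are treated as unrelated.
        return 0.0
    return len(c_i & c_j) / len(union)


# Hypothetical terminal IDs: the files share 2 of 4 distinct terminals.
c1 = {"t1", "t2", "t3"}
c2 = {"t2", "t3", "t4"}
print(coexistence_degree(c1, c2))  # 0.5
```

Files that never share a terminal score 0 and files found on exactly the same terminals score 1, matching the interpretation given in the text.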
In practice, file samples and their co-existence information are collected from real user clients. Collecting all file samples and all of the co-existence information among them is unrealistic, so only the co-existence relationships between suspicious file samples and labeled files are collected, which means that the co-existence relationships among unlabeled file samples are lost. The similarity between unlabeled files is therefore estimated from the co-existence relationships between unlabeled files and labeled files. Let M_i be the set of labeled samples that co-exist with the unlabeled file f_i; then the similarity between unlabeled files f_i and f_j is:
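A minimal sketch of this estimate, assuming the similarity is again a Jaccard ratio, this time taken over the sets M_i and M_j of labeled samples with which each unlabeled file co-exists. That reading is an assumption drawn from the surrounding text, and the function name and sample IDs are hypothetical.

```python
def unlabeled_similarity(marked_i, marked_j):
    """Estimate the similarity of two unlabeled files from the labeled
    samples each one co-exists with (assumed: Jaccard over M_i and M_j)."""
    m_i, m_j = set(marked_i), set(marked_j)
    union = m_i | m_j
    if not union:
        # Assumption: no shared evidence means similarity 0.
        return 0.0
    return len(m_i & m_j) / len(union)


# Hypothetical example: one unlabeled file co-exists with labeled samples
# 3, 4 and 6, another with labeled samples 4 and 6.
m_a = {3, 4, 6}
m_b = {4, 6}
print(unlabeled_similarity(m_a, m_b))
```

Under this assumption two unlabeled files are considered similar when they tend to appear alongside the same known benign or malicious files, even though their direct co-existence was never recorded.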
In the present invention, the file association graph is built with a k-nearest-neighbour graph construction strategy: if file sample f_i is one of the k nearest neighbours of file f_j, the two are connected by an edge whose weight is their similarity.
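The k-nearest-neighbour construction can be sketched as follows. This is only an illustration: the input layout (a nested dictionary of pairwise similarities), the function name, the tie-breaking by node ID and the choice to skip zero-similarity pairs are all assumptions not stated in the text.

```python
def build_knn_graph(sim, k):
    """Build an undirected weighted k-NN graph.

    sim[i][j] is the similarity between files i and j (assumed symmetric).
    Returns a dict mapping (smaller_id, larger_id) to the edge weight.
    """
    edges = {}
    for i, row in sim.items():
        # Candidate neighbours, ignoring self-loops and zero similarity.
        candidates = [(w, j) for j, w in row.items() if j != i and w > 0]
        # Highest similarity first; ties broken by node ID for determinism.
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        for w, j in candidates[:k]:
            # An edge is added if either endpoint selects the other.
            edges[(min(i, j), max(i, j))] = w
    return edges


# Hypothetical similarity values for four files.
sim = {
    1: {2: 0.9, 3: 0.1, 4: 0.5},
    2: {1: 0.9, 3: 0.4, 4: 0.2},
    3: {1: 0.1, 2: 0.4, 4: 0.8},
    4: {1: 0.5, 2: 0.2, 3: 0.8},
}
print(build_knn_graph(sim, k=1))  # {(1, 2): 0.9, (3, 4): 0.8}
```

Keeping only the k strongest links per node keeps the graph sparse, which limits the cost of the later propagation step.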
The label propagation algorithm (LPA) is a graph-based semi-supervised learning method. As shown in Fig. 1, its basic idea is to pass the label information of labeled nodes to unlabeled nodes in order to predict their labels, on the premise that highly similar data nodes tend to belong to the same class. During propagation, the label information of the labeled nodes is passed to adjacent nodes, and eventually across the whole graph, according to the similarity between nodes, until all unlabeled nodes reach a stable label state.
The label propagation algorithm is as follows. Let D_L = {(x_1, y_1), ..., (x_l, y_l)} be the labeled data, where {y_1, ..., y_l} are the class labels. Assume that the number of classes |C| is known and that every class appears in the labeled data. Let D_U = {(x_{l+1}, y_{l+1}), ..., (x_{l+u}, y_{l+u})} be the unlabeled data, whose labels {y_{l+1}, ..., y_{l+u}} are unobserved.
Define an (l+u) × |C| label matrix Y, where Y_ij is the probability that node x_i is labeled with class C_j. In other words, the matrix Y gives the label probability distribution of every node in the graph. To measure a node's ability to pass label information to its adjacent nodes, define the probability transition matrix T:
$$T_{ij} = P(j \to i) = \frac{w_{ij}}{\sum_{k=1}^{l+u} w_{kj}}$$
where T_ij is the probability of jumping from node j to node i, i.e. the probability that node j passes its label information to node i. The algorithm proceeds as follows:
1. Initialize the weight matrix W from the similarities between the data, and compute the propagation probability from node j to node i to obtain the label transition matrix T.
2. Initialize the label matrix Y from the label information of the labeled data: if node x_i belongs to class C_j, then Y_ij = 1; otherwise Y_ij = 0.
3. Each node passes its label information to its adjacent nodes, using the matrix \bar{T} obtained from T by row normalization.
4. Reset the labels of the labeled data to their initial values so that the original label information continues to propagate correctly. Repeat the previous step until convergence.
5. Assign labels to the unlabeled nodes according to y_i = argmax_j Y_ij.
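The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: T is taken to be the column-normalized weight matrix (the usual definition in the label propagation literature), the update is assumed to be Y ← T̄Y, convergence is checked with a hypothetical tolerance `tol`, and classes are encoded as indices 0..n_classes-1 with `None` for unlabeled nodes.

```python
def label_propagation(W, labels, n_classes, max_iter=1000, tol=1e-6):
    """Propagate labels over a weighted graph.

    W: symmetric n x n weight matrix as a list of lists.
    labels: class index for labeled nodes, None for unlabeled nodes.
    Returns the predicted class index of every node.
    """
    n = len(W)
    # Step 1: T[i][j] = W[i][j] / sum_k W[k][j] (assumed column normalization).
    col_sums = [sum(W[k][j] for k in range(n)) for j in range(n)]
    T = [[W[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Row-normalize T to obtain the propagation matrix used in step 3.
    T_bar = []
    for row in T:
        s = sum(row)
        T_bar.append([v / s if s else 0.0 for v in row])

    def initial_row(lab):
        # Step 2: one-hot row for labeled nodes, all zeros otherwise.
        return [1.0 if lab is not None and c == lab else 0.0
                for c in range(n_classes)]

    Y = [initial_row(lab) for lab in labels]
    for _ in range(max_iter):
        # Step 3: every node collects label information from its neighbours.
        Y_new = [[sum(T_bar[i][j] * Y[j][c] for j in range(n))
                  for c in range(n_classes)] for i in range(n)]
        # Step 4: clamp labeled nodes back to their original labels.
        for i, lab in enumerate(labels):
            if lab is not None:
                Y_new[i] = initial_row(lab)
        delta = max(abs(Y_new[i][c] - Y[i][c])
                    for i in range(n) for c in range(n_classes))
        Y = Y_new
        if delta < tol:
            break
    # Step 5: pick the most probable class for each node.
    return [max(range(n_classes), key=lambda c: Y[i][c]) for i in range(n)]


# Hypothetical chain 0 - 1 - 2 - 3 with a weak link between nodes 1 and 2.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(label_propagation(W, [0, None, None, 1], n_classes=2))  # [0, 0, 1, 1]
```

In the toy graph each unlabeled node takes the class of the labeled node it is strongly connected to, which reflects the premise that highly similar nodes tend to share a class.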
The label propagation algorithm needs only a small amount of labeled data to guide learning; by exploiting the internal structure and distribution of the unlabeled data, it propagates the label information of the labeled data and predicts the labels of the unlabeled data. The algorithm is simple to operate and computationally light, which makes it suitable for mining and processing large-scale data.
Combining the construction of the file association graph with the label propagation algorithm, a malware detection algorithm based on file association relationships is proposed; the algorithm flow is as follows:
Brief description of the drawings
Fig. 1 is a schematic diagram of the label propagation algorithm.
Fig. 2 is an example of a file association graph.
Detailed description of the embodiments
Embodiment 1: in the malware detection method based on file association relationships of the present invention, an association graph of file samples is built from the co-existence relationships of the file samples, and the labels of the file samples are predicted with a label propagation algorithm in order to detect malware samples.
Here, building the association graph from the co-existence relationships means the following: the co-existence relationship is taken as the association between file samples, the Jaccard similarity is used to measure the similarity between file samples, and on this basis the graph is built by selecting the top k nearest neighbours of each file sample as its adjacent nodes, as shown in Fig. 2.
The label propagation algorithm passes the label information of labeled nodes to unlabeled nodes in order to predict the labels of the unlabeled nodes.
To illustrate the construction of the file association graph clearly, a simple file association data set is used as an example, as shown in Table 1.
Table 1: Example file association database
The first column is the file ID. The second column is the file label: "0" means the file is an unlabeled sample, "1" means it is a benign file, and "-1" means it is malware. The third column records the malware samples with which the file co-exists; the fourth column records the benign file samples with which it co-exists; the fifth column is the total number of terminals on which the file sample exists. For example, the file sample with ID 2 is an unlabeled sample present on two terminals: it co-exists with the malware sample with ID 4 on one terminal, and with the benign file samples with IDs 3 and 6 on two terminals each. From this example data, the file association graph is built with k = 3 (the 3 nearest neighbours of each node are selected), as shown in Fig. 2. The graph is undirected and weighted: the number at each node is the ID of the corresponding file sample, and the number on each edge is the weight of that edge, i.e. the similarity of the two nodes.
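As an illustration of the record layout described above, the following sketch models one row of the data set in Python. The class and field names are hypothetical; only the row for file ID 2 is taken from the text, and the per-terminal co-existence counts are left out for brevity.

```python
from dataclasses import dataclass

# Label encoding as described in the text.
UNLABELED, BENIGN, MALWARE = 0, 1, -1


@dataclass
class FileRecord:
    file_id: int                # column 1: file ID
    label: int                  # column 2: 0, 1 or -1
    malware_coexist: frozenset  # column 3: IDs of co-existing malware samples
    benign_coexist: frozenset   # column 4: IDs of co-existing benign samples
    terminal_count: int         # column 5: number of terminals holding the file

    def is_unlabeled(self):
        return self.label == UNLABELED

    def labeled_neighbors(self):
        """All labeled samples this file co-exists with."""
        return self.malware_coexist | self.benign_coexist


# The row for file ID 2 from the example in the text.
record_2 = FileRecord(2, UNLABELED, frozenset({4}), frozenset({3, 6}), 2)
print(sorted(record_2.labeled_neighbors()))  # [3, 4, 6]
```

The set returned by `labeled_neighbors` corresponds to the labeled samples an unlabeled file co-exists with, which is the information used to estimate similarities between unlabeled files before the graph is built.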
The co-existence relationship is used as the association between file samples, and the Jaccard similarity is used to measure the similarity between file samples; the association graph of the file samples is built by selecting the k nearest neighbours of each file sample as its adjacent nodes. The label propagation algorithm is a graph-based semi-supervised learning algorithm that passes the label information of labeled nodes to unlabeled nodes. On the basis of the file association graph, the label propagation algorithm is used to learn the labels of the unlabeled file samples and thereby find malware samples.

Claims (3)

1. A malware detection method based on file association relationships, characterized in that an association graph of file samples is built from the co-existence relationships of the file samples, and the labels of the file samples are predicted with a label propagation algorithm in order to detect malware samples.
2. The malware detection method based on file association relationships according to claim 1, characterized in that, in building the association graph of file samples from their co-existence relationships, the co-existence relationship is used as the association between file samples, the Jaccard similarity is used to measure the similarity between file samples, and on this basis the association graph is built by selecting the top k nearest neighbours of each file sample as its adjacent nodes.
3. The malware detection method based on file association relationships according to claim 1, characterized in that the label propagation algorithm passes the label information of labeled nodes to unlabeled nodes in order to predict the labels of the unlabeled nodes.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810781731.2A CN109033834A (en) 2018-07-17 2018-07-17 A kind of malware detection method based on file association relationship

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810781731.2A CN109033834A (en) 2018-07-17 2018-07-17 A kind of malware detection method based on file association relationship

Publications (1)

Publication Number Publication Date
CN109033834A true CN109033834A (en) 2018-12-18

Family

ID=64642870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810781731.2A Pending CN109033834A (en) 2018-07-17 2018-07-17 A kind of malware detection method based on file association relationship

Country Status (1)

Country Link
CN (1) CN109033834A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101350822A (en) * 2008-09-08 2009-01-21 南开大学 Method for discovering and tracing Internet malevolence code
CN102054149A (en) * 2009-11-06 2011-05-11 中国科学院研究生院 Method for extracting malicious code behavior characteristic
CN104036051A (en) * 2014-07-04 2014-09-10 南开大学 Database mode abstract generation method based on label propagation
CN105975852A (en) * 2015-12-31 2016-09-28 武汉安天信息技术有限责任公司 Method and system for detecting sample relevance based on label propagation
CN106355506A (en) * 2016-08-15 2017-01-25 中南大学 Method for selecting the initial node with maximum influence in online social network
CN107180190A (en) * 2016-03-11 2017-09-19 深圳先进技术研究院 A kind of Android malware detection method and system based on composite character

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张俊丽 等: "《标签传播算法理论及其应用研究综述》", 《计算机应用研究》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674498A (en) * 2019-08-20 2020-01-10 中国科学院信息工程研究所 Internal threat detection method and system based on multi-dimensional file activity
CN110674498B (en) * 2019-08-20 2022-06-03 中国科学院信息工程研究所 Internal threat detection method and system based on multi-dimensional file activity
CN112596856A (en) * 2020-12-22 2021-04-02 电子科技大学 Node security prediction method based on Docker container and graph calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20181218)