CN109040130B

CN109040130B - Method for measuring host network behavior pattern based on attribute relation graph

Info

Publication number: CN109040130B
Application number: CN201811105929.5A
Authority: CN
Inventors: 叶晓鸣; 杨力
Original assignee: Chengdu Liming Information Technology Co ltd
Current assignee: Chengdu Liming Information Technology Co ltd
Priority date: 2018-09-21
Filing date: 2018-09-21
Publication date: 2020-12-22
Anticipated expiration: 2038-09-21
Also published as: CN109040130A

Abstract

The invention discloses a method for measuring a host network behavior pattern based on an attribute relationship graph. The method comprises four key steps: network traffic collection, formal representation of the host network behavior pattern, detection of abnormal host network behavior, and anomaly judgment. The technical scheme addresses two weaknesses of traditional anomaly detection methods: their training data is difficult to obtain, and their detection adapts poorly to the system. It can effectively handle new threats and unknown anomalies faced by the host, meet the security-protection requirements posed by an endless stream of emerging attacks, and reduce the losses that targeted attacks cause to enterprises and public institutions.

Description

Method for measuring host network behavior pattern based on attribute relation graph

Technical Field

The invention relates to server behavior pattern description and anomaly detection, and in particular to a method for measuring a host network behavior pattern based on an attribute relationship graph.

Background

Servers are important network hardware resources of enterprises and public institutions, and network applications and data resources are stored on server hosts. As network applications grow and service types diversify, the number of users has increased dramatically, bringing a series of security problems. Research on system-level host security detection mostly draws its data from the host's audit logs, system call sequences, and memory and file changes, and distinguishes normal from illegitimate behavior by analyzing this audit data. The advantage of such research is that it readily monitors system activity, such as access to sensitive files, directories, programs, or ports, that is hard to find in network traffic data. Take targeted attacks, a currently prominent type of abnormal behavior, as an example: the intruder's ultimate goal is data theft, but the early-stage intrusion is difficult to discover. After the attacker obtains user privileges on the host, data exfiltration changes not only internal system activity such as files, memory, and programs; it also changes the host's network communication behavior once data is transmitted to the outside. Monitoring changes in the individual behavior of the host therefore offers the best opportunity to detect data leakage and is a key step in reducing the losses that targeted attacks cause to enterprises and public institutions.

In large-scale network architectures, many network security defense devices are usually deployed to protect hosts layer by layer, and they generate a large volume of alerts every day. Many professional security analysts are then needed to analyze, investigate, and confirm whether these alerts are actionable, and security analysis of a specific host in particular suffers from practical problems such as difficult information collection, large information volume, and difficult analysis. In a continuously changing network system, a server carries a large amount of network communication with external parties, which makes individual behavior profiles in the network hard to characterize.

In summary, the server-oriented security monitoring has the following two main disadvantages:

(1) Analysis of existing research shows that, in a large-scale network environment, the various host security logs are difficult to collect and analyze. A main reason is that a software agent must be deployed on each server. This complex work inevitably incurs high cost, and data collection is further hampered by inconsistent software and hardware versions, data interfaces, and access-control permissions across hosts. Beyond the monetary cost, agents degrade host performance, increase the complexity of security management, and introduce additional risk into the security management of enterprises and public institutions;

(2) Existing methods are insensitive to changes in the network traffic of an individual host: they generally do not track increases or decreases in a host's traffic or changes in its behavior pattern, and therefore cannot analyze and detect the behavior of an individual host. As a result, using network traffic data to analyze the security of individual servers has received little attention.

Understanding the individual behavior characteristics of hosts is an important precondition for many research efforts such as network security defense, and measurement and analysis supply real, valid data for that understanding. Measuring and analyzing individual host behavior is therefore the basis for studying it: it is necessary to grasp the basic characteristics of host network behavior, find the basic rules by which individual host behavior changes, and construct a mathematical model of that behavior. Studying the individual behavior of hosts in enterprises and public institutions allows host network behavior to be evaluated accurately, which is of great significance for managing and controlling server network security.

Disclosure of Invention

The invention aims to provide a method for measuring a host network behavior pattern based on an attribute relationship graph, solving the problems that the prior art struggles to deal with new threats and unknown anomalies and cannot adapt to the security-protection requirements of complex network environments in which attacks keep emerging.

To achieve this purpose, the invention adopts the following technical scheme:

The method for measuring the host network behavior pattern based on the attribute relationship graph comprises the following steps:

(1) collecting host network traffic data;

(2) constructing an attribute relationship graph for a host, the graph comprising several feature columns arranged in an arbitrary order, with each distinct value within a feature column serving as a node;

(3) connecting nodes in adjacent feature columns according to the network connections between hosts, with no connections between nodes in non-adjacent columns, so that all network traffic data of the host maps onto the attribute relationship graph;

(4) at the end of each time window, extracting feature values from the nodes and features of the attribute relationship graph, and expressing the host network behavior pattern for a fixed time window as a baseline feature vector matrix of multidimensional feature values;

(5) calculating host network behavior deviation values based on the fixed time window and eliminating outliers, to form the deviation degree of the host network behavior;

(6) obtaining the central tendency of the deviation of each host's network behavior pattern, and setting a deviation threshold for the time window;

(7) from the network traffic data, computing the multidimensional feature values of the host network behavior pattern over the fixed time window at the detection time, and aggregating them to the host level to form a detection feature vector matrix;

(8) calculating the current network behavior deviation value of each monitored host based on the fixed time window at the detection time;

(9) determining, against the time-window deviation threshold set in step (6), whether the deviation value calculated in step (8) lies within the threshold, thereby judging whether the host is in an abnormal state.

Specifically, in step (1), the host network traffic data is collected by port mirroring on the router, which forwards the traffic to the server on which host abnormal behavior detection is deployed.

Further, the attribute relationship graph comprises seven independent columns of feature information, arranged in the order server IP address, protocol number, server port number, remote IP address, byte count, and time type.

Still further, in step (3), host network connections are divided into two modes, active connection and passive response, according to how the server communicates in the actual scenario.

Further, in step (2), the protocol number and server port number are 6 and 80, respectively; the byte count is divided into three nodes, 0, 3, and 5; and the time type is divided into two nodes, 0 and 1.

In step (4), when the byte count is extracted as a feature value, its continuous values need to be reduced by discretization.

Preferably, the discretization uses binning, specifically: the byte count is divided into 12 bins, bin1 to bin12; the per-packet byte count in bin1 is $(2^{k-1}, 2^{k}]$, where k is the index of the bin; the per-packet byte count in bin12 is $(2^{10}, \infty)$.
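A minimal Python sketch of the binning step. The text fixes the interval rule $(2^{k-1}, 2^{k}]$ and the top bin $(2^{10}, \infty)$ but does not say how the lowest bins are indexed, so this sketch assumes bins 1 to 11 correspond to k = 0 to 10 (bin index = k + 1, with 0- and 1-byte packets placed in bin1) and that bin12 catches everything above 1024 bytes. The function names are hypothetical.

```python
from typing import Iterable, List


def byte_bin(packet_bytes: int) -> int:
    """Map a per-packet byte count to a bin index in 1..12.

    Assumed reading: bins 1..11 cover (2**(k-1), 2**k] for k = 0..10
    (bin index = k + 1, packets of 0 or 1 byte go to bin1), and
    bin12 covers (2**10, infinity).
    """
    if packet_bytes < 0:
        raise ValueError("byte count must be non-negative")
    if packet_bytes > 2 ** 10:
        return 12
    if packet_bytes <= 1:
        return 1
    # For n >= 2, (n - 1).bit_length() equals ceil(log2(n)), i.e. the k
    # with 2**(k-1) < n <= 2**k.
    k = (packet_bytes - 1).bit_length()
    return k + 1


def byte_histogram(packet_sizes: Iterable[int]) -> List[int]:
    """Count packets per bin; index 0 holds bin1, index 11 holds bin12."""
    counts = [0] * 12
    for size in packet_sizes:
        counts[byte_bin(size) - 1] += 1
    return counts
```

Under this assumption, full-size packets on a typical 1500-byte-MTU link always land in bin12, which keeps bulk transfers in a single top bucket.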

Still further, the invention calculates the host network behavior deviation value with the following formula:

$$S(IP^j, T_d) = \alpha \cdot \hat{s}_1 + \beta \cdot \hat{s}_2, \qquad s_2 = \log(count_{blk}), \qquad \alpha + \beta = 1$$

where $\hat{s}_1$ and $\hat{s}_2$ are the normalized values of $s_1$ and $s_2$; $s_1$ is the historical suspicion degree, i.e., the spatial distance of the host's individual behavior profile at the detection time from the feature vector of its historical behavior baseline; $Mean_{his}$ denotes the mean of the historical behavior baseline of an individual host, $IP^j$ denotes the j-th host, and $T_d$ denotes the detection time; $s_2 = \log(count_{blk})$ is the victim suspicion degree, reflecting the number of hosts on an IP blacklist that communicate remotely with the server at the detection time, with $count_{blk}$ the number of IP addresses located on the blacklist; α and β are weights with α + β = 1.
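A minimal sketch of combining the two suspicion terms with weights α and β = 1 − α. The specification states that the terms are normalized but not how, so this sketch assumes min-max normalization across all monitored hosts at one detection time; the dictionaries of per-host s₁ and s₂ values and the function names are hypothetical.

```python
from typing import Dict


def min_max_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale per-host values to [0, 1]; if all values are equal, map them to 0.0 (assumption)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {host: 0.0 for host in values}
    return {host: (v - lo) / (hi - lo) for host, v in values.items()}


def deviation_scores(s1: Dict[str, float], s2: Dict[str, float],
                     alpha: float = 0.5) -> Dict[str, float]:
    """Per-host score alpha * norm(s1) + beta * norm(s2), with beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if s1.keys() != s2.keys():
        raise ValueError("s1 and s2 must cover the same hosts")
    beta = 1.0 - alpha
    n1, n2 = min_max_normalize(s1), min_max_normalize(s2)
    return {host: alpha * n1[host] + beta * n2[host] for host in s1}
```

Normalizing first keeps a large raw distance from swamping the logarithmic blacklist term, so α and β express the intended relative importance of the two signals.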

Preferably, in step (5), outliers are eliminated according to the Grubbs criterion.

The main design principle of the invention is to measure individual behavior patterns by quantifying the features of an attribute relationship graph. The normal behavior of a host over a fixed time window is expressed as a baseline feature vector matrix of multidimensional feature values; traffic data for the fixed time window at the detection time is aggregated to each host to form a detection feature vector matrix; the degree to which each host's current traffic deviates from its normal traffic in feature vector space is quantified by the distance the feature vector has moved; and finally, whether the host state is normal is judged from this deviation degree.

Compared with the prior art, the invention has the following beneficial effects:

The invention innovatively introduces the concept of the attribute relationship graph and builds a host network behavior pattern from network traffic on that basis. Using a detection approach based on individual behavior deviation, measured as movement in feature vector space, it quantifies how far each host's current network traffic deviates from its normal traffic by the distance the feature vector has moved, and thereby detects abnormal host network behavior. This not only makes full use of network traffic data for analyzing the security of individual hosts but also provides a more comprehensive analytical view. Because the feature vector formed from the multidimensional feature values of a host's normal network behavior pattern is stable over time (including the similarity of multiple hosts' individual behavior at the same time point, the stability of individual behavior feature values across time scales, and the similarity of multiple hosts' individual behavior at the same time point across time scales), when the feature vector moves far from the baseline of the normal network behavior pattern, it can be inferred that the host is experiencing an unusual network security event, such as probing, scanning, malware injection, running-service failure, or network misconfiguration.

In addition, when calculating how far the host network behavior at the detection time deviates from the normal-time baseline, the invention considers not only the spatial distance the behavior pattern has moved relative to the historical feature vector but also the number of blacklisted IP addresses accessing the server, and combines the two with weights α and β, making the deviation calculation more rigorous and comprehensive.

It should be noted that, by analyzing how the feature vector of the host network behavior pattern changes over time, the invention accurately captures the normal individual behavior profile of a host within a given time window, and by analyzing the traffic aggregated to each host it accurately tracks the host's state, strengthening host security protection. The scheme can therefore effectively address the new threats and unknown anomalies faced by hosts and the security problems posed by constantly emerging attacks, and reduce the losses that targeted attacks cause to enterprises and public institutions.

Drawings

FIG. 1 is a schematic flow chart of the present invention.

FIG. 2 is a diagram of a system architecture in which the present invention may be implemented.

FIG. 3 is an attribute relationship graph of a host according to the present invention.

FIG. 4 is a schematic diagram of the cosine similarity of individual behavior feature vectors of multiple hosts in the same time window according to an embodiment of the present invention.

FIG. 5 is a schematic diagram of the cosine similarity of individual behavior feature vectors at the same time across time windows of different scales according to an embodiment of the present invention.

FIG. 6 is a diagram of the data distribution of host network behavior pattern deviation according to an embodiment of the present invention.

FIG. 7 is a box plot of abnormal deviations of host network behavior patterns according to an embodiment of the present invention.

FIG. 8 is a diagram of the deviation of abnormal host network behavior patterns according to an embodiment of the present invention.

Detailed Description

The present invention is further described below with reference to the following description and embodiments, which include but are not limited to the examples given.

Existing methods for detecting abnormal host features collect sample information on known attacks and anomalies, label abnormal and normal samples to form a training data set, and then detect the corresponding anomalies. However, as attacker techniques are continuously updated, attack methods are becoming diverse, complex, and covert, and network attacks try to bypass the detection of existing security policies. Existing detection methods based on abnormal host features therefore struggle with new threats and unknown anomalies and cannot adapt to the security-protection requirements of complex network environments in which attacks keep emerging. The invention therefore provides a method for measuring host network behavior patterns based on an attribute relationship graph: it builds each host's network behavior pattern from network traffic using an attribute relationship graph, and judges whether the host is abnormal by comparing how far the current multidimensional feature values have moved in feature vector space from the normal behavior pattern baseline. The invention mainly targets anomaly detection on important hosts of enterprises and public institutions, and the detailed process is as follows (as shown in FIG. 1):

First, network traffic data of the important hosts of an enterprise or public institution is collected. In this embodiment, the traffic of the host resources is forwarded through router port mirroring to a server on which host abnormal behavior detection is deployed; the system architecture is shown in FIG. 2. Because the invention is implemented mainly through multidimensional feature extraction and deviation calculation, the multidimensional feature values of server traffic are first aggregated and reduced in dimension per time window; the behavior pattern baseline is then learned dynamically from stored historical data, and once normal system parameters are obtained, the system models the host network behavior pattern and performs anomaly detection.

After the network traffic data is collected, the host network behavior pattern is formally represented, specifically as follows:

1. Construct an attribute relationship graph for the host. The graph comprises several feature columns arranged in an arbitrary order, and each distinct value within a feature column is a node. In this embodiment, the graph comprises seven independent columns of feature information, arranged in the order server IP address, protocol number, server port number, remote IP address, byte count, and time type. The columns of the seven-column attribute relationship graph are shown in Table 1 below:

TABLE 1

The attribute relationship graph is a method for describing different types of behavior patterns at the application layer; it mainly describes the pattern of network connections between a given host and other hosts. In this embodiment, a 7-tuple, i.e., {server IP address, protocol number, server port number, remote IP address, byte count, time type}, is used to visually describe the various applications running over the server's transport layer, such as FTP, P2P, Mail, Web, and DNS servers, as well as network attacks. FIG. 3 shows the attribute relationship graph of a host. In this embodiment, the protocol number and server port number are 6 and 80, respectively; the byte count is divided into three nodes, 0, 3, and 5; and the time type is divided into two nodes, 0 and 1. The host offers a service on port 80 to the outside: other user hosts communicate with this host's IP address by initiating network connections that request the service on port 80, using multiple source port numbers when communicating with port 80 of the host.

It should also be noted that host network behavior profiles proposed in existing host research use relatively few and limited attributes: they may cover only traffic attributes (such as bytes sent, bytes received, packets sent, packets received, and duration) and host connections (the number of flows connected to the host), and rarely quantify relationships among application-layer attributes (such as the number of connections to a given port, or the relationship between host ports and remote ports). The invention therefore introduces the attribute relationship graph to analyze the joint attribute relationships of individual host behavior, and provides a method for quantifying the features of the 7-column attribute relationship graph, in which the 5th column, the destination address (remote IP address), is the key column.

2. Establish edges between nodes in adjacent feature columns according to the host's network connections, so that all network traffic data of the host maps onto the attribute relationship graph. The invention extends the attribute relationship graph method to the host level: the number of columns in the graph is determined by the number of tuple elements used, each element occupies one column, nodes in adjacent columns can be connected, and nodes in non-adjacent columns cannot. In addition, to quantify the relationships between tuple elements, the invention defines the in-degree (out-degree) of each node as the number of nodes in the column to its left (right) to which it is connected, and uses these degrees to characterize the host's network behavior.

3. At the end of each time window, extract feature values from the nodes and features of the attribute relationship graph, construct a profile of the host's normal network behavior pattern, and express the host network behavior pattern for a fixed time window as a baseline feature vector matrix of multidimensional feature values. Note that the number and order of features may be changed in the invention, and different choices yield different features, but all extracted features can still be represented as a baseline feature vector matrix of multidimensional feature values, and the subsequent steps remain unchanged.
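A minimal sketch of steps 1 and 2: each distinct value in a column becomes a node, each record links consecutive columns only, and a node's in-degree (out-degree) counts its distinct neighbours in the column to its left (right). The record keys below follow the six column names the text lists and, like the function names, are hypothetical; any ordered list of keys can be passed instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Column order as listed in the text; any ordered list of record keys works.
COLUMNS = ("server_ip", "protocol", "server_port", "remote_ip", "byte_bin", "time_type")

Node = Tuple[int, object]  # (column index, distinct value in that column)


def build_attribute_graph(records: Iterable[Dict[str, object]],
                          columns: Sequence[str] = COLUMNS):
    """Nodes are the distinct values of each column; edges join adjacent columns only."""
    nodes: List[Set[object]] = [set() for _ in columns]
    edges: Set[Tuple[Node, Node]] = set()
    for rec in records:
        values = [rec[col] for col in columns]
        for i, value in enumerate(values):
            nodes[i].add(value)
        # Link column i to column i + 1; non-adjacent columns are never linked.
        for i in range(len(values) - 1):
            edges.add(((i, values[i]), (i + 1, values[i + 1])))
    return nodes, edges


def node_degrees(edges: Set[Tuple[Node, Node]]):
    """In-degree: distinct neighbours in the left column; out-degree: in the right column."""
    in_deg: Dict[Node, int] = defaultdict(int)
    out_deg: Dict[Node, int] = defaultdict(int)
    for left, right in edges:
        out_deg[left] += 1
        in_deg[right] += 1
    return dict(in_deg), dict(out_deg)
```

For the FIG. 3 scenario, two remote hosts reaching port 80 give the port-80 node an out-degree of 2, which is exactly the kind of port-to-remote-host relationship the text says flat traffic counters miss. Storing edges in a set means repeated connections do not inflate the degrees.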

At the end of each time window, the feature values of the host network behavior pattern are extracted uniformly. Data for a fixed time window T is aggregated to the host level, and feature statistics are computed with the host as the object of study. On the one hand, this describes the host with as much information as possible; on the other hand, to keep the feature space as small as possible for algorithmic analysis, processing, and interpretation, the feature set also undergoes data preprocessing, reduction, and feature dimension reduction. By aggregating traffic to the host level and starting from the 9 original attributes of network flow data, the flow-based features are extended with the attribute relationship graph feature measurement method, giving 33 network flow features and 70 behavior attribute features each for the server side and the client side; some of these features are shown in Table 2 below.

TABLE 2
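A minimal sketch of aggregating flows into fixed time windows at the host level. The flow tuple layout and the three features (flow count, total bytes, distinct remote IPs) are hypothetical stand-ins for the much larger feature set in Table 2; the 300-second default mirrors the 5-minute window used in the experiments below.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical flow schema: (unix timestamp, server IP, remote IP, bytes)
Flow = Tuple[float, str, str, int]


def aggregate_to_hosts(flows: Iterable[Flow],
                       window_s: int = 300) -> Dict[Tuple[str, int], List[float]]:
    """Group flows into fixed windows per server host and emit a small feature vector.

    Illustrative features only: [flow count, total bytes, distinct remote IPs].
    """
    acc = defaultdict(lambda: {"flows": 0, "bytes": 0, "remotes": set()})
    for ts, server_ip, remote_ip, nbytes in flows:
        # Windows are aligned to multiples of window_s seconds.
        window_start = int(ts // window_s) * window_s
        entry = acc[(server_ip, window_start)]
        entry["flows"] += 1
        entry["bytes"] += nbytes
        entry["remotes"].add(remote_ip)
    return {
        key: [float(e["flows"]), float(e["bytes"]), float(len(e["remotes"]))]
        for key, e in acc.items()
    }
```

Keying on (host, window start) gives one feature vector per host per window, which is the unit the baseline and detection matrices are built from.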

In addition, the invention introduces attribute quantization features extracted from the attribute relationship graph (as shown in Table 3). To use the byte count as a feature, its continuous values need to be reduced by discretization. The discretization method in this embodiment is binning, which divides the byte count into 12 bins, bin1 to bin12. The bin boundaries are derived from the Maximum Transmission Unit (MTU), where the maximum segment size is MSS = MTU − IP header length (20 bytes) − TCP header length (20 bytes). The per-packet byte count of the first bin, bin1, is $(2^{k-1}, 2^{k}]$, where k is the index of the bin, and the per-packet byte count of the last bin, bin12, is $(2^{10}, \infty)$.

TABLE 3
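As a worked example of the MSS relation above, assume a standard Ethernet MTU of 1500 bytes (the text does not give a value) and IP and TCP headers without options:

```latex
\mathrm{MSS} = \mathrm{MTU} - 20 - 20 = 1500 - 40 = 1460~\text{bytes},
\qquad 1460 > 2^{10} = 1024
```

Under that assumption a full-size TCP segment falls into the top bin, bin12 $= (2^{10}, \infty)$, so bulk transfers are separated from the small control and request packets in the lower bins.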

For host network connections, the invention distinguishes two cases according to how the server communicates in the actual scenario, actively initiated connections and passive responses (described in Table 4), and also incorporates byte counts in order to characterize the individual behavior of the host.

TABLE 4
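A minimal sketch of splitting a server's flows into active connections (the server initiated) and passive responses (a remote host initiated), with byte counts kept per mode. It assumes each flow record already identifies its initiator, for example the side that sent the TCP SYN; the record layout and function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical flow schema: (initiator IP, responder IP, bytes), where the
# initiator is assumed known, e.g. the side that sent the TCP SYN.
Flow = Tuple[str, str, int]


def split_by_direction(server_ip: str, flows: Iterable[Flow]) -> Dict[str, Dict[str, int]]:
    """Tally flows and bytes for connections the server initiated vs. answered."""
    stats = {"active": {"flows": 0, "bytes": 0}, "passive": {"flows": 0, "bytes": 0}}
    for initiator, responder, nbytes in flows:
        if initiator == server_ip:
            mode = "active"    # server opened the connection
        elif responder == server_ip:
            mode = "passive"   # server responded to a remote request
        else:
            continue           # flow does not involve this server
        stats[mode]["flows"] += 1
        stats[mode]["bytes"] += nbytes
    return stats
```

For a web server, passive traffic normally dominates; a sudden rise in active, outbound bytes is the kind of change the background section associates with data exfiltration.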

After the host network behavior pattern is formally represented, the host's normal network behavior deviation values are calculated over fixed time windows and outliers are removed (according to the Grubbs criterion), forming the deviation degree of the host network behavior pattern.
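A minimal, standard-library-only sketch of iterative two-sided Grubbs outlier removal. The significance level alpha = 0.05 is an assumed default (the text gives none). The critical value uses the usual formula G_crit = (n − 1)/√n · √(t² / (n − 2 + t²)), where t is the upper α/(2n) quantile of Student's t with n − 2 degrees of freedom; the quantile is found by bisection over the closed-form t CDF for integer degrees of freedom, so no SciPy is needed. Function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def _t_abs_cdf(t: float, nu: int) -> float:
    """P(|T| <= t) for Student's t with integer degrees of freedom nu >= 1."""
    theta = math.atan(t / math.sqrt(nu))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    c2 = cos_t * cos_t
    total = 0.0
    if nu % 2 == 1:
        # (2/pi) * (theta + sin * (cos + (2/3)cos^3 + ... up to cos^(nu-2)))
        term = cos_t
        for k in range(1, nu - 1, 2):
            total += term
            term *= (k + 1) / (k + 2) * c2
        return 2.0 / math.pi * (theta + sin_t * total)
    # sin * (1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ... up to cos^(nu-2))
    term = 1.0
    for k in range(nu // 2):
        total += term
        term *= (2 * k + 1) / (2 * k + 2) * c2
    return sin_t * total


def t_upper_quantile(p_tail: float, nu: int) -> float:
    """Return t with P(T > t) = p_tail, found by bisection."""
    target = 1.0 - 2.0 * p_tail  # required P(|T| <= t)
    lo, hi = 0.0, 1.0
    while _t_abs_cdf(hi, nu) < target:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if _t_abs_cdf(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def grubbs_filter(values: Sequence[float], alpha: float = 0.05) -> List[float]:
    """Repeatedly drop the most extreme value while the two-sided Grubbs test rejects it."""
    data = list(values)
    while len(data) > 2:
        n = len(data)
        mean = statistics.fmean(data)
        sd = statistics.stdev(data)
        if sd == 0:
            break
        idx = max(range(n), key=lambda i: abs(data[i] - mean))
        g = abs(data[idx] - mean) / sd
        t = t_upper_quantile(alpha / (2 * n), n - 2)
        g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
        if g <= g_crit:
            break
        data.pop(idx)
    return data
```

Grubbs' test assumes roughly normal data and removes one point per round, so applying it to the historical deviation values strips isolated spikes before the central tendency and threshold are computed.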

The feature vector of the normal network behavior pattern is a set of multidimensional feature values that is updated as the server's actual network traffic changes, and it is stable over time. The verification experiments designed for the invention are as follows:

(1) Stability of the network behavior patterns of different hosts at the same time and in the same time window

In this experiment, individual behavior feature vectors of several hosts at the same time and in the same time window were randomly sampled and their cosine similarities calculated. Six hosts were randomly selected, the time window was 5 minutes, and the data used the feature set proposed by the invention; data at 9:50 a.m. on 19 working days was randomly selected and similarities were calculated pairwise, giving 171 (19 × 18 / 2) cosine similarity values in total. The results show that the behavior patterns of the hosts are similar at the same time and in the same time window, and the data is stable: as shown in FIG. 4, all similarity values exceed 0.994.
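The pairwise comparison in these experiments can be sketched as follows: one cosine similarity per unordered pair of feature vectors, so n vectors yield n(n − 1)/2 values (19 vectors give the 171 values reported above). Plain Python lists stand in for the feature vectors, and the function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def pairwise_similarities(vectors: Sequence[Sequence[float]]) -> List[float]:
    """One similarity per unordered pair, i.e. n * (n - 1) / 2 values."""
    return [cosine_similarity(vectors[i], vectors[j])
            for i, j in itertools.combinations(range(len(vectors)), 2)]
```

Cosine similarity ignores overall magnitude, so it compares the shape of two behavior profiles even when one host or window simply carries more traffic.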

(2) Stability of each host's network behavior pattern across different time windows at the same time

In this experiment, the time windows were set to 60, 30, 10, 5, and 1 minutes. The average similarity of 6 randomly selected hosts was computed; data at 9:50 a.m. on 17 working days was randomly selected at the same time of day and similarities were calculated pairwise, giving 136 (17 × 16 / 2) cosine similarity values in total. As shown in FIG. 5, the similarity values for all time windows exceed 0.994; the 10-minute window is the lowest, and the 60-, 1-, and 5-minute windows all exceed 0.999. The results show that the multidimensional feature values of the hosts' behavior patterns at the same time are similar across time windows, and the data is stable.

These experiments show that the multidimensional feature values of the host network behavior pattern are highly similar across different time windows and different hosts, and this temporal stability can be used to calculate how far a host's network behavior pattern deviates. Analysis of real data shows that servers providing Web, mail, information management, and other services, given their social, functional, and application characteristics and the fixed usage habits of users in enterprises and public institutions, remain in a stable state for long periods, and the host network behavior pattern does not change abruptly.

After the host network behavior pattern for a fixed time window is expressed as a baseline feature vector matrix of multidimensional feature values, the central tendency of the host network behavior pattern deviation is obtained and the deviation threshold for the time window is set. Then, from the network traffic data, the multidimensional feature values of the host network behavior pattern over the fixed time window at the detection time are computed and aggregated to the host level to form a detection feature vector matrix.

Next, the current network behavior deviation value of each monitored host is calculated over the fixed time window at the detection time. Finally, the calculated current deviation value is checked against the deviation threshold set for the time window (in practice, the threshold can be updated iteratively), which determines whether the host state is abnormal. In other words, a host is considered normal when its network behavior profile at the detection time is close to or equal to its historical network behavior profile, and abnormal otherwise.

For the deviation calculation described above, in addition to the spatial distance the host network behavior pattern has moved relative to the historical feature vector, the invention also considers a quantitative suspicion index, because the more blacklisted users access the server, the higher the probability that the server has been implanted with malware or is behaving maliciously.
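The text does not say how the central tendency of historical deviations is turned into a threshold, so this sketch assumes one plausible rule: the upper box-plot fence Q3 + 1.5 · IQR over historical deviation values (ideally already cleaned of outliers), with hosts above it flagged as abnormal. Only an upper bound is used because a large deviation is what signals an anomaly. The multiplier k and the function names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def deviation_threshold(history: Sequence[float], k: float = 1.5) -> float:
    """Upper box-plot fence Q3 + k * IQR of historical deviation values (assumed rule)."""
    q1, _median, q3 = statistics.quantiles(history, n=4)
    return q3 + k * (q3 - q1)


def judge_hosts(current: Dict[str, float], threshold: float) -> Dict[str, bool]:
    """True marks a host whose current deviation exceeds the threshold (abnormal)."""
    return {host: deviation > threshold for host, deviation in current.items()}
```

Recomputing `deviation_threshold` as each window's normal deviations are added to the history is one simple way to realize the iterative threshold update mentioned above.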

The suspicion degrees are defined in detail below.

Definition 1: historical suspicion degree s₁, i.e., the spatial distance of the host's individual behavior profile at the detection time from the feature vector of its historical behavior baseline.

This quantifies how far the position of the host's feature vector has moved from its historical individual behavior profile. During detection, the host's movement in feature space is monitored, the distance between its current individual behavior and the center of the feature space of its historical behavior is calculated, and the position of the host's feature vector is checked for abnormality.

It can be established that users have fixed habits and patterns when accessing network resources, so data-center servers provide relatively stable application services to the outside, and the evolution of a network behavior profile over time can be represented as a stable network communication pattern. Since the mean measures the central trend of data, the mean of the historical data at the corresponding time is taken as the reference point of each feature vector (note: time points marked as abnormal are filtered out when extracting features of historical time points; the same applies below), and the relative offset of the current feature vector is calculated as the suspicion value, as shown in formula (1):

$$s_1 = d\big(\mathbf{v}(IP^j, T_d),\ Mean_{his}(IP^j)\big) \qquad (1)$$

where $Mean_{his}$ denotes the mean of the historical behavior baseline of an individual host, $IP^j$ denotes the j-th host, $T_d$ denotes the detection time, $\mathbf{v}(IP^j, T_d)$ is the host's feature vector at the detection time, and $d(\cdot,\cdot)$ is the spatial distance between the two vectors.
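A sketch of the historical suspicion degree under two labeled assumptions: the baseline is the element-wise mean of historical feature vectors after dropping windows flagged as abnormal (as the note above requires), and the spatial distance is Euclidean, since the text does not name a metric. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def historical_baseline(history: Sequence[Sequence[float]],
                        abnormal: Sequence[bool]) -> List[float]:
    """Element-wise mean of historical feature vectors, skipping windows flagged abnormal."""
    kept = [vec for vec, flagged in zip(history, abnormal) if not flagged]
    if not kept:
        raise ValueError("no normal historical windows to build a baseline from")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def historical_suspicion(current: Sequence[float], baseline: Sequence[float]) -> float:
    """s1 as the Euclidean distance from the baseline (the metric is an assumption)."""
    if len(current) != len(baseline):
        raise ValueError("current vector and baseline differ in dimension")
    return math.dist(current, baseline)
```

If features differ greatly in scale (byte totals versus port counts), standardizing each dimension before taking the distance would stop the largest-valued features from dominating s₁.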

Definition 2: victim suspicion degree s₂, i.e., the number of hosts on the IP blacklist that communicate remotely with the server at the detection time, obtained by analyzing information such as malicious domain names and IP blacklists.

This quantifies the likelihood that the server host has become a victim host: using publicly released blacklists, count how many of the remote hosts communicating with it have IP addresses on a blacklist.

Many organizations and companies currently publish blacklists of IP addresses and domain names, and using this information to restrict blacklisted IPs from accessing important network facilities is an effective security defense. Finding which hosts access malicious domain names or IP addresses helps quickly locate botnet command-and-control servers and victim hosts, reducing the impact of botnets. Thus, the more blacklisted addresses appear in a server host's individual behavior, the higher its suspicion degree. The invention uses data published by the University of Science and Technology of China, Northeastern University, and a free German blacklist database, and counts whether the IPs that interacted with the monitored host during the corresponding time period belong to the public blacklists. The more blacklisted IPs a host has interacted with, the higher the probability that it has been infected with malware or attacked, i.e., has become a compromised host. The calculation is shown in formula (2):

$$s_2 = \log(\mathrm{count}_{blk}) \qquad (2)$$

where count_blk denotes the number of IP addresses that appear on the blacklist.
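Formula (2) can be sketched as follows. This is a minimal sketch under several assumptions: the natural logarithm (the base is not stated), counting distinct remote IPs, and returning 0.0 when no blacklisted IP is seen, because log(0) is undefined. The text names the blacklist sources but not how they are loaded, so the blacklist here is a placeholder set of RFC 5737 documentation addresses. `compromise_suspicion` is a hypothetical name.

```python
import math
from typing import Iterable, Set


def compromise_suspicion(remote_ips: Iterable[str], blacklist: Set[str]) -> float:
    """s2 = log(count_blk), where count_blk counts distinct remote IPs found on the blacklist."""
    count_blk = len({ip for ip in remote_ips if ip in blacklist})
    # log(0) is undefined; mapping "no blacklisted peers" to 0.0 is an assumption, not from the source.
    return math.log(count_blk) if count_blk > 0 else 0.0


# Placeholder blacklist built from documentation address ranges (RFC 5737).
blacklist = {"203.0.113.5", "192.0.2.1"}
peers = ["203.0.113.5", "198.51.100.7", "203.0.113.5", "192.0.2.1"]
print(compromise_suspicion(peers, blacklist))  # two distinct blacklisted IPs -> log(2)
```

As formula (2) stands, exactly one blacklisted peer also yields log(1) = 0. A deployment may want a different convention for small counts.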

Definition 3: the individual behavior suspicion score is the cumulative deviation of the individual behavior from the behavior baseline.

Anomalies in individual host behavior can be detected from the differences and similarities among individual behaviors. An attacker exploits hardware and software vulnerabilities of a host to implant malware. The attacker then uses the network identity of the victim host to infect and attack other hosts, so the individual behavior of the victim host differs from its normal behavior. When a malware-infected host launches attacks on other hosts and infects them, the network communication behavior between the paired hosts necessarily differs from normal communication behavior. To conceal the victim hosts and obscure the real attack target, an attacker needs multiple victim hosts in the network, and these hosts necessarily share many similarities. For example, they may run operating systems or browsers with the same security vulnerability, or they may cooperate with one another toward a common attack target. As a result, these hosts show more commonalities and fewer differences.

In summary, the host suspicion score comprises s₁ (formula (1)) and s₂ (formula (2)). The two values are normalized and assigned weights {α, β}, with α + β = 1. The individual behavior suspiciousness is then computed by formula (3), which is also the formula used to calculate the deviation value.
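Formula (3) is not reproduced in this text either. The weights α and β with α + β = 1 suggest a weighted linear combination of the normalized s₁ and s₂. The sketch below assumes exactly that: S = α·norm(s₁) + β·norm(s₂). It also assumes z-score normalization across the monitored hosts, although the normalization method is not stated. The function names are hypothetical.

```python
import statistics
from typing import Dict, List


def z_scores(values: List[float]) -> List[float]:
    """Standardize values across hosts; if all values are equal, every host gets 0."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [0.0] * len(values) if sd == 0 else [(v - mean) / sd for v in values]


def individual_suspicion(s1: Dict[str, float], s2: Dict[str, float], alpha: float = 0.5) -> Dict[str, float]:
    """Assumed form of formula (3): alpha * norm(s1) + beta * norm(s2), with beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha  # enforces alpha + beta = 1
    hosts = sorted(s1)  # both dicts are expected to cover the same hosts
    n1 = z_scores([s1[h] for h in hosts])
    n2 = z_scores([s2[h] for h in hosts])
    return {h: alpha * a + beta * b for h, a, b in zip(hosts, n1, n2)}


print(individual_suspicion({"a": 1.0, "b": 2.0, "c": 3.0}, {"a": 5.0, "b": 5.0, "c": 5.0}, alpha=0.6))
```

Min-max scaling would be an equally plausible reading. It would bound every score to [0, 1], however, whereas z-scores leave the score unbounded, which fits better with the deviation values around 4 reported in the experiment.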

The following describes an experiment on the data distribution of the deviation degree of host network behavior patterns.

Feature vectors of the network behavior patterns of hosts protected by an enterprise or public institution were randomly selected from the data set at 11 p.m. on one day. The distribution of their deviation degrees, computed with the behavior-pattern deviation method above, was then compiled. Fig. 6 shows the distribution of the deviation data of host network behavior patterns and marks statistics such as the maximum, minimum, mean, and mode.

The experimental data show that more than 98.65% of the server hosts actually monitored in the current network environment have a behavior-pattern deviation below 4. Data from 17 randomly selected days likewise show that the deviation values of most hosts' network behavior patterns are essentially stable. Repeated experiments on other time periods show that the deviation degrees of normal host network behavior patterns are concentrated in a relatively fixed range. A threshold on the deviation degree can therefore be used to judge anomalies.
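The observation that almost all normal deviations fall below a fixed value (4, covering more than 98.65% of hosts in the experiment) can be checked, and a threshold read off an empirical sample, as sketched below. This is a minimal illustration: the helper names are hypothetical and the sample values are invented for demonstration, not experimental data.

```python
import math
from typing import Sequence


def share_below(deviations: Sequence[float], threshold: float) -> float:
    """Fraction of hosts whose deviation is strictly below `threshold`."""
    if not deviations:
        raise ValueError("empty sample")
    return sum(1 for d in deviations if d < threshold) / len(deviations)


def coverage_threshold(deviations: Sequence[float], coverage: float) -> float:
    """Smallest observed deviation t such that at least `coverage` of the sample is <= t."""
    ordered = sorted(deviations)
    idx = max(0, math.ceil(coverage * len(ordered)) - 1)
    return ordered[idx]


sample = [0.5, 1.0, 1.5, 2.0, 3.9, 4.2, 8.0, 1.1, 0.9, 2.5]  # illustrative values only
print(share_below(sample, 4.0), coverage_threshold(sample, 0.75))
```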

Figs. 7 and 8 show the experiment on detecting anomalies in host network behavior patterns. First, network flow data are used to compute the multidimensional feature values of the server behavior pattern within a fixed time window. Deviation values are then calculated for each time window, and outliers are removed with the Grubbs criterion. The result is the host network behavior-pattern deviation over one day, as shown in Fig. 7.
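The Grubbs criterion is named above, but its parameters are not. The sketch below shows the standard iterative two-sided Grubbs test under an assumed significance level of 0.05, which is our choice and is unrelated to the weight α of formula (3). To avoid a SciPy dependency, the Student t quantile is computed numerically with Simpson's rule and bisection; `scipy.stats.t.ppf` would be the usual choice where SciPy is available. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def _t_upper_quantile(p_upper: float, nu: int, steps: int = 1000) -> float:
    """t such that P(T > t) = p_upper for Student's t with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)

    def pdf(x: float) -> float:
        return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

    def upper_tail(x: float) -> float:
        # Simpson's rule on [0, x] (steps is even); by symmetry P(T > x) = 0.5 - integral.
        h = x / steps
        s = pdf(0.0) + pdf(x)
        for i in range(1, steps):
            s += (4 if i % 2 else 2) * pdf(i * h)
        return 0.5 - s * h / 3

    lo, hi = 0.0, 1.0
    while upper_tail(hi) > p_upper:
        hi *= 2.0  # widen the bracket until it contains the quantile
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid) > p_upper:
            lo = mid  # tail still too heavy: the quantile lies further right
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(n: int, significance: float = 0.05) -> float:
    """Two-sided Grubbs critical value for a sample of size n (n >= 3)."""
    t = _t_upper_quantile(significance / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_filter(values: Sequence[float], significance: float = 0.05) -> List[float]:
    """Repeatedly drop the most extreme value while it fails the Grubbs test."""
    data = list(values)
    while len(data) >= 3:
        mean = statistics.fmean(data)
        sd = statistics.stdev(data)
        if sd == 0:
            break
        idx = max(range(len(data)), key=lambda i: abs(data[i] - mean))
        if abs(data[idx] - mean) / sd <= grubbs_critical(len(data), significance):
            break
        data.pop(idx)  # outlier found: remove it and re-test the remaining sample
    return data


print(grubbs_filter([1.0, 1.2, 0.9, 1.1, 1.05, 0.95, 1.0, 9.0]))
```

For n = 10 the computed critical value is about 2.29, matching published two-sided Grubbs tables at the 0.05 level.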

Next, the central tendency of each host's network behavior-pattern deviation is obtained, and a deviation threshold is set for the time window.

Then, network flow data are used to compute the multidimensional feature values of each host's network behavior pattern within the fixed time window at the detection time. The deviation value of each monitored host is then calculated for that window.

Finally, the state of the server host is judged to be normal or abnormal by comparing its deviation with the time-window deviation threshold of the host network behavior pattern computed in the preceding steps.
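The threshold-and-judge steps above can be sketched as follows. The text says only that the threshold follows from the "concentration trend" of each host's deviations. This sketch assumes a hypothetical robust rule: the median of the host's cleaned historical deviations plus k median absolute deviations, with k = 3. The rule, the value of k, and the function names are illustrative assumptions, not the invention's stated method.

```python
import statistics
from typing import Dict, Sequence


def host_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Hypothetical rule: median of cleaned historical deviations plus k times their MAD."""
    values = list(history)
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    return med + k * mad


def judge_hosts(current: Dict[str, float], histories: Dict[str, Sequence[float]], k: float = 3.0) -> Dict[str, str]:
    """Label each host 'abnormal' if its current-window deviation exceeds its own threshold."""
    return {
        host: "abnormal" if dev > host_threshold(histories[host], k) else "normal"
        for host, dev in current.items()
    }


hist = [1.0, 1.2, 0.8, 1.1, 0.9]  # per-window deviations after Grubbs filtering (illustrative)
print(judge_hosts({"srv1": 1.25, "srv2": 5.0}, {"srv1": hist, "srv2": hist}))
```

The median and MAD are used here because they are insensitive to any outliers that survive filtering. A mean-plus-standard-deviation rule would be a simpler alternative.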

The data in Fig. 8 show a six-day time series of the behavior-pattern deviation of a host with abnormal network behavior. They illustrate the complete change of the host's deviation from normal to abnormal and back to normal. The dashed line in the figure marks the median of the overall data distribution, which corresponds to the normal range of deviation. The anomaly detected here is an unusual behavior pattern that the host displayed after malware was implanted. This network behavior does not disrupt normal network operation: it mainly appears as the host persistently trying to connect to a remote host, and it later develops into malicious data exfiltration. It is therefore particularly important to detect promptly and accurately that the host's network behavior pattern deviates from normal behavior, before serious network harm occurs.

In conclusion, through a reasonable scheme design, the invention effectively meets the security protection requirements posed by novel threats, unknown anomalies, complex network environments, and an endless stream of new attacks. It also solves the problems that training data for traditional anomaly detection methods are difficult to obtain and that abnormal behavior detection systems adapt poorly. The designed scheme follows the trend of technological development, represents an important innovation, and is of great significance for the network security management and control of servers. Therefore, compared with the prior art, the invention has outstanding substantive features and represents remarkable progress.

The above embodiment is only one of the preferred embodiments of the present invention and should not be used to limit its scope. All insubstantial modifications or changes made within the spirit and scope of the main design of the present invention that still solve technical problems consistent with the present invention shall fall within the scope of the present invention.

Claims

1. A method for measuring host network behavior pattern based on attribute relationship diagram is characterized by comprising the following steps:

(1) collecting host network flow data;

(2) constructing an attribute relation graph for a host, wherein the attribute relation graph comprises seven independent columns of feature information arranged in the order server IP address, protocol number, server port number, remote IP address, byte count, and time type, and each distinct value in a column serves as a node;

(5) calculating a host network behavior deviation value based on a fixed time window and removing outliers to form the deviation of the host network behavior, wherein outliers are removed according to the Grubbs criterion and the host network behavior deviation value is calculated with the following formula:

where the historical-suspiciousness term denotes the spatial distance between the feature vector of the host's individual behavior profile at the detection time and the historical behavior baseline, in which Mean_his denotes the mean (baseline) of the historical behavior of an individual host, IP^j denotes the j-th host, and Td denotes the detection time; log(count_blk) denotes the compromise suspicion, derived from the number of blacklisted remote hosts communicating with the server during the detection time, where count_blk is the number of IP addresses listed on the IP blacklist; and α and β are weights with α + β = 1;

(8) calculating, for the fixed time window of the detection time, the current network behavior deviation value of each monitored host using the formula in step (5);

2. The method according to claim 1, wherein in step (1), the host network traffic data are collected by forwarding them, through a port-mirroring router, to a server on which host abnormal behavior detection is deployed.

3. The method for measuring host network behavior pattern based on attribute relationship graph according to claim 1 or 2, wherein in step (3), host network connections are divided into two modes, active connection and passive response, according to the communication modes of servers in real deployment scenarios.

4. The method according to claim 3, wherein in step (2), the protocol number and the server port number are 6 and 80, respectively; the byte count is divided into three nodes, 0, 3, and 5; and the time type is divided into two nodes, 0 and 1.

5. The method according to claim 4, wherein in step (4), when the byte count is extracted as a feature value, the continuous byte-count values are discretized to reduce the number of distinct values.

6. The method for measuring the host network behavior pattern based on the attribute relationship diagram according to claim 5, wherein the discretization uses a binning method, specifically: the byte count is divided into 12 bins, bin1 to bin12; the byte count of each data packet in bin1 lies in $(2^{k-1}, 2^{k}]$, where k is the index value of the bin; and the byte count of each data packet in bin12 lies in $(2^{10}, \infty)$.
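Claim 1, step (2), and the binning of claim 6 can be sketched together. This is a minimal illustration, not the claimed implementation. Claim 1 announces seven columns but names six, so only the six named columns are used, with hypothetical field names. The claims do not say how nodes are linked, so the sketch assumes that each flow record connects its values in adjacent columns, weighted by how many records share that pair. For the bins, k is read as running from 0 for bin1. Under that reading, bin11 covers (2^9, 2^10] and bin12 = (2^10, ∞) follows on without overlap.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set, Tuple

# Column order as listed in claim 1 (field names are hypothetical).
COLUMNS = ("server_ip", "protocol", "server_port", "remote_ip", "bytes", "time_type")
Node = Tuple[str, Hashable]


def byte_bin(num_bytes: int) -> int:
    """Map a packet byte count to bin1..bin12 (returned as 1..12) under the reading described above."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    if num_bytes <= 1:
        return 1  # assumption: 0 and 1 byte both fall into the lowest bin
    k = (num_bytes - 1).bit_length()  # smallest k with num_bytes <= 2**k
    return min(k + 1, 12)  # k = 0..10 -> bin1..bin11; above 2**10 -> bin12


def build_attribute_graph(records: List[Dict[str, Hashable]]) -> Tuple[Set[Node], Counter]:
    """Nodes are (column, value) pairs; each record links its values in adjacent columns."""
    nodes: Set[Node] = set()
    edges: Counter = Counter()
    for rec in records:
        path = [(col, byte_bin(rec[col]) if col == "bytes" else rec[col]) for col in COLUMNS]
        nodes.update(path)
        for a, b in zip(path, path[1:]):
            edges[(a, b)] += 1  # edge weight = number of records sharing this value pair
    return nodes, edges


flows = [
    {"server_ip": "10.0.0.5", "protocol": 6, "server_port": 80, "remote_ip": "198.51.100.7", "bytes": 300, "time_type": 0},
    {"server_ip": "10.0.0.5", "protocol": 6, "server_port": 80, "remote_ip": "203.0.113.9", "bytes": 400, "time_type": 0},
]
nodes, edges = build_attribute_graph(flows)
print(len(nodes), len(edges))
```

Both example flows fall into bin10, (2^8, 2^9], so they share every node except the remote IP. This yields 7 nodes and 7 weighted edges.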