CN114499987A - Network abnormal IP and port hybrid detection method based on relative density - Google Patents

Network abnormal IP and port hybrid detection method based on relative density

Info

Publication number
CN114499987A
Authority
CN
China
Prior art keywords
destination
address
port
source
relative density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111644457.2A
Other languages
Chinese (zh)
Inventor
杭菲璐
罗震宇
郭威
陈何雄
毛正雄
何映军
谢林江
张振红
白晓羽
占梦来
张军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Center of Yunnan Power Grid Co Ltd
Original Assignee
Information Center of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Center of Yunnan Power Grid Co Ltd
Priority to CN202111644457.2A
Publication of CN114499987A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a relative density-based hybrid detection method for abnormal network IPs and ports. The method first extracts the required fields from a data source and extracts all distinct source MAC addresses; for each source MAC address, it extracts all distinct (destination IP address, destination port) pairs; it then extracts three features for each pair: the number of accesses, the number of different source IPs, and the peak number of accesses per unit time. After the features are standardized, the k nearest neighbors of each (destination IP address, destination port) pair are computed using Euclidean distance, and the relative density and anomaly score of each pair are calculated. An anomaly score threshold is defined, and pairs whose score exceeds the threshold are marked. The invention performs hybrid anomaly detection on the destination IP address and the destination port jointly rather than considering them separately, has higher accuracy and interpretability, and is easy to popularize and apply.

Description

Network abnormal IP and port hybrid detection method based on relative density
Technical Field
The invention belongs to the technical field of network anomaly detection, and particularly relates to a relative density-based network anomaly IP and port hybrid detection method.
Background
Today, life without the Internet is almost unimaginable. The potential of the Internet is enormous, and its growth is reflected in fields such as education, entertainment, and medical care. However, the use of the Internet in every area of human life brings its own challenges, the most important of which is data security. A network intrusion is a security breach caused by unauthorized access to a computer network. Identifying the different types of intrusion in a network is the task of an Intrusion Detection System (IDS). The attacks handled by an IDS can be classified as probe attacks, DoS attacks, R2L attacks, and U2R attacks.
In a probe attack, an unauthorized party "sniffs" the network and identifies a vulnerability in a particular target resource; for example, an attacker may use an unusual port number, attacking a port different from those accessed by other IP addresses.
DoS is short for Denial of Service, and attacks that cause it are called DoS attacks; their aim is to make a computer or network unable to provide normal service. The most common DoS attacks are network bandwidth attacks and connectivity attacks. A DoS attack either deliberately exploits defects in the implementation of network protocols or simply exhausts the resources of the attacked object by brute force, so that the target computer or network cannot provide normal service or resource access and the target system stops responding or even crashes; such attacks do not involve intrusion into the target server or network device. The service resources concerned include network bandwidth, file system space, open processes, and allowed connections. Such attacks cause resource scarcity, and their consequences cannot be avoided no matter how fast the computer's processor, how large its memory, or how high the network bandwidth.
In an R2L (remote-to-local) attack, an attacker who has no authorized access to the victim's network gains access remotely and can then sniff data more easily; such attacks may be prevented by a Virtual Private Network (VPN) framework.
In a U2R (user-to-root) attack, the attacker gains root privileges to access the network. This can have several disastrous consequences, such as unauthorized access to control lists. Because the destination IP and port identify the host and the application, detecting anomalies in them is particularly important.
At present, two algorithms are mainly used: network IP anomaly detection based on firewall rules, and relative density-based source IP anomaly detection for DNS request data streams.
The network IP anomaly detection algorithm based on firewall rules works as follows: the firewall decides whether to allow or forbid a data packet entering the internal network according to the firewall security policy in its configuration file. The firewall security policy is an access control list with multiple rows, each row being one firewall rule. A firewall rule consists of three parts: a rule sequence number, a network field filtering domain, and an action. The sequence number gives the position of the rule in the security policy rule table and thus fixes the order in which a packet is matched against the rules. The sequence number also determines the priority of the rule: the smaller the sequence number, the higher the priority. The network field filtering domain may contain many items, but the following five are commonly used in firewall rules: protocol, source IP address, source port, destination IP address, and destination port. The action of a rule specifies how a matching packet is handled, and is usually one of two options: accept the packet or forbid it from passing through the firewall. If no matching rule is found for a packet, it finally matches a default filtering rule. The firewall rule table is the basis on which the firewall judges packets entering the internal network and is the concrete embodiment of the network security policy; administrators must draw up the rule table according to their network security requirements. The configuration of the rule table therefore directly affects firewall performance and is of great significance to research on firewall security policy.
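The rule-matching order described above can be illustrated with a minimal Python sketch. It is not taken from any particular firewall product: the `Rule` class, the `matches` and `decide` helpers, the wildcard convention (`None`), the CIDR notation, and the choice of "deny" as the default filtering rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    seq: int                  # sequence number: smaller means higher priority
    protocol: Optional[str]   # None acts as a wildcard (assumed convention)
    src_net: str              # source network in CIDR notation
    src_port: Optional[int]   # None acts as a wildcard
    dst_net: str              # destination network in CIDR notation
    dst_port: Optional[int]   # None acts as a wildcard
    action: str               # "accept" or "deny"


def matches(rule, pkt):
    """Check the five commonly used filter fields against one packet."""
    return (
        (rule.protocol is None or rule.protocol == pkt["protocol"])
        and ip_address(pkt["src_ip"]) in ip_network(rule.src_net)
        and (rule.src_port is None or rule.src_port == pkt["src_port"])
        and ip_address(pkt["dst_ip"]) in ip_network(rule.dst_net)
        and (rule.dst_port is None or rule.dst_port == pkt["dst_port"])
    )


def decide(rules, pkt, default_action="deny"):
    """Return the action of the first matching rule in sequence-number order."""
    for rule in sorted(rules, key=lambda r: r.seq):
        if matches(rule, pkt):
            return rule.action
    # No rule matched: fall back to the default filtering rule (assumed "deny").
    return default_action


rules = [
    Rule(20, None, "0.0.0.0/0", None, "192.168.1.0/24", None, "deny"),
    Rule(10, "tcp", "0.0.0.0/0", None, "192.168.1.10/32", 443, "accept"),
]
pkt = {"protocol": "tcp", "src_ip": "203.0.113.5", "src_port": 51000,
       "dst_ip": "192.168.1.10", "dst_port": 443}
decision = decide(rules, pkt)
```

Because rule 10 has the smaller sequence number, it is tried before the broader rule 20, so the HTTPS packet is accepted while other traffic to the same subnet is denied.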
A firewall's normal operation rests entirely on its filtering rules, but a packet filtering firewall cannot support fine-grained rules and cannot analyze data in higher-layer protocols. With an application gateway firewall, every connection must be established through a proxy process created for it, with a complex protocol analysis mechanism, which can cause data delay. Although the stateful inspection firewall inherits the advantages of the packet filtering firewall and the application gateway firewall and overcomes their disadvantages, it only inspects the layer-3 information of a data packet and cannot thoroughly identify the large amount of spam, advertisements, Trojan programs and the like carried in packets.
The relative density-based source IP anomaly detection algorithm for DNS request data streams works as follows: when relative density is used to detect abnormal source IPs of Domain Name System (DNS) requests, each source IP is first regarded as a user/asset, and the source IP must then be described by features. Taking real DNS request data from a campus network as an example, 9 features were extracted to describe each source IP by analyzing the DNS request traffic during the 21:00 hour on November 22, 2015. The data contains 2 million DNS requests involving 3148 source IPs, each with its own sequence number. The 9 extracted features are: the number of queries from a single source IP, the peak number of queries per unit time, the information entropy of the source port, the information entropy of the DNS message header ID, the number of destination IPs, the number of different domain names queried, the peak number of queried domain name types per unit time, the proportion of malformed packets, and the proportion of illegal domain names. Finally, every source IP is scored by the relative density algorithm, and the anomaly scores are ranked from high to low. Today, as network bandwidth keeps increasing, large amounts of data pour into the Internet, new network applications keep appearing, and existing ones are continually updated, yet this method is only suitable for detecting DNS streams. Moreover, the method maps each source IP to one user/asset, which is not rigorous: the IP address of each user or asset is allocated by a DHCP server, so the same user or asset may have different source IP addresses at different login times, and an IP address cannot serve well as the identifier of a user or asset.
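Two of the nine prior-art features listed above are information entropies (of the source port and of the DNS header ID). As a minimal sketch, the Shannon entropy of a list of observed values could be computed as follows; the function name `shannon_entropy` and the use of base-2 logarithms are assumptions, since the text does not state which logarithm base the cited work uses.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits, assumed base 2) of the empirical distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical source ports seen for one source IP.
fixed_port_entropy = shannon_entropy([53, 53, 53, 53])
varied_port_entropy = shannon_entropy([40001, 40002, 40003, 40004])
```

A source IP that always uses the same source port yields an entropy of zero, while one whose ports are spread evenly over many values yields a high entropy.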
Therefore, how to overcome the deficiencies of the prior art is a problem that urgently needs to be solved in the field of network anomaly detection.
Disclosure of Invention
The invention aims to remedy the deficiencies of the prior art and provides a relative density-based hybrid detection method for abnormal network IPs and ports, which overcomes the following defects:
(1) Anomaly detection can only be performed on the IP or on the port number separately.
(2) Anomaly detection can only be performed on the source IP, not on a specific destination IP and port number.
(3) The source IP is used as the identifier of a user or asset, so users and assets cannot be identified accurately.
(4) Network IP anomaly detection based on firewall rules cannot effectively detect anomalies in higher-layer network protocols or in dynamically changing network traffic.
To achieve this purpose, the technical scheme adopted by the invention is as follows:
A relative density-based hybrid detection method for abnormal network IPs and ports comprises the following steps:
step (1), extracting the required fields from a data source;
step (2), using the source MAC address as the identifier of a user or an asset, extracting all distinct source MAC addresses from the fields extracted in step (1);
step (3), for each source MAC address extracted in step (2), extracting all distinct (destination IP address, destination port) pairs;
step (4), for every (destination IP address, destination port) pair extracted in step (3), extracting three features: the number of accesses, the number of different source IPs, and the peak number of accesses per unit time;
step (5), standardizing the features extracted in step (4);
step (6), based on the feature values standardized in step (5), calculating the k nearest neighbors of each (destination IP address, destination port) pair using Euclidean distance;
step (7), calculating the relative density of each (destination IP address, destination port) pair, and taking the reciprocal of the relative density as its anomaly score;
step (8), defining an anomaly score threshold, wherein (destination IP address, destination port) pairs whose score is higher than the threshold are marked as abnormal destination IPs and destination ports of the MAC user or MAC asset.
Further, preferably, in step (1), the required fields include a timestamp, a source MAC address, a source IP address, a destination IP address, a source port number, and a destination port number.
Further, preferably, in step (4), the unit time is 60 seconds.
Further, preferably, the standardization in step (5) is Z-Score standardization.
Further, preferably, in step (6), k is 10.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention performs hybrid anomaly detection on the destination IP address and the destination port jointly rather than considering them separately, and has higher accuracy and interpretability;
(2) the anomaly score threshold, the number of nearest neighbors k, and the unit time are adjustable parameters that can be tuned manually, giving higher flexibility;
(3) the data source is extracted from the PCAP file by data preprocessing, yielding features based on bidirectional flows. Compared with anomaly detection based on firewall rules, more network information can be obtained from the binary packet capture file.
Drawings
FIG. 1 is a flow chart of the relative density-based hybrid detection method for abnormal network IPs and ports according to the present invention;
FIG. 2 is a scatter plot of a user's number of accesses in the application example;
FIG. 3 is a scatter plot of a user's peak number of accesses per unit time in the application example.
Detailed Description
The present invention will be described in further detail with reference to examples.
It will be appreciated by those skilled in the art that the following examples are illustrative of the invention only and should not be taken as limiting the scope of the invention. The examples do not specify particular techniques or conditions, and are performed according to the techniques or conditions described in the literature in the art or according to the product specifications. The materials or equipment used are not indicated by manufacturers, and all are conventional products available by purchase.
Outlier detection based on relative density is an outlier discovery algorithm that is often applied to anomaly detection. The algorithm gives a quantitative outlier score for each point and also handles data containing regions of different density well. The main idea is to first obtain the relative density of each point in the data set and then use the reciprocal of that relative density as the point's outlier score; the higher the score, the more abnormal the point is considered to be. Selecting the points with the highest scores yields the outliers in the data set.
Let the data set be $S = \{a_1, a_2, \ldots, a_n\}$, where each point $a$ is an m-dimensional vector $(x_1^{(a)}, x_2^{(a)}, \ldots, x_m^{(a)})$. The k neighbors of a point $a$ are defined as the set $N(a,k) = \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$, where $a_{i_1}, \ldots, a_{i_k}$ are the k points closest to $a$, with $a_{i_j} \neq a$. The distance between points $a$ and $b$ is defined as the Euclidean distance, i.e.:
$$\mathrm{dis}(a,b) = \sqrt{\sum_{i=1}^{m} \left(x_i^{(a)} - x_i^{(b)}\right)^2} \tag{1}$$
where $x_i^{(a)}$ denotes the i-th feature of point $a$ and $x_i^{(b)}$ denotes the i-th feature of point $b$.
Based on the k neighbors of point $a$, the surrounding density of point $a$ is defined as the reciprocal of the average distance from $a$ to each of its k neighbors, i.e.:
$$\mathrm{density}(a,k) = \left(\frac{1}{|N(a,k)|} \sum_{b \in N(a,k)} \mathrm{dis}(a,b)\right)^{-1} \tag{2}$$
where $\mathrm{density}(a,k)$ denotes the density of point $a$, $\mathrm{dis}(a,b)$ denotes the Euclidean distance between points $a$ and $b$, and $|N(a,k)|$ denotes the size of the set.
When a point is far from its surrounding points, its density is small; conversely, when a point is close to its surrounding points, its density is large.
When the data points have different densities in different regions, using only the surrounding density causes problems: points in a sparse region generally have a low surrounding density and are too easily judged to be outliers, whereas points in a dense region generally have a high surrounding density and are not easily judged to be outliers, even when a point is already far from the other points of that dense region.
Concept of relative density: based on the surrounding density, the relative density of point $a$ is defined as the ratio of the surrounding density of $a$ to the average surrounding density of its k neighbors, i.e.:
$$\mathrm{relative\_density}(a,k) = \frac{\mathrm{density}(a,k)}{\dfrac{1}{|N(a,k)|} \sum_{b \in N(a,k)} \mathrm{density}(b,k)} \tag{3}$$
where $\mathrm{relative\_density}(a,k)$ denotes the relative density of point $a$, $\mathrm{density}(a,k)$ denotes the density of point $a$, and $|N(a,k)|$ denotes the size of the set.
The outlier score of point a is the reciprocal of the relative density of that point. Points whose outlier score exceeds a threshold are selected as the abnormal points in the data set.
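The definitions above (Euclidean distance, surrounding density, relative density, and outlier score as its reciprocal) can be sketched with NumPy as follows. This is an illustrative implementation under stated assumptions, not the applicant's code: the function name `relative_density_scores`, the brute-force pairwise distance computation, and the small `eps` guard against division by zero for duplicate points are all choices made here.

```python
import numpy as np


def relative_density_scores(X, k=10):
    """Outlier score of each row of X: reciprocal of its relative density."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor

    # Indices of, and distances to, the k nearest neighbors of every point.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)

    eps = 1e-12  # assumption: avoids division by zero when points coincide
    # Surrounding density: reciprocal of the mean distance to the k neighbors.
    density = 1.0 / (nn_dist.mean(axis=1) + eps)
    # Relative density: own density over the mean density of the k neighbors.
    relative_density = density / density[nn_idx].mean(axis=1)
    return 1.0 / relative_density


# Toy data: five points in a tight cluster and one isolated point.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]]
scores = relative_density_scores(X, k=3)
```

On this toy data the isolated last point receives by far the largest score, while the clustered points score close to 1. The brute-force distance matrix needs memory quadratic in the number of points, which is acceptable for a few dozen pairs per MAC address but would need a spatial index for large data sets.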
As shown in FIG. 1, the specific steps of the method are as follows:
Step 1: Extract the required fields from a data source; the required fields include a timestamp, a source MAC address, a source IP address, a destination IP address, a source port number, and a destination port number.
The data source is a CSV file obtained by preprocessing a network traffic capture file (PCAP file). The preprocessing is as follows: the packets in the PCAP file are divided into data flows according to the five-tuple (source IP address, source port, destination IP address, destination port, transport layer protocol), and information is then extracted for each data flow, including feature fields such as the timestamp (timestamp of the first packet of the flow), source MAC address (MAC address of the user or asset), source IP address (IP address of the user or asset), destination IP address, source port number (port number of the user or asset), and destination port number.
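The preprocessing in Step 1 could be sketched as below. The sketch assumes the packets have already been parsed from the PCAP file into dictionaries (the parsing itself is not shown), and the names `packets_to_flows`, `flows_to_csv`, and the dictionary keys are hypothetical. It groups packets by the directed five-tuple only; merging the two directions into a bidirectional flow is left out.

```python
import csv
import io

# Fields kept for each flow, following the list in Step 1 (key names assumed).
FIELDS = ["timestamp", "src_mac", "src_ip", "dst_ip", "src_port", "dst_port"]


def packets_to_flows(packets):
    """Group packets by five-tuple; keep the fields of each flow's first packet."""
    flows = {}
    for pkt in sorted(packets, key=lambda p: p["timestamp"]):
        key = (pkt["src_ip"], pkt["src_port"],
               pkt["dst_ip"], pkt["dst_port"], pkt["protocol"])
        if key not in flows:  # first packet of the flow supplies the timestamp
            flows[key] = {field: pkt[field] for field in FIELDS}
    return list(flows.values())


def flows_to_csv(flows):
    """Serialize the flow records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(flows)
    return buf.getvalue()


# Made-up packets using documentation address ranges.
packets = [
    {"timestamp": 12.0, "src_mac": "00:00:5e:00:53:01", "src_ip": "192.0.2.10",
     "src_port": 50000, "dst_ip": "198.51.100.7", "dst_port": 443, "protocol": "tcp"},
    {"timestamp": 10.0, "src_mac": "00:00:5e:00:53:01", "src_ip": "192.0.2.10",
     "src_port": 50000, "dst_ip": "198.51.100.7", "dst_port": 443, "protocol": "tcp"},
    {"timestamp": 11.0, "src_mac": "00:00:5e:00:53:01", "src_ip": "192.0.2.10",
     "src_port": 50001, "dst_ip": "198.51.100.7", "dst_port": 53, "protocol": "udp"},
]
flows = packets_to_flows(packets)
csv_text = flows_to_csv(flows)
```

The two TCP packets share a five-tuple and collapse into one flow stamped with the earlier timestamp; the UDP packet forms a second flow.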
Step 2: Using the source MAC address as the identifier of the user or asset, extract all distinct source MAC addresses.
Step 3: Under each source MAC address, extract all distinct (destination IP address, destination port) pairs; this is one of the key steps of the algorithm. The destination port and the destination IP address are considered together: the destination port identifies a process and the IP address identifies a host, and a process is meaningful only within its host, so the two cannot be separated.
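Steps 2 and 3 amount to a grouping operation, which might look like the following sketch. The function name `pairs_by_mac` and the flow record layout (dictionaries with `src_mac`, `dst_ip`, and `dst_port` keys) are assumptions for illustration.

```python
from collections import defaultdict


def pairs_by_mac(flows):
    """Map each distinct source MAC to its set of (dst_ip, dst_port) pairs."""
    pairs = defaultdict(set)
    for flow in flows:
        # The IP and port are stored together as one tuple, never separately.
        pairs[flow["src_mac"]].add((flow["dst_ip"], flow["dst_port"]))
    return dict(pairs)


# Made-up flow records using documentation address ranges.
flows = [
    {"src_mac": "00:00:5e:00:53:01", "dst_ip": "198.51.100.7", "dst_port": 443},
    {"src_mac": "00:00:5e:00:53:01", "dst_ip": "198.51.100.7", "dst_port": 443},
    {"src_mac": "00:00:5e:00:53:01", "dst_ip": "198.51.100.7", "dst_port": 22},
    {"src_mac": "00:00:5e:00:53:02", "dst_ip": "198.51.100.9", "dst_port": 443},
]
grouped = pairs_by_mac(flows)
```

Keeping the pair as a single tuple means that port 443 on one host and port 443 on another host are treated as different points in the later steps.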
Step 4: For every (destination IP address, destination port) pair, extract three features: the number of accesses, the number of different source IPs, and the peak number of accesses per unit time.
Number of accesses: the number of times the (destination IP address, destination port) pair appears, obtained by counting.
Number of different source IPs: the number of different source IPs that appear with the (destination IP address, destination port) pair; multiple source IPs indicate that multiple IPs have accessed the target process or application on the target host.
Peak number of accesses per unit time: the peak number of accesses per unit time can be one of the factors for judging whether an anomaly has occurred. Within the time range, the number of accesses to each (destination IP address, destination port) pair in each unit of time is counted, and the maximum is taken as the access peak of that pair; the unit time is chosen as 60 s.
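The three features of Step 4 could be computed as in the sketch below, applied to the flows of one source MAC address. It assumes timestamps are numeric seconds and that the unit-time windows are fixed, non-overlapping 60-second buckets aligned to timestamp zero; the text does not say whether fixed or sliding windows are intended. The name `extract_features` and the record keys are hypothetical.

```python
from collections import Counter, defaultdict


def extract_features(flows, unit_seconds=60):
    """Return {(dst_ip, dst_port): (accesses, distinct source IPs, peak per window)}."""
    accesses = Counter()
    source_ips = defaultdict(set)
    per_window = defaultdict(Counter)
    for flow in flows:
        pair = (flow["dst_ip"], flow["dst_port"])
        accesses[pair] += 1
        source_ips[pair].add(flow["src_ip"])
        # Assumed bucketing: fixed windows of unit_seconds starting at time 0.
        window = int(flow["timestamp"] // unit_seconds)
        per_window[pair][window] += 1
    return {
        pair: (accesses[pair], len(source_ips[pair]), max(per_window[pair].values()))
        for pair in accesses
    }


# Made-up flows of a single source MAC (timestamps in seconds).
flows = [
    {"timestamp": 0, "src_ip": "192.0.2.10", "dst_ip": "198.51.100.7", "dst_port": 443},
    {"timestamp": 10, "src_ip": "192.0.2.10", "dst_ip": "198.51.100.7", "dst_port": 443},
    {"timestamp": 59, "src_ip": "192.0.2.11", "dst_ip": "198.51.100.7", "dst_port": 443},
    {"timestamp": 60, "src_ip": "192.0.2.10", "dst_ip": "198.51.100.7", "dst_port": 443},
    {"timestamp": 130, "src_ip": "192.0.2.10", "dst_ip": "198.51.100.9", "dst_port": 22},
]
features = extract_features(flows)
```

For the first pair, three of the four accesses fall into the first 60-second window, so its feature vector is (4, 2, 3): four accesses, two distinct source IPs, and a peak of three accesses per unit time.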
Step 5: The features are standardized using the Z-Score standardization method, which improves detection accuracy.
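Z-Score standardization rescales each feature column to zero mean and unit standard deviation, so that no single feature dominates the Euclidean distance. A minimal NumPy sketch follows; the function name `z_score`, the use of the population standard deviation, and the handling of constant columns (mapped to zeros) are assumptions not specified in the text.

```python
import numpy as np


def z_score(X):
    """Standardize each column of X to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # population standard deviation (assumed)
    std[std == 0] = 1.0   # assumption: a constant feature becomes all zeros
    return (X - mean) / std


# Rows are (accesses, distinct source IPs, peak per unit time); values made up.
raw = [[1, 10, 5], [2, 20, 5], [3, 30, 5]]
Z = z_score(raw)
```

The third column is constant, so it standardizes to zeros rather than producing a division by zero.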
Step 6: Based on the standardized feature values, the k nearest neighbors of each point, i.e. each (destination IP address, destination port) pair, are calculated using Euclidean distance. The method chooses k = 10, i.e. the 10 nearest points around each point are considered.
Step 7: The relative density of each (destination IP address, destination port) pair is calculated according to equation (3), and its reciprocal is taken as the anomaly score; that is, the lower the relative density, the more likely the point is an outlier and the higher the corresponding anomaly score.
Step 8: An anomaly score threshold is defined based on experience or practical requirements, and each (destination IP address, destination port) pair whose score is above the threshold is marked as an abnormal destination IP and destination port of this MAC user or asset.
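The marking in Step 8 is a simple comparison per source MAC address, sketched below. The function name `flag_anomalies`, the nested-dictionary layout of the scores, and every score value shown are made up for illustration; only the default threshold of 100 echoes the value used in the application example.

```python
def flag_anomalies(scores_by_mac, threshold=100.0):
    """Return, per MAC, the (pair, score) entries above the threshold, highest first."""
    flagged = {}
    for mac, scores in scores_by_mac.items():
        hits = [(pair, score) for pair, score in scores.items() if score > threshold]
        if hits:
            flagged[mac] = sorted(hits, key=lambda item: item[1], reverse=True)
    return flagged


# Hypothetical anomaly scores; none of these numbers come from the patent.
scores_by_mac = {
    "00:00:5e:00:53:01": {
        ("198.51.100.7", 443): 1.2,
        ("198.51.100.9", 22): 140.0,
        ("203.0.113.4", 8080): 310.5,
    },
    "00:00:5e:00:53:02": {("198.51.100.7", 443): 0.9},
}
flagged = flag_anomalies(scores_by_mac)
```

Sorting the flagged pairs by score puts the most anomalous destinations first, and MAC addresses with nothing above the threshold are omitted from the result.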
Application Example
With the unit time set to 60 seconds, the number of neighbors k set to 10, and the anomaly score threshold set to 100, the method of the invention was run. Consider the user whose source MAC address is 30:9c:23:1f:fd:05: the user accessed 37 (destination IP, destination port) pairs, and the anomaly scores of the vast majority of them are below 100. The anomaly scores of 3 (destination IP, destination port) pairs exceed 100, and these pairs are determined to be abnormal, as shown in Table 1.
Table 1. Anomaly list of the user 30:9c:23:1f:fd:05
Figure BDA0003443219400000051
Two features of the user with source MAC address 30:9c:23:1f:fd:05 are drawn as scatter plots, as shown in FIG. 2 and FIG. 3.
It can be seen that the features of most data points are relatively concentrated, so these points have a high relative density, whereas the data points of the abnormal destination IPs and destination ports lie clearly away from the dense region of the data, have a low relative density and a high anomaly score, and are therefore determined to be abnormal.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. A relative density-based hybrid detection method for abnormal network IPs and ports, characterized by comprising the following steps:
step (1), extracting the required fields from a data source;
step (2), using the source MAC address as the identifier of a user or an asset, extracting all distinct source MAC addresses from the fields extracted in step (1);
step (3), for each source MAC address extracted in step (2), extracting all distinct (destination IP address, destination port) pairs;
step (4), for every (destination IP address, destination port) pair extracted in step (3), extracting three features: the number of accesses, the number of different source IPs, and the peak number of accesses per unit time;
step (5), standardizing the features extracted in step (4);
step (6), based on the feature values standardized in step (5), calculating the k nearest neighbors of each (destination IP address, destination port) pair using Euclidean distance;
step (7), calculating the relative density of each (destination IP address, destination port) pair, and taking the reciprocal of the relative density as its anomaly score;
step (8), defining an anomaly score threshold, wherein (destination IP address, destination port) pairs whose score is higher than the threshold are marked as abnormal destination IPs and destination ports of the MAC user or MAC asset.
2. The method according to claim 1, characterized in that: in step (1), the required fields include a timestamp, a source MAC address, a source IP address, a destination IP address, a source port number, and a destination port number.
3. The method according to claim 1, characterized in that: in step (4), the unit time is 60 s.
4. The method according to claim 1, characterized in that: in step (5), the standardization is Z-Score standardization.
5. The method according to claim 1, characterized in that: in step (6), k = 10.
CN202111644457.2A 2021-12-29 2021-12-29 Network abnormal IP and port hybrid detection method based on relative density Pending CN114499987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111644457.2A CN114499987A (en) 2021-12-29 2021-12-29 Network abnormal IP and port hybrid detection method based on relative density

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111644457.2A CN114499987A (en) 2021-12-29 2021-12-29 Network abnormal IP and port hybrid detection method based on relative density

Publications (1)

Publication Number Publication Date
CN114499987A (en) 2022-05-13

Family

ID=81508502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111644457.2A Pending CN114499987A (en) 2021-12-29 2021-12-29 Network abnormal IP and port hybrid detection method based on relative density

Country Status (1)

Country Link
CN (1) CN114499987A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190190938A1 (en) * 2017-12-15 2019-06-20 Panasonic Intellectual Property Corporation Of America Anomaly detection method, learning method, anomaly detection device, and learning device
CN110149343A (en) * 2019-05-31 2019-08-20 国家计算机网络与信息安全管理中心 A kind of abnormal communications and liaison behavioral value method and system based on stream
CN110191004A (en) * 2019-06-18 2019-08-30 北京搜狐新媒体信息技术有限公司 A kind of port detecting method and system
CN110784383A (en) * 2019-12-05 2020-02-11 南京邮电大学 Shadowsocks proxy network flow detection method, storage medium and terminal
CN112202646A (en) * 2020-12-03 2021-01-08 观脉科技(北京)有限公司 Flow analysis method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王晓峰等: "《ARM嵌入式系统原理与应用》", 30 April 2019 *
王靖云等: "《 基于相对密度的DNS请求数据流源IP异常检测算法》", 《高技术通讯》 *

Similar Documents

Publication Publication Date Title
US20200344246A1 (en) Apparatus, system and method for identifying and mitigating malicious network threats
US11316878B2 (en) System and method for malware detection
US9762543B2 (en) Using DNS communications to filter domain names
US10356106B2 (en) Detecting anomaly action within a computer network
US9369479B2 (en) Detection of malware beaconing activities
Xu et al. Profiling internet backbone traffic: behavior models and applications
JP6053091B2 (en) Traffic feature information extraction method, traffic feature information extraction device, and traffic feature information extraction program
US8141148B2 (en) Method and system for tracking machines on a network using fuzzy GUID technology
EP1906620A1 (en) Method and apparatus for detecting compromised host computers
EP3223495B1 (en) Detecting an anomalous activity within a computer network
US10257213B2 (en) Extraction criterion determination method, communication monitoring system, extraction criterion determination apparatus and extraction criterion determination program
JP2019523584A (en) Network attack prevention system and method
US20210099414A1 (en) In-line detection of algorithmically generated domains
Ertoz et al. Detection and summarization of novel network attacks using data mining
US11570190B2 (en) Detection of SSL / TLS malware beacons
CN115917513A (en) Automating IOT device identification using statistical payload fingerprinting
Ono et al. A proposal of port scan detection method based on Packet‐In Messages in OpenFlow networks and its evaluation
Nie et al. Intrusion detection using a graphical fingerprint model
CN110912933A (en) Equipment identification method based on passive measurement
CN116527390A (en) Port scan detection
CN114499987A (en) Network abnormal IP and port hybrid detection method based on relative density
Saiyod et al. Improving intrusion detection on snort rules for botnet detection
Goparaju et al. Distributed Denial of Service Attack Classification Using Artificial Neural Networks.
Majed et al. Efficient and Secure Statistical Port Scan Detection Scheme
US20230362176A1 (en) System and method for locating dga compromised ip addresses

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220513
