CN116886453B - Network flow big data analysis method - Google Patents

Network flow big data analysis method

Info

Publication number
CN116886453B
Authority
CN
China
Prior art keywords
address
abnormal
historical
network data
data packets
Legal status
Active
Application number
CN202311158322.4A
Other languages
Chinese (zh)
Other versions
CN116886453A (en)
Inventor
孙琳珂
徐桂彬
张忠奎
张晓奇
阮羚
苏佳文
刘芮言
胡斯玥
夏星明
Current Assignee
Hubei Central China Technology Development Of Electric Power Co ltd
Original Assignee
Hubei Central China Technology Development Of Electric Power Co ltd
Application filed by Hubei Central China Technology Development Of Electric Power Co ltd
Priority to CN202311158322.4A
Publication of CN116886453A
Application granted
Publication of CN116886453B
Legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/101 Access control lists [ACL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention relates to the technical field of data processing, and in particular to a network traffic big data analysis method comprising the following steps: obtaining the average number of historical network data packets and the historical suspected abnormal users from historical traffic data; calculating a time period length from the average number of historical network data packets and the historical suspected abnormal users, and obtaining the number of network data packets of each IP address from the time period length and the network data packets of each IP address; calculating the abnormal characteristic value of the network data packets of each IP address from the time period length and the number of network data packets of each IP address; calculating a k-distance value from the abnormal characteristic values of the network data packets of all IP addresses; and converting all IP addresses into data points in a coordinate system, screening out the abnormal data points using the k-distance value as the parameter of the LOF (Local Outlier Factor) algorithm, and handling the abnormal IP addresses identified from those data points. The invention strengthens the monitoring of abnormal IP addresses and improves monitoring accuracy.

Description

Network flow big data analysis method
Technical Field
The invention relates to the technical field of data processing, in particular to a network flow big data analysis method.
Background
With technological progress, networks are applied ever more deeply across industries and people depend on them more and more; the resulting problems are equally prominent, and network security is one of the most widely discussed. Monitoring abnormal network traffic is an important safeguard for safe network operation, but traditional detection methods ignore the correlation among certain features of abnormal traffic data, which lowers detection accuracy and monitoring efficiency, so these methods need further improvement.
In monitoring abnormal traffic data, traditional detection methods have the drawback that they ignore the multi-feature correlation of abnormal traffic, and their monitoring efficiency is low.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method for analyzing network traffic big data, the method comprising:
acquiring network data packets and historical flow data of each IP address of a target website;
obtaining the average number of historical network data packets and the historical suspected abnormal users according to the historical flow data; calculating the time period length according to the average number of the historical network data packets and the historical suspected abnormal users, and obtaining the number of the network data packets of each IP address according to the time period length and the network data packets of each IP address;
according to the time period length, the number of the network data packets of each IP address and the number of the normal port numbers of the IP addresses, calculating the abnormal characteristic value of the network data packets of each IP address;
calculating a k distance value according to the abnormal characteristic values of the network data packets of all the IP addresses;
converting all IP addresses into data points in a coordinate system, screening abnormal data points in all the data points by taking the k distance value as a parameter of an LOF algorithm, and processing the abnormal IP addresses obtained according to the abnormal data points.
Further, the time period length is calculated by a formula (not reproduced in this text) involving the following quantities:
T represents the time period length, n represents the number of historical suspected abnormal users of the target website, N represents the number of users in the historical traffic data of the target website; the formula also uses the standard deviation of the numbers of historical network data packets sent by all historical suspected abnormal users of the target website and the mean time of all historical network data packets sent by the d-th historical suspected abnormal user of the target website.
Further, the step of obtaining the average number of the historical network data packets and the historical suspected abnormal users comprises the following specific steps:
treating each distinct IP address in the historical traffic data of the target website as one user, to obtain all users of the target website; obtaining from the historical traffic data of the target website all historical network data packets sent by each user; calculating the mean number of historical network data packets sent per user, recorded as the average number of historical network data packets of the target website; and marking each user whose number of historical network data packets exceeds this average as a historical suspected abnormal user of the target website.
Further, the obtaining the number of the network data packets of each IP address includes the following specific steps:
the number of network data packets that each IP address sends to the target website within the time period length T is recorded as the number of network data packets of that IP address.
Further, the abnormal characteristic value of the network data packets of each IP address is calculated by a formula (not reproduced in this text) involving the following quantities:
Z represents the abnormal characteristic value of the network data packets of the IP address, m represents the number of network data packets of the IP address, T represents the time period length, D represents the number of normal port numbers of the IP address, and L represents the source characteristic value of the IP address; the formula also uses the number of communication messages in the i-th network data packet of the IP address and an exponential function with the natural constant e as its base.
Further, the method for acquiring the source characteristic value of the IP address specifically includes the following steps:
when the IP address appears in the historical traffic data of the target website, its source characteristic value is L=1; when it does not appear in the historical traffic data of the target website and is a domestic IP address, L=2; when it does not appear in the historical traffic data of the target website and is a non-domestic IP address, L=3.
Further, the k-distance value is calculated by a formula (not reproduced in this text) involving the following quantities:
k represents the k-distance value, M represents the number of all IP addresses that communicate with the target website, and ⌈·⌉ denotes rounding up; the formula also uses the standard deviation of the numbers of network data packets of all IP addresses, the standard deviation of the abnormal characteristic values of the network data packets of all IP addresses, and an exponential function with the natural constant e as its base.
Further, the converting all the IP addresses into data points in the coordinate system includes the following specific steps:
constructing a coordinate system with the number of network data packets of each IP address as the abscissa and the abnormal characteristic value of the network data packets of each IP address as the ordinate, and converting each IP address into one data point in this coordinate system, which gives the data point corresponding to each IP address.
Further, the processing of the abnormal IP address obtained according to the abnormal data point comprises the following specific steps:
the IP address corresponding to each abnormal data point is recorded as a possibly abnormal IP address, and the activity level of each possibly abnormal IP address is calculated; the possibly abnormal IP addresses in the top 25% by activity level are taken as abnormal IP addresses;
when an abnormal IP address accesses the target website, security authentication measures are started to restrict its access to important content of the target website; any IP address that has been judged to be an abnormal IP address twice is added to a blacklist, and IP addresses on the blacklist are prohibited from communicating with the target website.
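The handling rule described above (security authentication for abnormal IP addresses, blacklisting after two abnormal verdicts) can be sketched as a small in-memory tracker. This is a minimal sketch under stated assumptions: the class `AbnormalIpHandler`, the action strings it returns, and the per-request `important` flag are all hypothetical, since the text does not say how verdicts are stored or how access restrictions are enforced.

```python
from collections import Counter


class AbnormalIpHandler:
    """Hypothetical tracker: restrict abnormal IPs, blacklist IPs judged abnormal twice."""

    def __init__(self, blacklist_threshold: int = 2) -> None:
        self.verdicts = Counter()      # number of 'abnormal IP address' verdicts per IP
        self.blacklist = set()         # IPs barred from communicating with the site
        self.threshold = blacklist_threshold

    def record_abnormal(self, ip: str) -> None:
        """Register one abnormal verdict; blacklist the IP once it reaches the threshold."""
        self.verdicts[ip] += 1
        if self.verdicts[ip] >= self.threshold:
            self.blacklist.add(ip)

    def on_access(self, ip: str, important: bool) -> str:
        """Decide how to treat a request (action names are illustrative only)."""
        if ip in self.blacklist:
            return "deny"              # no communication allowed at all
        if self.verdicts[ip] > 0 and important:
            return "authenticate"      # extra authentication before important content
        return "allow"
```

For example, after one `record_abnormal("203.0.113.7")` a request for important content returns `"authenticate"`; after a second verdict every request from that address returns `"deny"`.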
Further, the activity level of a possibly abnormal IP address is calculated by a formula (not reproduced in this text) involving the following quantities:
H represents the activity level of a possibly abnormal IP address; the formula also uses the number of times the IP address is judged to be a possibly abnormal IP address within one week, the abnormal characteristic value of its network data packets on day j, the number of possibly abnormal IP addresses on day j, and an exponential function with the natural constant e as its base.
Beneficial effects of the technical solution of the invention: traditional detection methods ignore the multi-feature correlation of abnormal traffic and monitor abnormal traffic data inefficiently. To address this, the method derives the time period length from historical traffic data; exploits the multi-feature correlation of abnormal traffic by calculating the abnormal characteristic value of the network data packets of each IP address from the time period length, the number of network data packets of the IP address, and the number of its normal port numbers; builds a coordinate system with the number of network data packets of each IP address as the abscissa and the abnormal characteristic value as the ordinate; and detects abnormal data points in that coordinate system to identify abnormal IP addresses. This strengthens the monitoring of abnormal IP addresses and improves monitoring accuracy.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a method for analyzing network traffic big data according to the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to specific embodiments, structures, features and effects of a network traffic big data analysis method according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the network traffic big data analysis method provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a method flowchart of a data transmission module of a network traffic big data analysis method according to an embodiment of the invention is shown, where the method includes:
S001, obtaining the network data packets of each IP address of the target website.
Specifically, a network packet capture tool is used to capture each IP address that communicates with the target website, together with the network data packets of each IP address. The network data packets are then filtered by IP address: unrelated traffic data are deleted and only the traffic data related to each IP address are retained, which reduces the size of the packet data. Finally, the network data packets of each IP address are processed (format conversion, data decoding, protocol analysis, and similar operations) to obtain information such as the source IP address, the packet content, and the number of communication messages.
So far, the network data packet of each IP address of the target website is obtained.
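The filtering step of S001 can be sketched as below, assuming the capture tool has already parsed packets into simple records. The `Packet` record, its field names, and the destination address used for filtering are hypothetical; a real capture pipeline would supply its own structures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed packet record; real fields depend on the capture tool."""
    src_ip: str
    dst_ip: str
    timestamp: float      # capture time in seconds
    message_count: int    # communication messages carried by this packet
    dst_port: int


def group_by_source(packets, target_ip):
    """Keep only packets sent to the target website and group them by source IP."""
    grouped = defaultdict(list)
    for pkt in packets:
        if pkt.dst_ip != target_ip:
            continue                      # unrelated traffic is discarded
        grouped[pkt.src_ip].append(pkt)   # retain traffic belonging to this IP
    return dict(grouped)
```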
S002, calculating the time period length according to the average number of the historical network data packets and the historical suspected abnormal users, and obtaining the number of the network data packets of each IP address according to the time period length; according to the time period length, the number of the network data packets of each IP address and the number of the normal port numbers of the IP addresses, calculating the abnormal characteristic value of the network data packets of each IP address; and calculating a k distance value according to the abnormal characteristic value of the network data packet of the IP address.
It should be noted that, there is a close relationship between the traffic data and the IP address of the website, for the website, the IP address is an important basis for determining the identity of the visitor and authorizing access, in addition, different IP addresses generate different traffic data according to their communication behaviors, one communication behavior involves the sending end IP splitting the data into multiple data packets, and sending the multiple data packets to the receiving end through the network, where each data packet is an independent data unit, and includes a part of sending data and corresponding control information. The characteristic of multiple dimensions of the flow data is analyzed, namely, the data packets forming the flow data are analyzed, so that the data points corresponding to each IP address can be obtained by constructing a coordinate system by taking the number of the generated data packets in a time period as an abscissa and the degree of abnormality of the data packets as an ordinate, and abnormal data point detection is carried out on the data points, and the detected abnormal data points are abnormal IP addresses.
1. And calculating the time period length according to the average number of the historical network data packets and the historical suspected abnormal users, and obtaining the number of the network data packets of each IP address according to the time period length.
It should be noted that the time period length used to analyze the communication traffic of an IP address must be defined first. It is related to the historical traffic data of the website's users: to quantify it, the average number of data packets in the users' historical traffic data is obtained first, the average transmission time needed to send that average number of packets is derived from it, and the distribution of packet counts reflected by user behavior serves as a coefficient that influences the time period length.
Specifically, the IP addresses and their network data packets in the historical data of the target website stored in the big data storage system are obtained and recorded as the historical traffic data of the target website. Each distinct IP address in the historical traffic data is treated as one user, which gives all users of the target website, and all historical network data packets sent by each user are obtained from the historical traffic data. The mean number of historical network data packets sent per user is calculated and recorded as the average number of historical network data packets of the target website, and each user whose number of historical network data packets exceeds this average is marked as a historical suspected abnormal user of the target website. The time period length is then calculated from the average number of historical network data packets and the historical suspected abnormal users, with the following calculation formula:
(Formula not reproduced in this text.) Here, T represents the time period length, n represents the number of historical suspected abnormal users of the target website, and N represents the number of users in the historical traffic data of the target website; the formula also uses the standard deviation of the numbers of historical network data packets sent by all historical suspected abnormal users of the target website and the mean time of all historical network data packets sent by the d-th historical suspected abnormal user of the target website.
By analyzing the historical traffic data of the target website, the mean time of the historical network data packets sent by all historical suspected abnormal users of the target website is obtained, from which the average transmission time needed to send the average number of data packets is inferred; this determines the time period length. The smaller the standard deviation of the numbers of historical network data packets sent by the historical suspected abnormal users, the more uniform those numbers are, the higher the confidence in the calculated time period length, and the closer T is to the mean time of the historical network data packets sent by all historical suspected abnormal users of the target website.
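The quantities that the time period length depends on can be computed as follows. Because the formula that combines them is not reproduced in this text, this sketch deliberately stops at the ingredients: the user counts N and n, the spread of packet counts among suspected abnormal users, and each suspected user's mean packet time. Treating the "time" of a historical packet as a per-packet duration in seconds, and using the population standard deviation, are assumptions.

```python
import statistics


def time_period_inputs(packet_times):
    """packet_times: dict mapping each user (IP) to a list of per-packet times
    (hypothetical: e.g. transmission durations in seconds).

    Returns the quantities the text says the time period length T depends on."""
    if not packet_times:
        raise ValueError("historical traffic data is empty")
    counts = {ip: len(ts) for ip, ts in packet_times.items()}
    N = len(counts)                                   # users in the history
    average = sum(counts.values()) / N                # average historical packet count
    suspected = [ip for ip, c in counts.items() if c > average]
    n = len(suspected)                                # historical suspected abnormal users
    # Spread of packet counts among suspected users (population std assumed).
    sigma = statistics.pstdev([counts[ip] for ip in suspected]) if n else 0.0
    # Mean packet time of each suspected user d, then the mean over all of them.
    t_bar = {ip: statistics.fmean(packet_times[ip]) for ip in suspected}
    mean_t_bar = statistics.fmean(t_bar.values()) if n else 0.0
    return {"N": N, "n": n, "average": average, "sigma": sigma,
            "t_bar": t_bar, "mean_t_bar": mean_t_bar}
```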
Further, the number of network data packets that each IP address sends to the target website within the time period length T is recorded as the number of network data packets of that IP address.
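Counting the network data packets of each IP address within one time period of length T is a windowed count. In this sketch the `(src_ip, timestamp)` input format and the half-open window `[start, start + T)` are assumptions; the text only says that the packets within the period are counted.

```python
from collections import Counter


def packets_in_window(records, start, T):
    """records: iterable of (src_ip, timestamp) pairs for packets sent to the
    target website. Counts packets per IP whose timestamp lies in [start, start + T)."""
    end = start + T
    return Counter(ip for ip, ts in records if start <= ts < end)
```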
So far, the time period length when analyzing the network data packet of the IP address is obtained.
2. And calculating the abnormal characteristic value of the network data packet of each IP address according to the time period length, the number of the network data packets of each IP address and the number of the normal port numbers of the IP addresses.
When an IP address communicates with the target website, the data packets of the traffic it generates within the time period are analyzed on the basis of the time period length, so as to quantify how abnormal those packets are. The degree of abnormality is derived from the parsed packet data, which include the source of the active IP address, the packet content, the number of communication messages, and similar information. The source IP address of the communication is obtained from the source IP address field of the network data packet, from which the rough origin of the communication can be inferred; if it is an unfamiliar IP address, the source characteristic of the IP address must be defined as abnormal.
When a user requests a service, the request header carries the IP information, so the server can easily obtain the user's IP address from it. With respect to its source, an unfamiliar IP address has a larger abnormal characteristic value; therefore, if the IP address is unfamiliar or non-domestic, its source characteristic value must be defined as a value reflecting this greater abnormality.
Specifically, through the server log file of the target website, the log record related to the IP address is screened according to the IP address, the data packet and the port number information are extracted from the screened log, the extracted port numbers are counted, and the number of the normal port numbers of the IP address is calculated.
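The port statistic can be sketched as below. The text does not define which ports count as "normal" or the server log format, so both the whitespace-separated `<ip> <port> ...` format and the default set `{80, 443}` are hypothetical placeholders for the site's real configuration.

```python
def normal_port_count(log_lines, ip, normal_ports=frozenset({80, 443})):
    """Count the distinct 'normal' ports used by ip in the server log.

    Assumes a hypothetical log format '<ip> <port> ...'; the default
    normal_ports set is an illustrative assumption."""
    ports = set()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != ip:
            continue                       # keep only records for this IP
        try:
            ports.add(int(fields[1]))      # extract the port number
        except ValueError:
            continue                       # skip malformed port fields
    return len(ports & normal_ports)
```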
Further, according to the time period length, the number of network data packets of each IP address and the number of normal port numbers of the IP addresses, calculating the abnormal characteristic value of the network data packets of each IP address, wherein the specific calculation formula is as follows:
(Formula not reproduced in this text.) Here, Z represents the abnormal characteristic value of the network data packets of the IP address, m represents the number of network data packets of the IP address, T represents the time period length, D represents the number of normal port numbers of the IP address, and L represents the source characteristic value of the IP address; the formula also uses the number of communication messages in the i-th network data packet of the IP address and an exponential function with the natural constant e as its base.
When the IP address appears in the historical traffic data of the target website, it is an IP address that commonly accesses the target website, and its source characteristic value is L=1; when it does not appear in the historical traffic data of the target website and is a domestic IP address, it is an unfamiliar IP address and L=2; when it does not appear in the historical traffic data of the target website and is a non-domestic IP address, L=3.
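The three-way rule for the source characteristic value L maps directly to a small function. The `is_domestic` predicate is a hypothetical stand-in for an IP-geolocation lookup; the patent does not say how domestic origin is determined.

```python
from typing import Callable, Set


def source_value(ip: str, historical_ips: Set[str],
                 is_domestic: Callable[[str], bool]) -> int:
    """Source characteristic value L following the rule in the text.

    is_domestic is a hypothetical geolocation predicate supplied by the caller."""
    if ip in historical_ips:
        return 1                          # familiar IP seen in historical traffic
    return 2 if is_domestic(ip) else 3    # unfamiliar domestic / non-domestic IP
```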
Normal port numbers identify the different applications or services in a computer network, and data are allowed to pass through specific ports to realize specific functions; therefore, the fewer normal communication ports an IP address uses, the larger the abnormal characteristic value of its network data packets. The source characteristic value is derived from the origin of the IP address: an unfamiliar IP address is more anomalous, and because an IP address provides approximate geographic information, the rough origin of most users can be estimated from it, so a new or non-domestic IP address is assigned a source characteristic value that reflects greater abnormality. If a particular IP address or network node generates an abnormally high number of messages or bytes within the same time period, this may indicate abnormal activity at that node, such as a malicious attack, large-scale scanning, or abnormal data transmission; therefore, the larger the average number of communication messages of the IP address, the larger the abnormal characteristic value of its network data packets.
Thus, the abnormal characteristic value of the network data packet of each IP address is obtained.
3. And calculating a k distance value according to the abnormal characteristic value of the network data packet of the IP address.
It should be noted that, by combining the abnormal characteristic value reflected by the data packets of an IP address with the number of data packets generated by its communication behavior within the time period, each IP address can be defined as a data point whose abscissa is the number of data packets generated in the time period and whose ordinate is the abnormal characteristic value of those packets. The data points are distributed across the coordinate plane with different densities, and analyzing this density makes it possible to screen out outlying data points, i.e. the abnormal data points. Outliers are screened with the LOF algorithm, which distinguishes outliers by density distribution and therefore suits the features defined above, which treat IP addresses as data points. The LOF algorithm requires a k-distance value to be specified, and this value is related to the data size. In the present scenario, the number of IP addresses communicating with the website differs from day to day, so the number of IP addresses on a given day is taken as the size of the data to be analyzed that day, and the k-distance value is obtained adaptively from this number.
Specifically, the k distance value is calculated according to the abnormal characteristic value of the network data packet of the IP address, and a specific calculation formula is as follows:
(Formula not reproduced in this text.) Here, k represents the k-distance value, M represents the number of all IP addresses that communicate with the target website, and ⌈·⌉ denotes rounding up; the formula also uses the standard deviation of the numbers of network data packets of all IP addresses, the standard deviation of the abnormal characteristic values of the network data packets of all IP addresses, and an exponential function with the natural constant e as its base.
The standard deviation of a set of data quantifies its dispersion, so the standard deviations of the abscissa and ordinate values of the data points indicate how dense the points are: the smaller the standard deviation of the numbers of network data packets and the standard deviation of the abnormal characteristic values, the denser the data points as a whole. A base k-distance value is selected by taking the square root of the number of IP addresses that communicated with the website on that day (which reflects the data size) and rounding up; this base value is then adjusted by a coefficient reflecting the density, which gives the k-distance value.
Thus, a k distance value is obtained.
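The base part of the adaptive k-distance value, the square root of M rounded up, and the two standard deviations that act as the density coefficient can be computed as below. The formula that combines them is not reproduced in this text, so the sketch returns the ingredients rather than a final k; using the population standard deviation is an assumption.

```python
import math
import statistics


def k_distance_inputs(packet_counts, z_values):
    """packet_counts, z_values: per-IP abscissa and ordinate values for one day.

    Returns (base_k, std of packet counts, std of abnormal characteristic values)."""
    M = len(packet_counts)                    # IPs that communicated with the site
    base_k = math.ceil(math.sqrt(M))          # square root of the data size, rounded up
    sigma_m = statistics.pstdev(packet_counts)
    sigma_z = statistics.pstdev(z_values)
    return base_k, sigma_m, sigma_z
```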
S003, converting all IP addresses into data points in a coordinate system, screening out the abnormal data points among all data points using the k-distance value as the parameter of the LOF algorithm, and handling the abnormal IP addresses obtained from the abnormal data points.
Specifically, the number of network data packets of each IP address is taken as an abscissa, the abnormal characteristic value of the network data packets of each IP address is taken as an ordinate, a coordinate system is constructed, each IP address is converted into one data point in the coordinate system, and the corresponding data point of each IP address in the coordinate system is obtained.
Further, the k distance value is used as a parameter of an LOF algorithm, abnormal data points in all data points are obtained through the LOF algorithm, and in the LOF algorithm, the data points with local abnormal factors larger than 1 are abnormal data points, namely the IP addresses corresponding to the data points with local abnormal factors larger than 1 compared with other IP addresses in the same day are IP addresses with possible abnormal communication behaviors; and recording the IP address corresponding to the abnormal data point as a possible abnormal IP address.
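The coordinate construction and LOF screening described above can be sketched as follows. This is a from-scratch implementation of the standard Local Outlier Factor with Euclidean distance, under the assumptions that k is used as the neighbourhood size and that the raw (unscaled) packet count and abnormal characteristic value are used as coordinates; the helper names `to_points`, `lof_scores` and `possible_abnormal_ips` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def to_points(stats: Dict[str, Tuple[int, float]]) -> Tuple[List[str], List[Point]]:
    """Map each IP to (packet count, abnormal characteristic value)."""
    ips = list(stats)
    return ips, [(float(stats[ip][0]), float(stats[ip][1])) for ip in ips]


def lof_scores(points: Sequence[Point], k: int) -> List[float]:
    """Local outlier factor of every point (Euclidean distance)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist: List[float] = []
    neighbours: List[List[int]] = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        # k-distance neighbourhood, including ties at distance kd
        neighbours.append([j for d, j in others if d <= kd])
    # Local reachability density; the 1e-10 guard avoids division by zero
    # when duplicate points give a zero mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / len(reach) + 1e-10))
    # LOF = mean neighbour density divided by the point's own density
    return [
        sum(lrd[j] for j in neighbours[i]) / len(neighbours[i]) / lrd[i]
        for i in range(n)
    ]


def possible_abnormal_ips(stats: Dict[str, Tuple[int, float]], k: int) -> List[str]:
    """IP addresses whose data point has LOF > 1, as the text specifies."""
    ips, points = to_points(stats)
    return [ip for ip, s in zip(ips, lof_scores(points, k)) if s > 1]
```

Because the two axes are not rescaled, the packet count will usually dominate the distances; whether the method normalizes the axes is not stated in the text.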
Further, the possible abnormal IP addresses among all IP addresses that communicate with the target website are counted for each day of a week, and related information about them is recorded, in particular the MAC address, i.e. the physical address of the network device. The activity level of each possible abnormal IP address is then calculated with the following formula:
where h represents the activity level of the possible abnormal IP address; the remaining terms are the number of times the IP address is judged to be a possible abnormal IP address within one week, the abnormal characteristic value of the network data packets of the possible abnormal IP address on the j-th day, the number of possible abnormal IP addresses on the j-th day, and an exponential function with the natural constant e as its base.
For each day on which the IP address is screened as a possible abnormal IP address, the fewer possible abnormal IP addresses there are on that day and the larger the IP address's own abnormal characteristic value, the higher its activity level; likewise, the more often the IP address is screened as a possible abnormal IP address within the week, the higher its final activity level.
Further, the higher the activity level of a possible abnormal IP address, the more consistently it is screened out as a possible abnormal IP address, and the more necessary it is to take corresponding measures against it. Therefore, following the idea of quartiles, the possible abnormal IP addresses whose activity levels fall in the top 25% are taken as abnormal IP addresses.
It should be noted that, compared with the total number of IP addresses that communicate with the website, the number of IP addresses with abnormal communication behavior is small, so their communication behavior and traffic data can be monitored in a focused way.
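The top-25% selection can be sketched as below, assuming the activity levels have already been computed (the activity formula itself is not reproduced in this text). Reading "top 25%" as the ceiling of 25% of the candidates, ranked by activity level, is an assumption, as is the hypothetical name `select_abnormal_ips`.

```python
import math
from typing import Dict, List


def select_abnormal_ips(activity: Dict[str, float], fraction: float = 0.25) -> List[str]:
    """Return the possible abnormal IPs whose activity level is in the top `fraction`.

    Assumption: "top 25%" means the ceil(25% of candidates) highest activity
    levels; ties at the cut-off keep input order (Python's sort is stable).
    """
    if not activity:
        return []
    ranked = sorted(activity, key=activity.__getitem__, reverse=True)
    n_keep = math.ceil(len(ranked) * fraction)
    return ranked[:n_keep]
```

A threshold at the third quartile (for example via `statistics.quantiles`) would be an equally plausible reading; the rank-based cut was chosen here because it behaves predictably for very small candidate sets.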
Further, when an abnormal IP address accesses the target website, a security authentication measure is started and the abnormal IP address is restricted from accessing part of the content of the target website. If the abnormal communication behavior of the abnormal IP address is obvious, i.e. an IP address judged to be an abnormal IP address in one week is judged to be an abnormal IP address again in another week, it is directly added to a blacklist and prohibited from further communicating with the target website.
When an abnormal IP address accesses the target website, the security authentication measure restricts it from accessing important content of the target website; what counts as important content is defined by the target website and is not detailed in this embodiment. This embodiment calculates the activity level of a possible abnormal IP address from the number of times it is judged to be a possible abnormal IP address within one week, and uses the activity level to decide whether it is treated as an abnormal IP address. Each IP address can therefore be judged once per week, and IP addresses judged to be abnormal IP addresses twice are added to a blacklist, so that blacklisted IP addresses are prohibited from communicating with the target website.
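The weekly handling rules above can be sketched as a small bookkeeping class. The class name, the policy strings, and the assumption that an IP judged abnormal once stays under authentication and restriction until it is blacklisted are all hypothetical; the text only fixes the "judged abnormal in two weeks leads to the blacklist" rule.

```python
from collections import Counter
from typing import Iterable, Set


class AbnormalIpTracker:
    """Hypothetical weekly bookkeeping for the handling rules described above."""

    def __init__(self) -> None:
        self.times_abnormal: Counter = Counter()  # weeks each IP was judged abnormal
        self.blacklist: Set[str] = set()

    def record_week(self, abnormal_ips: Iterable[str]) -> None:
        """Apply one weekly judgement; each IP counts at most once per week."""
        for ip in set(abnormal_ips):
            self.times_abnormal[ip] += 1
            if self.times_abnormal[ip] >= 2:  # judged abnormal in two weeks
                self.blacklist.add(ip)

    def access_policy(self, ip: str) -> str:
        """Decide how to treat a request from `ip` (assumed policy labels)."""
        if ip in self.blacklist:
            return "block"  # no further communication with the website
        if self.times_abnormal[ip] >= 1:
            return "authenticate_and_restrict"  # security check, limited content
        return "allow"
```

Reading a missing key from a `Counter` returns 0 without inserting it, so unseen IP addresses fall through to `"allow"`.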
Thus, the processing of the abnormal IP address obtained according to the abnormal data point is realized.
To address the problems that traditional detection methods ignore the multi-feature correlation of abnormal traffic and monitor abnormal traffic data inefficiently, the method obtains the time period length from historical traffic data; exploits the multi-feature correlation of abnormal traffic by calculating the abnormal characteristic value of the network data packets of each IP address from the time period length, the number of network data packets of the IP address, and the number of its normal port numbers; detects abnormal data points in a coordinate system whose abscissa is the number of network data packets and whose ordinate is the abnormal characteristic value of the network data packets of each IP address; and thereby obtains the abnormal IP addresses, strengthening their monitoring and improving monitoring accuracy.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A method for analyzing big data of network traffic, the method comprising:
acquiring network data packets and historical flow data of each IP address of a target website;
obtaining the average number of historical network data packets and the historical suspected abnormal users according to the historical flow data; calculating the time period length according to the average number of the historical network data packets and the historical suspected abnormal users, and obtaining the number of the network data packets of each IP address according to the time period length and the network data packets of each IP address;
according to the time period length, the number of the network data packets of each IP address and the number of the normal port numbers of the IP addresses, calculating the abnormal characteristic value of the network data packets of each IP address;
calculating a k distance value according to the abnormal characteristic values of the network data packets of all the IP addresses;
taking the number of network data packets of each IP address as the abscissa and the abnormal characteristic value of the network data packets of each IP address as the ordinate to construct a coordinate system, converting each IP address into one data point in the coordinate system to obtain the data point corresponding to each IP address, screening the abnormal data points among all the data points with the k distance value as the parameter of an LOF algorithm, and processing the abnormal IP addresses obtained from the abnormal data points;
the calculating of the k distance value comprises the following specific steps:
where k represents the k distance value; the two standard-deviation terms are, respectively, the standard deviation of the number of network data packets over all IP addresses and the standard deviation of the abnormal characteristic values of the network data packets over all IP addresses; the formula also uses an exponential function with the natural constant e as its base; M represents the number of all IP addresses that communicate with the target website; and the result is rounded up.
2. The method for analyzing network traffic big data according to claim 1, wherein the calculating the time period length comprises the following specific steps:
wherein T represents the time period length; the remaining terms are the number of historical suspected abnormal users of the target website, the number of users in the historical flow data of the target website, the standard deviation of the numbers of historical network data packets sent by all historical suspected abnormal users of the target website, and the mean time of all historical network data packets sent by the d-th historical suspected abnormal user of the target website.
3. The method for analyzing network traffic big data according to claim 1, wherein the step of obtaining the average number of historical network data packets and the historical suspected abnormal users comprises the following specific steps:
taking each distinct IP address in the historical flow data of a target website as one user to obtain all users of the target website; acquiring all historical network data packets sent by each user from the historical flow data of the target website; calculating the mean number of historical network data packets sent by all users of the target website, recorded as the average number of historical network data packets of the target website; and recording the users whose number of historical network data packets in the historical flow data of the target website is greater than this average as the historical suspected abnormal users of the target website.
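The steps of claim 3 can be sketched as follows, assuming the historical flow data has already been reduced to one source IP string per historical network data packet; the function name `suspected_abnormal_users` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def suspected_abnormal_users(packet_source_ips: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (average packets per user, historical suspected abnormal users).

    Each distinct IP is one user; a user is suspected if its packet count
    is strictly greater than the average over all users.
    """
    counts = Counter(packet_source_ips)  # historical packets per user (IP)
    if not counts:
        return 0.0, []
    average = sum(counts.values()) / len(counts)
    suspects = sorted(ip for ip, c in counts.items() if c > average)
    return average, suspects
```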
4. The method for analyzing network traffic big data according to claim 1, wherein the obtaining the number of network packets for each IP address comprises the following specific steps:
the number of network data packets of each IP address generating communication behavior to the target website in the time period length T is recorded as the number of network data packets of each IP address.
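Claim 4 amounts to counting packets per IP address inside a window of length T. The sketch below assumes packets are given as `(source_ip, timestamp)` pairs and that the window is the half-open interval starting at a caller-chosen `window_start`; the text does not say where the window begins, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def packets_per_ip(
    packets: Iterable[Tuple[str, float]], window_start: float, period_length: float
) -> Dict[str, int]:
    """Count packets per source IP with timestamp in [start, start + T)."""
    window_end = window_start + period_length
    return dict(Counter(ip for ip, ts in packets if window_start <= ts < window_end))
```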
5. The method for analyzing network traffic big data according to claim 1, wherein the calculating the abnormal characteristic value of the network packet of each IP address comprises the following specific steps:
wherein Z represents the abnormal characteristic value of the network data packets of the IP address, m represents the number of network data packets of the IP address, a per-packet term represents the number of communication messages in the i-th network data packet of the IP address, T represents the time period length, D represents the number of normal port numbers of the IP address, L represents the source characteristic value of the IP address, and the formula uses an exponential function with the natural constant e as its base.
6. The method for analyzing network traffic big data according to claim 5, wherein the method for acquiring the source characteristic value of the IP address is specifically as follows:
when the IP address appears in the historical flow data of the target website, the source characteristic value of the IP address is L=1; when the IP address does not appear in the historical traffic data of the target website and is a domestic IP, the source characteristic value of the IP address is L=2; when the IP address does not appear in the historical traffic data of the target website and is a non-domestic IP, the source characteristic value of the IP address is L=3.
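The three-way rule of claim 6 can be sketched as below. How an IP address is classified as domestic is not specified, so the sketch takes a caller-supplied predicate `is_domestic` (a hypothetical stand-in for, e.g., a geo-IP lookup); the function name is also hypothetical.

```python
from typing import Callable, Set


def source_feature_value(
    ip: str, historical_ips: Set[str], is_domestic: Callable[[str], bool]
) -> int:
    """Source characteristic value L of an IP address (1, 2 or 3)."""
    if ip in historical_ips:
        return 1  # seen in the website's historical flow data
    return 2 if is_domestic(ip) else 3  # new domestic vs. new non-domestic IP
```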
7. The method for analyzing network traffic big data according to claim 1, wherein the processing the abnormal IP address obtained according to the abnormal data point comprises the following specific steps:
the IP address corresponding to each abnormal data point is recorded as a possible abnormal IP address, and the activity level of the possible abnormal IP address is calculated; the possible abnormal IP addresses whose activity levels rank in the top 25% are taken as abnormal IP addresses;
when an abnormal IP address accesses the target website, a security authentication measure is started to restrict the abnormal IP address from accessing important content of the target website; and IP addresses judged to be abnormal IP addresses twice are added to a blacklist, and the IP addresses in the blacklist are prohibited from communicating with the target website.
8. The method for analyzing network traffic big data according to claim 7, wherein the step of calculating the activity level of the possible abnormal IP address comprises the following specific steps:
where h represents the activity level of the possible abnormal IP address; the remaining terms are the number of times the IP address is judged to be a possible abnormal IP address within one week, the abnormal characteristic value of the network data packets of the possible abnormal IP address on the j-th day, the number of possible abnormal IP addresses on the j-th day, and an exponential function with the natural constant e as its base.
CN202311158322.4A 2023-09-08 2023-09-08 Network flow big data analysis method Active CN116886453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311158322.4A CN116886453B (en) 2023-09-08 2023-09-08 Network flow big data analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311158322.4A CN116886453B (en) 2023-09-08 2023-09-08 Network flow big data analysis method

Publications (2)

Publication Number Publication Date
CN116886453A CN116886453A (en) 2023-10-13
CN116886453B true CN116886453B (en) 2023-11-24

Family

ID=88262656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311158322.4A Active CN116886453B (en) 2023-09-08 2023-09-08 Network flow big data analysis method

Country Status (1)

Country Link
CN (1) CN116886453B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107454097A (en) * 2017-08-24 2017-12-08 深圳中兴网信科技有限公司 The detection method of abnormal access, system, computer equipment, readable storage medium storing program for executing
CN109067725A (en) * 2018-07-24 2018-12-21 成都亚信网络安全产业技术研究院有限公司 Network flow abnormal detecting method and device
CN110930057A (en) * 2019-12-06 2020-03-27 国网湖北省电力有限公司电力科学研究院 Quantitative evaluation method for reliability of distribution transformer test result based on LOF algorithm
CN111016720A (en) * 2019-12-23 2020-04-17 深圳供电局有限公司 Attack identification method based on K nearest neighbor algorithm and charging device
CN111738308A (en) * 2020-06-03 2020-10-02 浙江中烟工业有限责任公司 Dynamic threshold detection method for monitoring index based on clustering and semi-supervised learning
CN113325824A (en) * 2021-06-02 2021-08-31 三门核电有限公司 Regulating valve abnormity identification method and system based on threshold monitoring
CN114158080A (en) * 2020-08-17 2022-03-08 中国电信股份有限公司 Monitoring method, monitoring device and computer readable storage medium
CN114760103A (en) * 2022-03-21 2022-07-15 广州大学 Industrial control system abnormity detection system, method, equipment and storage medium
CN116644373A (en) * 2023-07-27 2023-08-25 深圳恒邦新创科技有限公司 Automobile flow data analysis management system based on artificial intelligence
CN116699446A (en) * 2023-05-26 2023-09-05 湖北文理学院 Method, device, equipment and storage medium for rapidly sorting retired batteries

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9866576B2 (en) * 2015-04-17 2018-01-09 Centripetal Networks, Inc. Rule-based network-threat detection
US20230093540A1 (en) * 2021-09-22 2023-03-23 The Toronto-Dominion Bank System and Method for Detecting Anomalous Activity Based on a Data Distribution
US20230153311A1 (en) * 2021-11-12 2023-05-18 Google Llc Anomaly Detection with Local Outlier Factor

Non-Patent Citations (1)

Title
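Anomaly detection based on traffic information structure; ZHU Yingwu; YANG Jiahai; ZHANG Jinxiang; Journal of Software (软件学报), No. 10; full text *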

Also Published As

Publication number Publication date
CN116886453A (en) 2023-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant