WO2019178968A1 - Network traffic monitoring method, apparatus, computer device, and storage medium - Google Patents

Network traffic monitoring method, apparatus, computer device, and storage medium

Info

Publication number
WO2019178968A1
WO2019178968A1 PCT/CN2018/092654 CN2018092654W WO2019178968A1 WO 2019178968 A1 WO2019178968 A1 WO 2019178968A1 CN 2018092654 W CN2018092654 W CN 2018092654W WO 2019178968 A1 WO2019178968 A1 WO 2019178968A1
Authority
WO
WIPO (PCT)
Prior art keywords
application scenario
network traffic
preset application
normal
traffic
Application number
PCT/CN2018/092654
Inventor
李洋
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2019178968A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection

Definitions

  • The present application relates to the field of network security, and in particular to a network traffic monitoring method, apparatus, computer device, and storage medium.
  • Bursts of abnormal network traffic cause network congestion, leading to packet loss, delay, and jitter, which degrade network service quality. Such bursts may also carry security risks, such as DDoS attacks, worms, and theft, which can cause great harm to networks and business systems.
  • Common methods for monitoring abnormal network traffic typically either extract "fingerprints" of abnormal traffic for identification or identify abnormal traffic with machine learning models. The former cannot recognize abnormal traffic that has not been seen before; the latter requires complex data mining algorithms. In the cloud security field, which involves big data computation, existing solutions struggle to monitor abnormal network traffic efficiently and accurately.
  • The embodiments of the present application provide a network traffic monitoring method, apparatus, computer device, and storage medium, to address the lack of an efficient and accurate abnormal network traffic monitoring solution in the cloud security field of big data computing.
  • An embodiment of the present application provides a network traffic monitoring method, including:
  • if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is less than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
  • the abnormal proportion corresponding to the abnormal traffic sets is computed; if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.
  • An embodiment of the present application provides a network traffic monitoring apparatus, including:
  • a network traffic obtaining module, configured to obtain actual network traffic and, based on the actual network traffic, obtain at least one corresponding preset application scenario and the actual feature vector corresponding to each preset application scenario;
  • a feature vector obtaining module, configured to query a preset normal traffic model library based on the at least one preset application scenario and obtain the normal feature vector corresponding to each preset application scenario;
  • a corresponding feature vector module, configured so that, if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is less than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
  • an abnormal proportion statistics module, configured to compute the abnormal proportion corresponding to the abnormal traffic sets; if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.
  • An embodiment of the present application provides a computer device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer readable instructions:
  • if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is less than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
  • the abnormal proportion corresponding to the abnormal traffic sets is computed; if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.
  • Embodiments of the present application provide one or more non-volatile readable storage media storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is less than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
  • the abnormal proportion corresponding to the abnormal traffic sets is computed; if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.
  • FIG. 1 is a flowchart of a network traffic monitoring method in Embodiment 1 of the present application.
  • FIG. 3 is another specific flowchart of a network traffic monitoring method in Embodiment 1 of the present application.
  • FIG. 5 is another specific flowchart of a network traffic monitoring method in Embodiment 1 of the present application.
  • FIG. 6 is another specific flowchart of a network traffic monitoring method in Embodiment 1 of the present application.
  • FIG. 7 is another specific flowchart of a network traffic monitoring method in Embodiment 1 of the present application.
  • FIG. 8 is a schematic block diagram of a network traffic monitoring apparatus in Embodiment 2 of the present application.
  • FIG. 9 is a schematic diagram of a computer device in Embodiment 4 of the present application.
  • An accurate network traffic model helps people design better network protocols, more reasonable network topologies, and smarter network monitoring systems, providing more efficient QoS (Quality of Service) and keeping the network running efficiently, stably, and safely.
  • The network is a complex nonlinear system affected by various complex external factors.
  • Its traffic model (that is, the network traffic model) is therefore also complex and changeable. Most existing network traffic models are based on abnormal traffic and are built through complex machine learning.
  • The embodiments of the present application propose a method for monitoring network traffic based on normal traffic modeling. The method mainly applies to big data computing with strong timeliness requirements, especially cloud computing.
  • Traffic monitoring is the management and control of network communication data packets, together with their optimization and limitation.
  • The purpose of traffic monitoring is to allow and guarantee efficient transmission of data packets while prohibiting or restricting the transmission of illegal packets.
  • Cloud computing is a configurable pool of computing resources (including networks, servers, storage, applications, and services) that provides available, convenient, on-demand network access. As cloud computing applications deepen and demand for big data processing grows, users' requirements for the performance and security of cloud computing have increased.
  • The executing entity of this proposal is the server that provides the shared resource pool.
  • FIG. 1 shows a flowchart of the network traffic monitoring method in this embodiment.
  • The network traffic monitoring method is applied to a server in a cloud computing environment.
  • The server can provide different cloud services according to the needs of the client, such as virtual hosts, private networks, and cloud storage.
  • The network traffic monitoring method includes the following steps:
  • The actual network traffic refers to the network traffic collected in real time during the current traffic monitoring process.
  • Network traffic is the amount of data carried by data packets on the network; two computers communicate over the network by sending and receiving data packets.
  • An application scenario is a business model built on the cloud platform for a particular service.
  • The server has a preset application scenario library, and service scenarios are added to or removed from the library as services are added or removed.
  • The service scenarios stored in the application scenario library are the preset application scenarios.
  • The network traffic corresponding to multiple service scenarios exists in the cloud computing resource pool and is passed from the resource pool to the upper-layer network.
  • The network traffic corresponding to multiple service scenarios should be differentiated one by one, different service models established, and the underlying protocol type corresponding to each service identified, so that each logical service network stays clear when cloud computing traffic bursts and different network traffic can be extracted from the huge data stream.
  • The network traffic corresponding to different service scenarios is different.
  • The network traffic corresponding to different service scenarios of the cloud computing network should be distinguished according to the various industry applications and managed with feature extraction algorithms.
  • The feature vector is the feature data obtained for each differentiated application scenario after the network traffic has been distinguished and managed.
  • The feature extraction algorithm provides visibility into internal network traffic and control over network resources, and distinguishes the data traffic of specific users in the preset application scenario; it may be a DPI (deep packet inspection) algorithm.
  • All network traffic received by the input port of the cloud platform server is actual network traffic.
  • The actual network traffic may include the network traffic required by at least three preset application scenarios.
  • The three preset application scenarios are:
  • PaaS is a proprietary software runtime environment that often generates different data traffic during system development and debugging.
  • The traffic generated by IaaS belongs to online storage services, and the traffic generated by each storage channel is differentiated.
  • The server divides the network traffic among the three preset application scenarios, and then applies a feature extraction algorithm to the network traffic of each preset application scenario to extract the corresponding actual feature vector.
  • After differentiating and managing the actual network traffic, the server obtains at least one preset application scenario and the corresponding actual feature vector. This reduces the huge volume of cloud data to a compact form, so that the server can determine from the preset application scenario and the actual feature vector whether the actual network traffic is abnormal, which reduces the complexity of processing cloud data.
  • The normal traffic model library is a database formed from the normal feature vectors, extracted from normal network traffic, that correspond to all application scenarios; it is used to check whether the actual network traffic is abnormal.
  • The normal network traffic refers to the transmission speed and quantity of traffic packets in the network when the network is in a secure and stable state.
  • The normal feature vector is the feature data obtained by applying a feature extraction algorithm to normal network traffic.
  • Each preset application scenario corresponds to two feature vectors: a normal feature vector and an actual feature vector.
  • The actual feature vector is the feature vector produced by the feature extraction algorithm from the actual network traffic collected in real time.
  • The normal feature vector is a feature vector acquired in advance and stored for each preset application scenario; it serves as the reference for determining whether the actual feature vector is abnormal.
  • The server obtains the actual feature vector corresponding to the current preset application scenario and retrieves the corresponding normal feature vector, so that it can directly compare the normal feature vector and the actual feature vector in that preset application scenario.
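  • The library query described above can be pictured as a keyed lookup. The following is a minimal sketch, assuming the library is an in-memory mapping from scenario name to normal feature vector; the name `NORMAL_MODEL_LIBRARY`, the function `query_normal_vectors`, and all numeric values are hypothetical and not taken from the application.

```python
from typing import Dict, Iterable, List

# Hypothetical library: scenario name -> normal feature vector.
# The numbers are placeholders for illustration only.
NORMAL_MODEL_LIBRARY: Dict[str, List[float]] = {
    "PaaS": [0.35, 0.48],
    "IaaS": [0.80, 0.05],
}


def query_normal_vectors(
    scenarios: Iterable[str],
    library: Dict[str, List[float]] = NORMAL_MODEL_LIBRARY,
) -> Dict[str, List[float]]:
    """Return the stored normal feature vector for each requested scenario."""
    scenarios = list(scenarios)
    missing = [s for s in scenarios if s not in library]
    if missing:
        # A scenario without a normal model cannot be compared at all.
        raise KeyError(f"no normal model for scenarios: {missing}")
    return {s: library[s] for s in scenarios}
```

  • Raising on an unknown scenario is a design choice of this sketch; a real deployment might instead skip or log such traffic.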
  • If the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is less than the first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set.
  • Otherwise, the actual feature vector is treated as a normal feature vector.
  • The first threshold is the minimum intersection range, predefined in the spatial distribution, at which the actual feature vector is considered a normal feature vector.
  • The server detects an abnormal traffic set by comparing the intersection of the actual feature vector and the normal feature vector against the first threshold.
  • The intersection of the normal feature vector and the actual feature vector in each preset application scenario can be obtained with a simple algorithm, which speeds up the server's abnormality determination across the actual feature vectors of multiple preset application scenarios.
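  • The text here does not reproduce the comparison formula. As one possible reading only, the sketch below assumes each feature vector can be treated as a set of discrete feature items and measures the intersection as the fraction of normal features also present in the actual features. The names `overlap_ratio` and `is_abnormal_set` and this overlap measure are assumptions, not the application's own algorithm.

```python
from typing import Hashable, Iterable, Set


def overlap_ratio(actual: Set[Hashable], normal: Set[Hashable]) -> float:
    """Fraction of the normal features that also appear in the actual features."""
    if not normal:
        return 0.0
    return len(actual & normal) / len(normal)


def is_abnormal_set(
    actual: Iterable[Hashable],
    normal: Iterable[Hashable],
    first_threshold: float,
) -> bool:
    """Flag the scenario as an abnormal traffic set when overlap is below the threshold."""
    return overlap_ratio(set(actual), set(normal)) < first_threshold
```

  • For example, with normal features {"tcp", "port:443", "http"} and actual features {"tcp", "port:443", "udp"}, two of the three normal features are shared, so the scenario is flagged at a first threshold of 0.8 but not at 0.5.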
  • The abnormal proportion corresponding to the abnormal traffic sets is computed. If the abnormal proportion is greater than the second threshold, the actual network traffic is abnormal network traffic.
  • The abnormal proportion is the percentage of abnormal traffic sets among all monitored traffic sets.
  • The second threshold is set from practical experience and requirements; it is the critical point that separates abnormal network traffic from non-abnormal network traffic.
  • The abnormal proportion of abnormal traffic sets among all monitored traffic sets can thus be obtained. If the abnormal proportion is greater than the predetermined second threshold, the actual network traffic is abnormal network traffic, and the server needs to take further measures, such as locking the port that receives the actual network traffic.
  • The network traffic monitoring method, apparatus, computer device, and storage medium provided by the embodiments of the present application obtain the actual network traffic and query the preset normal traffic model library based on the preset application scenarios corresponding to the actual network traffic, to monitor whether the actual network traffic is abnormal network traffic. On the one hand, monitoring the actual traffic against the normal traffic model library can also detect abnormal network traffic that has not been discovered before; on the other hand, establishing a normal traffic model library makes the approach applicable to the cloud security field, where large volumes of network traffic are computed, so that network anomalies are identified efficiently and quickly.
  • In step S10, obtaining the corresponding at least one preset application scenario and the actual feature vector based on the actual network traffic specifically includes the following steps:
  • The preset application scenario baseline is used to divide the actual network traffic and obtain the corresponding preset application scenarios.
  • The application scenario baseline is delineated by the behavior features of the normal network traffic corresponding to the preset application scenario, where the behavior features include network utilization, application response time, protocol distribution, and user bandwidth consumption.
  • Network utilization: a measure of how much bandwidth is used during a specific time interval; it can be measured per protocol.
  • Application response time: mainly used to show the connection status of Web sites in the network, such as which computers in the LAN are on the Internet or which websites are mainly browsed.
  • Protocol distribution: reports network usage by the distribution of session layer, transport layer, and application layer protocols.
  • User bandwidth consumption: bandwidth is the amount of data that can pass through a link per unit time; here it refers to the amount of data users consume when using the network.
  • The server divides the actual network traffic into several preset application scenarios by using the application scenario baselines, which makes determining whether the actual network traffic is abnormal more targeted and organized.
  • The feature extraction algorithm corresponding to the preset application scenario is used to perform feature extraction and feature vectorization on the actual network traffic and obtain the corresponding actual feature vector.
  • Feature extraction and feature vectorization of the actual network traffic with the feature extraction algorithm corresponding to the preset application scenario mainly include the following steps:
  • (1) Feature extraction is performed on the actual network traffic by using the feature extraction algorithm corresponding to the preset application scenario.
  • The feature extraction algorithm corresponding to the preset application scenario may specifically be a DPI (deep packet inspection) algorithm, which inspects the application layer protocol of data packets to detect, analyze, and discover P2P data streams. DPI provides visibility into the internal traffic of the network and control over network resources, and can distinguish the data flow of specific users in the preset application scenario.
  • The DPI algorithm uses a payload feature library to store payload signature strings, and packets that match a payload signature string are treated as P2P data streams. Almost every P2P application corresponding to a preset application scenario has its own application layer protocol. By collecting data packets and analyzing packet characteristics, a unique payload signature string can be defined for each P2P application layer protocol.
  • The principle for defining a payload signature string is to select a payload string that is unique to the protocol, must appear during the interaction, and occurs most frequently in the actual environment.
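  • Payload signature matching as described above can be sketched as a substring search against a small signature library. This is an illustrative sketch, not the application's implementation: the names `PAYLOAD_SIGNATURES` and `classify_payload` are hypothetical, and the single entry uses the well-known BitTorrent handshake prefix purely as an example of a protocol-unique string.

```python
from typing import Dict, Optional

# Illustrative payload feature library: protocol name -> signature bytes.
# A BitTorrent handshake starts with byte 0x13 followed by "BitTorrent protocol".
PAYLOAD_SIGNATURES: Dict[str, bytes] = {
    "BitTorrent": b"\x13BitTorrent protocol",
}


def classify_payload(
    payload: bytes,
    signatures: Dict[str, bytes] = PAYLOAD_SIGNATURES,
) -> Optional[str]:
    """Return the first protocol whose signature occurs in the payload, else None."""
    for protocol, signature in signatures.items():
        if signature in payload:
            return protocol
    return None
```

  • A production DPI engine would typically anchor signatures at fixed offsets and use multi-pattern matching for speed; plain substring search is used here only to keep the sketch short.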
  • (2) Feature vectorization forms a vector set by applying a multi-dimensional feature matrix calculation to the features obtained in step (1). The vector set represents the multi-dimensional features of the preset application scenario, which can include IP address pairs, port numbers, protocol types, TCP connection statistics, and so on. The features differ between preset application scenarios, so the feature vectors corresponding to the respective preset application scenarios also differ.
  • Protocols corresponding to the Internet layer application scenario: IP, ICMP, ARP, and RARP.
  • Protocols corresponding to the transport layer application scenario: TCP and UDP.
  • Protocols corresponding to the application layer application scenario: FTP, Telnet, SMTP, HTTP, RIP, NFS, and DNS.
  • For services in application scenarios at different layers, the server calculates the feature vectors using the corresponding protocols.
  • The effective features obtained in step (1) are substituted into the invertible matrix.
  • V is the eigenvector matrix, used to transform the matrix, that is, to change a matrix from one basis into the basis formed by the eigenvectors.
  • The following example illustrates the dimensionality reduction of effective features through a matrix.
  • Taking the Internet layer application scenario as an example: 20 samples of the Internet layer application scenario are obtained. Each sample includes four valid feature values corresponding to the IP, ICMP, ARP, and RARP protocols. Two basic feature values are extracted from the four, so that the next time actual network traffic arrives, the two basic feature values can determine whether it belongs to the Internet layer application scenario.
  • The data columns in R' are ordered by the size of the corresponding eigenvalues; the later columns correspond to small eigenvalues, and removing them has relatively little effect on the whole data set. Removing the last 2 columns and keeping only the first 2 completes the dimensionality reduction and achieves feature vectorization.
  • This dimensionality reduction method is also called the PCA (principal component analysis) algorithm.
  • The feature extraction algorithm extracts the actual feature vector from the actual network traffic, so that the server can determine more effectively, based on the actual feature vector, whether the actual network traffic is abnormal.
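  • The matrix formulas for this example are not reproduced in the text. As a hedged sketch, the standard PCA formulation for the 20-sample, 4-feature case can be written as follows. V and R' follow the text; the symbols R (sample matrix), R_c (centered samples), C (covariance matrix), Λ (eigenvalue matrix), and v_1, v_2 (leading eigenvectors) are assumptions introduced here.

```latex
\begin{aligned}
R_c &= R - \mathbf{1}\,\bar{r}^{\top},
  &\quad& R \in \mathbb{R}^{20 \times 4},\ \bar{r} = \text{column means of } R \\
C &= \frac{1}{n-1}\, R_c^{\top} R_c,
  &\quad& n = 20 \\
C &= V \Lambda V^{-1},
  &\quad& \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3, \lambda_4),\
          \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \lambda_4 \\
R' &= R_c V,
  &\quad& R'_{2} = R_c \,[\, v_1 \;\; v_2 \,] \in \mathbb{R}^{20 \times 2}
\end{aligned}
```

  • Here R'_2 keeps only the two columns of R' associated with the largest eigenvalues, which matches the step of removing the last 2 columns.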
  • In an embodiment, before step S10, that is, before the step of acquiring the actual network traffic, the network traffic monitoring method further includes:
  • The current preset application scenario refers to the preset application scenario to which the current network traffic belongs.
  • The application scenario baseline is delineated by the behavior features of the normal network traffic corresponding to the preset application scenario. Specifically, the behavior features include network utilization, application response time, protocol distribution, and user bandwidth consumption.
  • The application scenario baseline can be used to divide large volumes of actual network traffic by preset application scenario. It is a prerequisite for monitoring the actual network traffic per preset application scenario.
  • In step S50, generating the application scenario baseline corresponding to the current preset application scenario specifically includes the following steps:
  • S51: Collect normal network traffic, where the normal network traffic includes at least one preset application scenario and the normal behavior features corresponding to the preset application scenario.
  • At least one preset application scenario is determined, and the normal network traffic in the preset application scenario is collected to obtain the normal behavior features corresponding to the preset application scenario, such as normal network utilization, normal application response time, normal protocol distribution, and normal user bandwidth consumption.
  • The preset application scenario is characterized by the values of the multiple normal behavior features corresponding to it, so that the server can establish a normal traffic model for the preset application scenario.
  • The average value is specifically the arithmetic mean, that is, the sum of all the data divided by the number of data items; it summarizes the overall state of the variable.
  • The standard deviation measures how far the data deviate from the mean, that is, the dispersion of a set of data around its average. A large standard deviation means that most values differ considerably from the mean; a small one means that most values are close to the mean. The standard deviation can be used as a measure of uncertainty.
  • The average value and the standard deviation of the feature values are calculated.
  • The average value μ is calculated as $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$, where n is the number of feature values, i ranges from 1 to n, and $x_i$ is any one of the feature values.
  • The standard deviation σ is calculated as $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$, where N is the number of feature values, i ranges from 1 to N, $x_i$ is any one of the feature values, and μ is the average of the feature values.
  • The application scenario baseline divides at least two baseline ranges, and each baseline range corresponds to a preset application scenario.
  • Each baseline range includes an upper limit value and a lower limit value.
  • The upper limit value is the maximum value of the baseline range.
  • The lower limit value is the minimum value of the baseline range.
  • A preset application scenario is a state corresponding to the data; according to the states in which the data may appear, it may be divided into different preset application scenarios, such as a first preset application scenario, a second preset application scenario, a third preset application scenario, and a fourth preset application scenario.
  • Obtaining the application scenario baseline from the average value and the standard deviation specifically includes the following steps:
  • The standard deviation coefficient is a positive number, which may be a positive integer or a positive fraction. Multiplying the standard deviation by the standard deviation coefficient gives the corresponding standard deviation product. In this embodiment, if the standard deviation coefficient is k, the standard deviation product is k*σ.
  • The upper limit of a baseline range is its maximum value and depends on the mean and standard deviation of the corresponding feature values.
  • It is calculated as the sum of the mean and the standard deviation product. For example, if the average of the feature values is μ, the standard deviation is σ, and the standard deviation coefficient is k, the upper limit of the baseline range for those feature values is μ+k*σ.
  • The upper limit of each baseline range divides the historical baseline values into two baseline ranges, corresponding to different preset application scenarios.
  • The lower limit of a baseline range is its minimum value and depends on the mean and standard deviation of the corresponding feature values.
  • It is calculated as the difference between the mean and the standard deviation product. For example, if the average of the feature values is μ, the standard deviation is σ, and the standard deviation coefficient is k, the lower limit of the baseline range for those feature values is μ-k*σ.
  • The lower limit of each baseline range divides the historical baseline values into two baseline ranges, corresponding to different preset application scenarios.
  • With standard deviation coefficient k, the upper limit of a baseline range for the feature values is μ+k*σ and the lower limit is μ-k*σ. Because a baseline range with a larger standard deviation coefficient contains the ranges with smaller coefficients, the range with the smaller coefficient is removed from the range with the larger coefficient, so that the preset application scenarios corresponding to different baseline ranges are clearly separated. Any baseline range is therefore [[μ-k*σ, μ-(k-1)*σ], [μ+(k-1)*σ, μ+k*σ]].
  • Each baseline range corresponds to a preset application scenario, such as a first preset application scenario, a second preset application scenario, a third preset application scenario, and a fourth preset application scenario.
  • When k = 1, the baseline range is [[μ-σ, μ], [μ, μ+σ]], that is, [μ-σ, μ+σ]. This range is closest to the average of the feature values, and the preset application scenario corresponding to it is the first preset application scenario.
  • When k = 2, the baseline range is [[μ-2σ, μ-σ], [μ+σ, μ+2σ]]. This range adjoins the first preset application scenario, and the preset application scenario corresponding to it is the second preset application scenario.
  • When k = 3, the baseline range is [[μ-3σ, μ-2σ], [μ+2σ, μ+3σ]]. This range adjoins the second preset application scenario, and the preset application scenario corresponding to it is the third preset application scenario. Values outside the baseline ranges obtained for these values of k are defined as the fourth preset application scenario.
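  • The baseline ranges above can be sketched with Python's standard `statistics` module. This is a minimal sketch under stated assumptions: the population standard deviation (dividing by N) is used, boundary values are assigned to the inner range, and the function names `baseline_limits` and `baseline_band` are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence, Tuple


def baseline_limits(values: Sequence[float], k: float) -> Tuple[float, float]:
    """Return (mu - k*sigma, mu + k*sigma) for the given feature values."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (divides by N)
    return mu - k * sigma, mu + k * sigma


def baseline_band(value: float, values: Sequence[float], max_k: int = 3) -> int:
    """Return the range index 1..max_k that contains value, or max_k + 1 if outside."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All samples identical: only an exact match falls in the first range.
        return 1 if value == mu else max_k + 1
    # Number of standard deviations from the mean, rounded up to a whole range.
    k = max(1, math.ceil(abs(value - mu) / sigma))
    return k if k <= max_k else max_k + 1
```

  • With the samples [2, 4, 4, 4, 5, 5, 7, 9], the mean is 5 and the standard deviation is 2, so `baseline_limits(samples, 1)` gives (3.0, 7.0); a value of 8.5 falls in the second range and a value of 12 falls outside all three ranges, returning 4.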
  • The method of step S50 is repeated to obtain the application scenario baselines corresponding to all the preset application scenarios, so that the server can directly call the preset application scenario baselines and accurately divide the actual network traffic.
  • The server generates the application scenario baselines corresponding to all the preset application scenarios in advance, so that when it intercepts a large amount of actual network traffic it can directly call the baselines and accurately divide the actual network traffic.
  • The network traffic monitoring method further includes:
  • Normal network traffic refers to the transmission speed and quantity of traffic packets in the network when the network is in a secure and stable state.
  • In step S70, creating the normal traffic model library includes the following steps:
  • Step S11 obtains the corresponding preset application scenario based on the actual network traffic, whereas step S71 obtains the corresponding preset application scenario based on the normal network traffic.
  • In step S72, the feature extraction algorithm corresponding to the preset application scenario is used to perform feature extraction and feature vectorization on the normal network traffic and obtain the corresponding normal feature vector.
  • The feature extraction algorithm extracts the normal feature vectors corresponding to the different preset application scenarios as reference values against which the server can check the actual network traffic.
  • In step S72, performing feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector specifically includes the following steps:
  • Steps S721 to S722 in this embodiment are similar to steps S11 to S12 in another embodiment of the present application, and details are not described herein again.
  • The difference between the two embodiments is that this embodiment processes normal network traffic to obtain a normal feature vector, whereas the other embodiment processes the actual network traffic to obtain an actual feature vector.
  • The normal feature vector obtained in this embodiment serves as the reference value against which the server compares the actual feature vector to determine whether the actual network traffic is an abnormal traffic set.
  • step S40 the corresponding abnormal proportion of the abnormal traffic set is obtained, which specifically includes the following steps:
  • the total number of abnormalities refers to the total number of abnormal traffic sets in the actual network traffic participating in this monitoring.
  • the total number of traffic is the total number of all actual network traffic participating in this monitoring.
  • Initializing the abnormal total and the traffic total means assigning each of them an initial value, which can be set to 0.
  • If the actual feature vector corresponding to the preset application scenario is an abnormal traffic set, the abnormal total and the traffic total are both increased by 1.
  • As described in step S42, regardless of whether the current preset application scenario in the actual network traffic being monitored is an abnormal traffic set, the traffic total is increased by 1 whenever a new preset application scenario is monitored.
  • the proportion of abnormalities is the percentage of the total number of abnormalities in the total number of flows.
  • the total number of abnormalities and the total number of flows are initialized, and the two data are updated in real time as the actual network traffic is monitored, so that the current proportion of abnormalities can be obtained simply and quickly.
  • The network traffic monitoring method provided by the embodiments of the present application obtains actual network traffic and queries a preset normal traffic model library, based on the preset application scenario corresponding to the actual network traffic, to monitor whether the actual network traffic is abnormal network traffic.
  • On the one hand, monitoring the actual network traffic against the normal traffic model library can also detect abnormal network traffic that has not been seen before.
  • On the other hand, the normal traffic model library can be built with a simple and efficient algorithm, which suits the cloud security field, where network traffic involves heavy computation.
  • Further, the server divides the actual network traffic into preset application scenarios by using the application scenario baselines, so that the process of judging the actual network traffic is more targeted and organized; and by generating the baselines for all preset application scenarios in advance, the server can call them directly and divide the actual network traffic accurately.
  • FIG. 8 is a schematic block diagram showing a network traffic monitoring apparatus corresponding to the network traffic monitoring method in Embodiment 1.
  • The network traffic monitoring apparatus includes a network traffic acquisition module 10, a feature vector obtaining module 20, a feature vector correspondence module 30, and an abnormal proportion statistics module 40.
  • The functions implemented by modules 10, 20, 30, and 40 correspond one-to-one to the steps of the network traffic monitoring method in Embodiment 1; to avoid redundancy, they are not detailed one by one in this embodiment.
  • The network traffic acquisition module 10 is configured to obtain actual network traffic and, based on the actual network traffic, obtain at least one preset application scenario and an actual feature vector corresponding to the preset application scenario.
  • The feature vector obtaining module 20 is configured to query a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario.
  • The feature vector correspondence module 30 is configured to determine that, if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than the first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set.
  • The abnormal proportion statistics module 40 is configured to count the abnormal proportion corresponding to the abnormal traffic sets; if the abnormal proportion is greater than the second threshold, the actual network traffic is abnormal network traffic.
  • The feature vector obtaining module 20 includes an application scenario acquisition unit 21 and a feature vector acquisition unit 22.
  • The application scenario acquisition unit 21 is configured to divide the actual network traffic by calling the preset application scenario baselines and obtain the corresponding preset application scenario.
  • The feature vector acquisition unit 22 is configured to perform feature extraction and feature vectorization on the actual network traffic by using the feature extraction algorithm corresponding to the preset application scenario, to obtain the corresponding actual feature vector.
  • The network traffic monitoring apparatus further includes a baseline generation unit 50.
  • The baseline generation unit 50 is configured to generate the application scenario baseline corresponding to the current preset application scenario.
  • The baseline generation unit 50 includes a network traffic collection subunit 51, an average value acquisition subunit 52, and a baseline acquisition subunit 53.
  • The network traffic collection subunit 51 is configured to collect normal network traffic, where the normal network traffic includes at least one preset application scenario and the normal behavior features corresponding to the preset application scenario.
  • The average value acquisition subunit 52 is configured to perform calculations on all normal behavior features in the same preset application scenario and obtain the corresponding average value and standard deviation.
  • The baseline acquisition subunit 53 is configured to obtain the application scenario baseline based on the average value and the standard deviation.
  • The baseline acquisition unit 60 is configured to continue obtaining the baseline corresponding to the next preset application scenario until the baselines of all preset application scenarios have been obtained.
  • The network traffic monitoring apparatus further includes a model library creation unit 70 for creating the normal traffic model library.
  • The model library creation unit 70 includes a network traffic acquisition subunit 71, a feature vector obtaining subunit 72, and a model library formation subunit 73.
  • The network traffic acquisition subunit 71 is configured to obtain normal network traffic and divide it based on the preset application scenario baselines to obtain the corresponding preset application scenario.
  • The feature vector obtaining subunit 72 is configured to perform feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector.
  • The model library formation subunit 73 is configured to store the preset application scenario in association with the normal feature vector in a database to form the normal traffic model library.
  • The feature vector obtaining subunit 72 includes a feature data acquisition subunit 721 and a feature vector obtaining subunit 722.
  • The feature data acquisition subunit 721 is configured to perform feature extraction on the normal network traffic to obtain scenario feature data.
  • The feature vector obtaining subunit 722 is configured to perform feature vectorization on the scenario feature data by matrix calculation to obtain the corresponding normal feature vector.
  • The abnormal proportion statistics module 40 includes a total initialization unit 41, an abnormal traffic set processing unit 42, a traffic total increment unit 43, and an abnormal proportion acquisition unit 44.
  • The total initialization unit 41 is configured to initialize the abnormal total and the traffic total.
  • The abnormal traffic set processing unit 42 is configured to add 1 to both the abnormal total and the traffic total if the actual feature vector corresponding to the preset application scenario is an abnormal traffic set.
  • The traffic total increment unit 43 is configured to add 1 to the traffic total if the actual feature vector corresponding to the preset application scenario is not an abnormal traffic set.
  • The abnormal proportion acquisition unit 44 is configured to divide the abnormal total by the traffic total to obtain the abnormal proportion.
  • This embodiment provides one or more non-volatile readable storage media storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to carry out the network traffic monitoring method in Embodiment 1; to avoid repetition, the details are not repeated here. Alternatively, when executed by one or more processors, the computer readable instructions cause the one or more processors to implement the functions of the modules/units of the network traffic monitoring apparatus in Embodiment 2; to avoid repetition, the details are not repeated here.
  • the computer readable storage medium may include any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read only memory (ROM, Read-Only Memory), Random Access Memory (RAM), electrical carrier signals, and telecommunications signals.
  • FIG. 9 is a schematic diagram of a computer device according to an embodiment of the present application.
  • The computer device 80 of this embodiment includes a processor 81, a memory 82, and computer readable instructions 83 stored in the memory 82 and executable on the processor 81.
  • When executing the computer readable instructions 83, the processor 81 implements the steps of the network traffic monitoring method of Embodiment 1, such as steps S10 to S40 shown in FIG. 1.
  • Alternatively, when executing the computer readable instructions 83, the processor 81 implements the functions of the modules in the foregoing apparatus embodiments, for example the functions of the network traffic acquisition module 10, the feature vector obtaining module 20, the feature vector correspondence module 30, and the abnormal proportion statistics module 40 shown in FIG. 8.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

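The present application discloses a network traffic monitoring method, apparatus, computer device, and storage medium. The method includes: obtaining actual network traffic and, based on it, obtaining at least one corresponding preset application scenario and an actual feature vector corresponding to each preset application scenario; querying a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario; if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, treating the actual feature vector corresponding to that preset application scenario as an abnormal traffic set; and counting the abnormal proportion of the abnormal traffic sets, where, if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic. The method identifies whether abnormal traffic exists in a simple and efficient way and is suited to the cloud security field, where network traffic involves heavy computation.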

Description

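Network traffic monitoring method, apparatus, computer device, and storage medium
This application is based on, and claims priority to, Chinese invention application No. 201810239414.8, filed on March 22, 2018 and entitled "Network traffic monitoring method, apparatus, computer device, and storage medium".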
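Technical Field
The present application relates to the field of network security, and in particular to a network traffic monitoring method, apparatus, computer device, and storage medium.
Background
Bursts of abnormal network traffic congest the network, causing packet loss, delay, and jitter and degrading network service quality. Such burst traffic may also carry security risks, for example DDoS attacks, worms, and data theft, which can do great harm to networks and business systems.
Common methods for monitoring abnormal network traffic either extract a "fingerprint" of the abnormal traffic for identification or identify abnormal traffic with a machine learning model. The former cannot recognize abnormal traffic that has not been seen before; the latter requires complex data mining algorithms to reach a decision. In the cloud security field, which involves big data computation, existing monitoring schemes struggle to provide efficient and accurate monitoring of abnormal network traffic.
Summary
The embodiments of the present application provide a network traffic monitoring method, apparatus, computer device, and storage medium, to solve the problem that an efficient and accurate scheme for monitoring abnormal network traffic is not available in the cloud security field with its big data computation.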
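In a first aspect, an embodiment of the present application provides a network traffic monitoring method, including:
obtaining actual network traffic, and obtaining, based on the actual network traffic, at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario;
querying a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario;
if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
counting the abnormal proportion corresponding to the abnormal traffic sets, and if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.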
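In a second aspect, an embodiment of the present application provides a network traffic monitoring apparatus, including:
a network traffic acquisition module, configured to obtain actual network traffic and obtain, based on the actual network traffic, at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario;
a feature vector obtaining module, configured to query a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario;
a feature vector correspondence module, configured to determine that, if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
an abnormal proportion statistics module, configured to count the abnormal proportion corresponding to the abnormal traffic sets, where, if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.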
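In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer readable instructions:
obtaining actual network traffic, and obtaining, based on the actual network traffic, at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario;
querying a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario;
if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
counting the abnormal proportion corresponding to the abnormal traffic sets, and if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.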
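In a fourth aspect, an embodiment of the present application provides one or more non-volatile readable storage media storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining actual network traffic, and obtaining, based on the actual network traffic, at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario;
querying a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario;
if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
counting the abnormal proportion corresponding to the abnormal traffic sets, and if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.
Details of one or more embodiments of the present application are set out in the drawings and description below; other features and advantages of the present application will become apparent from the description, the drawings, and the claims.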
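Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the network traffic monitoring method in Embodiment 1 of the present application;
FIG. 2 is another detailed flowchart of the network traffic monitoring method in Embodiment 1 of the present application;
FIG. 3 is another detailed flowchart of the network traffic monitoring method in Embodiment 1 of the present application;
FIG. 4 is another detailed flowchart of the network traffic monitoring method in Embodiment 1 of the present application;
FIG. 5 is another detailed flowchart of the network traffic monitoring method in Embodiment 1 of the present application;
FIG. 6 is another detailed flowchart of the network traffic monitoring method in Embodiment 1 of the present application;
FIG. 7 is another detailed flowchart of the network traffic monitoring method in Embodiment 1 of the present application;
FIG. 8 is a schematic block diagram of the network traffic monitoring apparatus in Embodiment 2 of the present application;
FIG. 9 is a schematic diagram of the computer device in Embodiment 4 of the present application.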
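Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.
An accurate network traffic model helps in designing better network protocols, more reasonable network topologies, and smarter network monitoring systems, so as to provide more efficient QoS (Quality of Service) and keep the network running efficiently, stably, and securely. A network is a complex nonlinear system that is also affected by various complex external factors, so its traffic model (the network traffic model) is complex and variable as well. Most existing network traffic models are based on abnormal traffic and are built after complex machine learning. The embodiments of the present application propose a method that models normal traffic and uses the model to monitor network traffic. The method is mainly applied to big data computation with strict timeliness requirements, especially cloud computing.
Traffic monitoring manages and controls network communication packets, and also optimizes and limits them. Its purpose is to allow and guarantee the efficient transmission of packets while prohibiting or restricting the transmission of illegitimate packets.
Cloud computing is a shared pool of configurable computing resources (including networks, servers, storage, application software, and services) that provides available, convenient, on-demand network access. As cloud computing applications deepen and the demand for big data processing grows, users' requirements for the performance and security of cloud computing increase accordingly. The method of this proposal is executed by the server that provides the shared resource pool.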
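Embodiment 1
FIG. 1 shows a flowchart of the network traffic monitoring method in this embodiment. The method is applied to a server in a cloud computing environment. The server can provide different cloud services according to client needs, such as virtual hosts, private networks, and cloud storage. As shown in FIG. 1, the network traffic monitoring method includes the following steps:
S10. Obtain actual network traffic and, based on the actual network traffic, obtain at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario.
Actual network traffic is the network traffic collected in real time during the current round of traffic monitoring. Network traffic is the amount of data transmitted over the network in packets; two computers "communicate" over the network by sending and receiving packets.
An application scenario is a business model built on the cloud platform for a particular business. In this embodiment, the server holds a preconfigured application scenario library and adds or removes business scenarios in it as businesses are added or removed; the business scenarios stored in the library are the preset application scenarios. The resource pool of cloud computing carries network traffic for many business scenarios. When this traffic is brought from the resource pool into the upper-layer network, the traffic of each business scenario should be distinguished one by one, a separate business model built for each, and the underlying protocol types of each business identified, so that when cloud computing traffic surges, each logical business network is clearer and the different kinds of network traffic can be extracted from the huge data stream.
Different business scenarios have different network traffic. For each industry application, the traffic of the different business scenarios in the cloud computing network should be distinguished and then managed with a feature extraction algorithm. A feature vector is the feature data obtained after the network traffic has been distinguished and managed, corresponding to the distinguished application scenario. The feature extraction algorithm provides visibility into the traffic inside the network and control over network resources, and identifies the data traffic of a specific user in a preset application scenario; it may specifically be the DPI (deep packet inspection) algorithm.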
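An example illustrates the relationship among actual network traffic, preset application scenarios, and actual feature vectors. All network traffic received at the input port of the cloud platform server is actual network traffic. It may include the traffic needed by at least three preset application scenarios:
(1) SaaS application scenario:
This includes a large amount of HTTP and HTTPS traffic, mainly on ports 80 and 443.
(2) PaaS application scenario:
PaaS provides customized software runtime environments to external users and often generates varied data traffic during system development and debugging.
(3) IaaS application scenario:
Traffic generated by IaaS belongs to online storage services, and the traffic generated by each storage channel is also distinguished.
After dividing the network traffic belonging to each of the three preset application scenarios, the server applies the feature extraction algorithm to the traffic of each preset application scenario to extract the corresponding actual feature vector.
In this step, the server distinguishes and manages the actual network traffic to obtain at least one preset application scenario and the corresponding actual feature vector. Breaking the huge volume of cloud data into finer pieces makes it easier for the server to determine, from the preset application scenario and the actual feature vector, whether the actual network traffic is abnormal, and reduces the complexity of processing cloud data.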
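S20. Query a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario.
The normal traffic model library is a database holding the normal feature vectors of all application scenarios under normal network traffic; it is used to check whether the actual network traffic is abnormal. Normal network traffic refers to the transmission speed and number of traffic packets in the network when the network is in a secure and stable state. A normal feature vector is the feature data obtained by applying the feature extraction algorithm to normal network traffic.
In a real network detection environment, each preset application scenario has two feature vectors: a normal feature vector and an actual feature vector. In this embodiment, the actual feature vector is obtained by applying the feature extraction algorithm to the actual network traffic currently collected in real time. The normal feature vector is collected and stored in advance for each preset application scenario and serves as the indicator for judging whether the actual feature vector is abnormal. In this step, the server obtains the actual feature vector of the current preset application scenario and retrieves the corresponding normal feature vector, so that it can then process the two vectors for that preset application scenario directly.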
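S30. If the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set.
Samples of the same class are generally similar. That is, if the samples of the actual feature vector and the normal feature vector cluster together in feature space and the distance between them is small, the actual feature vector is a normal feature vector. Conversely, when the distance between the samples of the actual feature vector and those of the normal feature vector is large, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set.
The first threshold is the minimum intersection range, delimited in advance in the spatial distribution, for treating an actual feature vector as a normal feature vector.
Further, the server compares the intersection of the actual feature vector and the normal feature vector; the algorithm used to detect abnormal traffic sets is as follows: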
Figure PCTCN2018092654-appb-000001
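In this step, a simple algorithm obtains the intersection of the normal feature vector and the actual feature vector in each preset application scenario, which speeds up the server's abnormality judgment on the actual feature vectors of multiple preset application scenarios.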
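The algorithm in the figure above is not reproduced in the text, so the following is only a minimal sketch of the S30 comparison under stated assumptions: each feature vector is treated as a set of discrete feature tokens, and the "intersection" is measured as the fraction of normal features that also appear in the actual features. The function name `is_abnormal_set` and this overlap measure are illustrative choices, not taken from the application.

```python
def is_abnormal_set(actual, normal, first_threshold):
    """Return True when the overlap between actual and normal features
    falls below first_threshold (assumed to be a fraction in [0, 1])."""
    actual_set, normal_set = set(actual), set(normal)
    if not normal_set:
        # No reference features are stored, so nothing can overlap.
        return True
    # Assumed overlap measure: shared features / normal features.
    overlap = len(actual_set & normal_set) / len(normal_set)
    return overlap < first_threshold
```

For instance, with a threshold of 0.5, an actual vector sharing three of four normal features would be kept as normal, while one sharing none would be flagged as an abnormal traffic set.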
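S40. Count the abnormal proportion corresponding to the abnormal traffic sets; if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.
The abnormal proportion is the number of abnormal traffic sets as a percentage of the total number of monitored traffic sets. The second threshold is a critical point, set from practical experience and need, for separating abnormal from non-abnormal traffic.
In this step, the number of abnormal traffic sets in the actual network traffic is counted, which gives the abnormal proportion of abnormal traffic sets among all monitored traffic sets. If the abnormal proportion is greater than the established second threshold, the actual network traffic is abnormal network traffic and the server needs to take further action, such as locking the port that receives the traffic.
The network traffic monitoring method, apparatus, computer device, and storage medium provided by the embodiments of the present application obtain actual network traffic and query a preset normal traffic model library, based on the application scenario corresponding to the actual network traffic, to monitor whether the actual network traffic is abnormal. On the one hand, monitoring the actual network traffic against the normal traffic model library can also detect abnormal network traffic that has not been seen before; on the other hand, the normal traffic model library suits the cloud security field, where network traffic involves heavy computation, and identifies network anomalies efficiently and quickly.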
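In a specific implementation, as shown in FIG. 2, step S10, that is, obtaining at least one corresponding preset application scenario and actual feature vector based on the actual network traffic, specifically includes the following steps:
S11. Based on the actual network traffic, call the preset application scenario baselines to divide the actual network traffic and obtain the corresponding preset application scenario.
An application scenario baseline is delimited by the behavior features of the normal network traffic of a preset application scenario. The behavior features include network utilization, application response time, protocol distribution, and user bandwidth consumption.
Network utilization: a measure of how much bandwidth is used within a given time interval; it can be measured per protocol.
Application response time: mainly used to show the connection status of Web sites in the network, such as which computers on the LAN are online or which Web sites are mainly visited.
Protocol distribution: reports network usage according to the distribution of session-layer, transport-layer, and application-layer protocols.
User bandwidth consumption: bandwidth is the amount of data that can pass through a link per unit of time; here it refers to the amount of data a user occupies when using the network.
Further, different preset application scenarios have different application scenario baselines. When new actual network traffic does not fall on the baseline delimited for a given application scenario, it can be concluded that the traffic does not belong to that preset application scenario.
In this step, the server uses the application scenario baselines to divide the actual network traffic into several preset application scenarios, which makes the process of judging the actual network traffic more targeted and organized.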
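S12. Based on the actual network traffic, use the feature extraction algorithm corresponding to the preset application scenario to perform feature extraction and feature vectorization on the actual network traffic and obtain the corresponding actual feature vector.
This process mainly includes the following steps:
(1) Use the feature extraction algorithm corresponding to the preset application scenario to perform feature extraction on the actual network traffic.
In this embodiment, the feature extraction algorithm may specifically be the DPI (deep packet inspection) algorithm, which inspects the application-layer protocol of packets to detect, parse, and discover P2P data flows. DPI provides visibility into the traffic inside the network and control over network resources, and can identify the data flow of a specific user in a preset application scenario.
The DPI algorithm uses a payload signature library to store payload signature strings; packets that match a payload signature string are regarded as P2P data flows. Almost every preset application scenario corresponding to a P2P application has its own application-layer protocol. By capturing packets and analyzing their characteristics, a unique payload signature string can be defined for each P2P application-layer protocol. The principle for defining the signature is to choose a field that is specific to the protocol, must appear during the interaction, and appears most frequently in the real environment.
Effective features of the preset application scenario are then extracted from the captured P2P data flows, for example the user's inbound path, the user's landing page, the paths the user commonly follows when browsing the site, the dwell time of each visit, and the user's exit page.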
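The payload signature matching described above can be sketched as follows. This is an illustrative sketch, not the application's implementation: the `SIGNATURES` table and the function `classify_payload` are hypothetical names, and the two signatures are simply well-known handshake prefixes chosen as examples of "a field specific to the protocol that must appear during the interaction".

```python
# Hypothetical payload signature library: protocol name -> signature bytes.
SIGNATURES = {
    # A BitTorrent handshake starts with the byte 19 followed by this string.
    "bittorrent": b"\x13BitTorrent protocol",
    # A Gnutella handshake starts with this request line prefix.
    "gnutella": b"GNUTELLA CONNECT",
}


def classify_payload(payload: bytes):
    """Return the first protocol whose signature occurs in the payload,
    or None when no signature in the library matches."""
    for protocol, signature in SIGNATURES.items():
        if signature in payload:
            return protocol
    return None
```

A production DPI engine would typically anchor signatures at known offsets and use a multi-pattern matcher rather than a linear scan; the loop here only illustrates the lookup against a signature library.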
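(2) Perform feature vectorization on the extracted features to obtain the corresponding actual feature vector.
Feature vectorization forms a set of vectors by applying a multi-dimensional feature matrix calculation to the features obtained in step (1). The result represents the multi-dimensional features of the preset application scenario, which may include IP address pairs, port numbers, protocol types, and TCP connection statistics. Features mostly differ between preset application scenarios, so the feature vectors of the preset application scenarios differ as well.
Further, different application scenarios correspond to different feature vectors. Taking protocol type as an example:
Protocols of the internet-layer application scenario: IP, ICMP, ARP, and RARP.
Protocols of the transport-layer application scenario: TCP and UDP.
Protocols of the application-layer application scenario: FTP, Telnet, SMTP, HTTP, RIP, NFS, and DNS.
Based on these protocols, when the server handles services of application scenarios at different layers, it uses the corresponding protocols to compute the feature vectors.
The effective features obtained in step (1) are placed into an invertible matrix. An invertible matrix can be decomposed into the product of eigenvalues and eigenvectors, that is, AV = λV, where V is the eigenvector matrix. V is used to change the basis of the matrix, converting it into a matrix whose basis is a set of eigenvectors, which reduces the dimensionality of the effective features in the matrix.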
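An example illustrates how a matrix is used to reduce the dimensionality of the effective features. Take the internet-layer application scenario: there is a set of 20 collected samples of this scenario, and each sample contains four effective feature values corresponding to the IP, ICMP, ARP, and RARP protocols. Two basic feature values are extracted from the four, so that the next time actual network traffic is presented, these two basic feature values are enough to determine that the traffic belongs to the internet-layer application scenario. The four original effective features are redundant, and the most direct way to reduce the amount of data is dimensionality reduction. The procedure is as follows: assign the sample set to a 20-row, 4-column matrix R, subtract the mean and normalize; its covariance matrix is C = R^T R, a 4-row, 4-column matrix. Perform an eigendecomposition of C, diagonalizing it as C = U D U^T, where U is the matrix composed of eigenvectors and D is the diagonal matrix composed of eigenvalues, sorted from largest to smallest. Then let R' = RU, which projects the sample set onto the orthogonal basis formed by the eigenvectors. The columns of R' are ordered by the size of the corresponding eigenvalues; the later columns correspond to small eigenvalues, and removing them has little effect on the data set as a whole. Dropping the last 2 columns and keeping only the first 2 completes the dimensionality reduction and thus the feature vectorization. This method is also known as the PCA algorithm (Principal Component Analysis).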
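The PCA procedure above can be sketched in plain Python. This is a minimal sketch under assumptions: it centers the data but skips the optional normalization, forms C = R^T R as in the text, and finds the leading eigenvectors by power iteration with deflation (a substitute for a full eigendecomposition, chosen only to keep the example dependency-free). All function names are illustrative.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def scatter(rows):
    """C = R^T R for already centered rows R."""
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]


def top_eigenpair(c, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of symmetric c."""
    d = len(c)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # c maps v to zero; keep the current vector
        v = [x / norm for x in w]
    cv = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, cv)), v


def pca(rows, k):
    """Project centered rows onto the k leading principal components."""
    r = center(rows)
    c = scatter(r)
    d = len(c)
    components = []
    for _ in range(k):
        value, vec = top_eigenpair(c)
        components.append(vec)
        # Deflation: remove the found component so the next pass finds the next one.
        c = [[c[i][j] - value * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return [[sum(x * u for x, u in zip(row, comp)) for comp in components] for row in r]
```

Calling `pca(samples, 2)` on a 20 x 4 sample matrix returns a 20 x 2 matrix, matching the "keep the first 2 columns" step. Power iteration can converge slowly when the leading eigenvalues are close, so a library eigensolver would be the more robust choice in practice.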
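In this step, the feature extraction algorithm extracts the actual feature vector from the actual network traffic, and the judgment of whether the actual network traffic is abnormal is then based on that vector, which lets the server make the judgment more efficiently.
In a specific implementation, as shown in FIG. 3, before step S10, that is, before the step of obtaining the actual network traffic, the network traffic monitoring method further includes:
S50. Generate the application scenario baseline corresponding to the current preset application scenario.
The current preset application scenario is the preset application scenario to which the current network traffic belongs. An application scenario baseline is delimited by the behavior features of the normal network traffic of the preset application scenario; specifically, the behavior features include network utilization, application response time, protocol distribution, and user bandwidth consumption.
The application scenario baselines divide a large volume of actual network traffic by preset application scenario, and are a prerequisite for monitoring the actual network traffic per preset application scenario.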
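In a specific implementation, as shown in FIG. 4, step S50, that is, generating the application scenario baseline corresponding to the current preset application scenario, further includes the following steps:
S51. Collect normal network traffic, where the normal network traffic includes at least one preset application scenario and the normal behavior features corresponding to the preset application scenario.
In this step, at least one preset application scenario is determined first, and the normal network traffic in that scenario is collected to obtain its normal behavior features, such as normal network utilization, normal application response time, normal protocol distribution, and normal user bandwidth consumption.
The values of several normal behavior features are enough to describe the preset application scenario in basic terms, which helps the server build a normal traffic model for it.
S52. Perform calculations on all normal behavior features in the same preset application scenario to obtain the corresponding average value and standard deviation.
The average value is the arithmetic mean, that is, the sum of all data divided by the number of data items; it summarizes the overall state of the variable. The standard deviation measures how far the data are spread around the mean: a large standard deviation means most values differ considerably from the mean, and a small one means most values are close to the mean. The standard deviation can be regarded as a measure of uncertainty.
After the feature values of the normal behavior features are obtained, their average value and standard deviation are calculated. The average value μ is calculated as follows: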
Figure PCTCN2018092654-appb-000002
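Here n is the number of feature values, i ranges from 1 to n, and x_i is any item among the feature values. The standard deviation σ is calculated as follows: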
Figure PCTCN2018092654-appb-000003
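Here N is the number of feature values, i ranges from 1 to N, x_i is any item among the feature values, and μ is the average of the feature values.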
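The two formulas appear only as figure images in this text. Reconstructed from the symbol definitions given for them, and assuming the population form of the standard deviation (division by N rather than N - 1, which the definitions suggest but do not state explicitly), they read:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^{2}}
```

Both sums run over the same feature values, so n and N denote the same count.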
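S53. Obtain the application scenario baseline based on the average value and the standard deviation.
In this embodiment, the application scenario baseline delimits at least two baseline ranges, each corresponding to a preset application scenario. A baseline range has an upper limit, which is its maximum value, and a lower limit, which is its minimum value. A preset application scenario is the state the data corresponds to; the possible states of the data are divided into different preset application scenarios, such as a first, second, third, and fourth preset application scenario.
Specifically, obtaining the application scenario baseline based on the average value and the standard deviation includes the following steps:
(1) Obtain the product of the standard deviation and the standard deviation coefficient.
The standard deviation coefficient is a positive number, either a positive integer or a positive fraction. Multiplying the standard deviation by the coefficient gives the standard deviation product. In this embodiment, with the coefficient denoted k, the product is k*σ.
(2) Determine the upper limit of a baseline range from the sum of the average value and the standard deviation product.
The upper limit is the maximum value of the baseline range. It depends on the average value and the standard deviation of the corresponding feature values, and is calculated as the average value plus the standard deviation product. For example, if the average of the feature values is μ, the standard deviation is σ, and the coefficient is k, the upper limit of the corresponding baseline value is μ+k*σ. The upper limit of each baseline range splits the historical baseline values into two baseline ranges corresponding to different preset application scenarios.
(3) Determine the lower limit of a baseline range from the difference between the average value and the standard deviation product.
The lower limit is the minimum value of the baseline range. It depends on the average value and the standard deviation of the corresponding feature values, and is calculated as the average value minus the standard deviation product. For example, if the average of the feature values is μ, the standard deviation is σ, and the coefficient is k, the lower limit of the corresponding baseline value is μ-k*σ. The lower limit of each baseline range splits the historical baseline values into two baseline ranges corresponding to different preset application scenarios.
In calculating the upper and lower limits, the larger the standard deviation coefficient, the wider the baseline range. With average μ, standard deviation σ, and coefficient k, the upper limit of any baseline range in the historical baseline values is μ+k*σ and the lower limit is μ-k*σ. Because the range for a larger coefficient contains the range for a smaller one, the smaller-coefficient range must be removed from the larger-coefficient range to show clearly which preset application scenario each range corresponds to. Any baseline range is therefore [[μ-k*σ, μ-(k-1)*σ], [μ+(k-1)*σ, μ+k*σ]].
In this embodiment, each baseline range corresponds to a preset application scenario, such as the first, second, third, and fourth preset application scenarios. When k is 1, the baseline range is [[μ-σ, μ], [μ, μ+σ]], that is, [μ-σ, μ+σ]; this range is closest to the average of the feature values, and its preset application scenario is the first preset application scenario. When k is 2, the baseline range is [[μ-2σ, μ-σ], [μ+σ, μ+2σ]]; this range adjoins the first preset application scenario, and its preset application scenario is the second preset application scenario. When k is 3, the baseline range is [[μ-3σ, μ-2σ], [μ+2σ, μ+3σ]]; this range adjoins the second preset application scenario, and its preset application scenario is the third preset application scenario. Any preset application scenario outside the baseline range obtained with k = 3 is defined as the fourth preset application scenario.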
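The banding by k described above can be sketched as follows. This is a sketch under assumptions: `fit_baseline` and `baseline_band` are hypothetical names, the population standard deviation is assumed, and a value lying exactly on a shared boundary (for example μ+σ) is assigned to the inner band, a tie-break the text does not specify.

```python
import math
import statistics


def fit_baseline(samples):
    """Return (mean, population standard deviation) of normal feature values."""
    return statistics.mean(samples), statistics.pstdev(samples)


def baseline_band(value, mean, std, max_k=3):
    """Return the band index k (1..max_k) whose range
    [mean - k*std, mean - (k-1)*std] or [mean + (k-1)*std, mean + k*std]
    contains value, or max_k + 1 when value lies outside every band."""
    if std <= 0:
        raise ValueError("std must be positive")
    distance = abs(value - mean)
    # ceil maps (k-1)*std < distance <= k*std to k; distance 0 maps to band 1.
    k = max(1, math.ceil(distance / std))
    return k if k <= max_k else max_k + 1
```

With a mean of 100 and a standard deviation of 10, a value of 105 falls in band 1 (first preset application scenario), 85 in band 2, 128 in band 3, and 150 outside all bands (fourth preset application scenario).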
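S60. Continue obtaining the baseline corresponding to the next preset application scenario until the baselines of all preset application scenarios have been obtained.
Repeating the method of step S50 yields the baselines of all preset application scenarios, so that the server can call them directly and divide the actual network traffic accurately.
In this embodiment, the server generates the baselines of all preset application scenarios in advance, so that when it intercepts a massive volume of actual network traffic it can call the baselines directly and divide the traffic accurately.
Preferably, before step S10, that is, before the step of obtaining the actual network traffic, the network traffic monitoring method further includes:
S70. Create a normal traffic model library.
The normal traffic model library is a database holding the normal feature vectors of all application scenarios under normal network traffic; it serves as the basis for judging whether the actual network traffic is normal. Normal network traffic refers to the transmission speed and number of traffic packets in the network when the network is in a secure and stable state.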
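In a specific implementation, as shown in FIG. 5, step S70, that is, creating the normal traffic model library, specifically includes the following steps:
S71. Obtain normal network traffic and divide it based on the preset application scenario baselines to obtain the corresponding preset application scenario.
This step is similar to step S11 in another specific implementation and is not described again. The difference is that step S11 obtains the corresponding preset application scenario from the actual network traffic, whereas step S71 obtains it from the normal network traffic.
Further, different preset application scenarios have different baselines. This step collects only the normal network traffic that falls on the baseline delimited for the preset application scenario, in order to model the normal network traffic belonging to that scenario.
S72. Perform feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector.
This step is similar to step S12 in another specific implementation and is not described again. In step S72, the feature extraction algorithm corresponding to the preset application scenario is used to perform feature extraction and feature vectorization on the normal network traffic and obtain the corresponding normal feature vector.
In this step, the feature extraction algorithm extracts the normal feature vectors of the different preset application scenarios as reference values, which the server consults when judging the actual network traffic.
S73. Store the preset application scenario in association with the normal feature vector in a database to form the normal traffic model library.
Storing all preset application scenarios together with their respective normal feature vectors in the database forms the normal traffic model library. The server can then call and consult it when judging whether the actual network traffic is abnormal, which speeds up its processing of the actual network traffic.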
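Step S73 and the lookup in step S20 can be sketched with an in-process database. This is a minimal sketch under assumptions: the application does not name a database or schema, so the use of SQLite, the table `normal_model`, the JSON encoding of the vector, and all function names are illustrative.

```python
import json
import sqlite3


def create_model_library(conn):
    """Create the (hypothetical) table mapping a scenario to its normal vector."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS normal_model ("
        "scenario TEXT PRIMARY KEY, feature_vector TEXT NOT NULL)"
    )


def store_normal_vector(conn, scenario, vector):
    """S73: store the scenario in association with its normal feature vector."""
    conn.execute(
        "INSERT OR REPLACE INTO normal_model VALUES (?, ?)",
        (scenario, json.dumps(vector)),
    )


def load_normal_vector(conn, scenario):
    """S20: query the library; returns None for an unknown scenario."""
    row = conn.execute(
        "SELECT feature_vector FROM normal_model WHERE scenario = ?",
        (scenario,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keying the table by scenario keeps the S20 query a single indexed lookup, which fits the stated goal of letting the server fetch reference vectors quickly during monitoring.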
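In a specific implementation, as shown in FIG. 6, step S72, that is, performing feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector, specifically includes the following steps:
S721. Perform feature extraction on the normal network traffic to obtain scenario feature data.
S722. Perform feature vectorization on the scenario feature data by matrix calculation to obtain the corresponding normal feature vector.
Specifically, steps S721 to S722 in this embodiment are similar to steps S11 to S12 in another embodiment of the present application and are not described again. The difference between the two embodiments is that this embodiment processes normal network traffic to obtain a normal feature vector, whereas the other embodiment processes actual network traffic to obtain an actual feature vector. The normal feature vector obtained here serves the server as a reference value against which the actual feature vector is compared, to determine whether the actual network traffic is an abnormal traffic set.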
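In a specific implementation, as shown in FIG. 7, step S40, that is, counting the abnormal proportion corresponding to the abnormal traffic sets, specifically includes the following steps:
S41. Initialize the abnormal total and the traffic total.
The abnormal total is the total number of abnormal traffic sets in the actual network traffic taking part in this round of monitoring, and the traffic total is the total number of all actual network traffic taking part in this round of monitoring.
Initializing the abnormal total and the traffic total means assigning each an initial value; both can be set to 0.
S42. If the actual feature vector corresponding to the preset application scenario is an abnormal traffic set, add 1 to both the abnormal total and the traffic total.
Regardless of whether the current preset application scenario in the actual network traffic being monitored is an abnormal traffic set, the traffic total is increased by 1 whenever a new preset application scenario is monitored. If the preset application scenario turns out to be an abnormal traffic set, the abnormal total is also increased by 1.
S43. If the actual feature vector corresponding to the preset application scenario is not an abnormal traffic set, add 1 to the traffic total.
As described in step S42, regardless of whether the current preset application scenario in the actual network traffic being monitored is an abnormal traffic set, the traffic total is increased by 1 whenever a new preset application scenario is monitored.
S44. Divide the abnormal total by the traffic total to obtain the abnormal proportion.
The abnormal proportion is the abnormal total as a percentage of the traffic total.
In this step, the abnormal total and the traffic total are initialized and then updated in real time as the actual network traffic is monitored, so the current abnormal proportion can be obtained simply and quickly.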
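Steps S41 to S44, together with the second-threshold check of step S40, can be sketched as follows. The function names are illustrative, the input is assumed to be one boolean per monitored preset application scenario (True when its actual feature vector is an abnormal traffic set), and returning 0.0 for an empty input is an assumption the text does not cover.

```python
def abnormal_ratio(scenario_results):
    """Compute abnormal total / traffic total over per-scenario verdicts."""
    abnormal_total = 0  # S41: initialize both counters to 0
    traffic_total = 0
    for is_abnormal in scenario_results:
        traffic_total += 1  # S42/S43: every monitored scenario counts
        if is_abnormal:
            abnormal_total += 1  # S42: abnormal sets also raise this counter
    # S44: assumed convention of 0.0 when nothing was monitored
    return abnormal_total / traffic_total if traffic_total else 0.0


def is_abnormal_traffic(scenario_results, second_threshold):
    """S40: the traffic is abnormal when the ratio exceeds the second threshold."""
    return abnormal_ratio(scenario_results) > second_threshold
```

Here the ratio is a fraction in [0, 1], so a second threshold of 30% would be passed as 0.3.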
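The network traffic monitoring method provided by the embodiments of the present application obtains actual network traffic and queries a preset normal traffic model library, based on the preset application scenario corresponding to the actual network traffic, to monitor whether the actual network traffic is abnormal. On the one hand, monitoring the actual network traffic against the normal traffic model library can also detect abnormal network traffic that has not been seen before; on the other hand, the normal traffic model library can be built with a simple and efficient algorithm, which suits the cloud security field, where network traffic involves heavy computation.
Further, the server divides the actual network traffic into preset application scenarios by using the application scenario baselines, which makes the process of judging the actual network traffic more targeted and organized. By generating the baselines of all preset application scenarios in advance, the server can call them directly when intercepting a massive volume of actual network traffic and divide that traffic accurately. By initializing the abnormal total and the traffic total and updating them in real time as the actual network traffic is monitored, the current abnormal proportion can be obtained simply and quickly.
It should be understood that the step numbers in the above embodiments do not imply an order of execution. The order in which the processes are executed is determined by their functions and internal logic, and the numbering does not limit the implementation of the embodiments of the present application in any way.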
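Embodiment 2
FIG. 8 shows a schematic block diagram of a network traffic monitoring apparatus that corresponds one-to-one to the network traffic monitoring method in Embodiment 1. As shown in FIG. 8, the apparatus includes a network traffic acquisition module 10, a feature vector obtaining module 20, a feature vector correspondence module 30, and an abnormal proportion statistics module 40. The functions implemented by these modules correspond one-to-one to the steps of the network traffic monitoring method in Embodiment 1; to avoid redundancy, they are not detailed one by one in this embodiment.
The network traffic acquisition module 10 is configured to obtain actual network traffic and obtain, based on the actual network traffic, at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario.
The feature vector obtaining module 20 is configured to query a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario.
The feature vector correspondence module 30 is configured to determine that, if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set.
The abnormal proportion statistics module 40 is configured to count the abnormal proportion corresponding to the abnormal traffic sets; if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.
Preferably, the feature vector obtaining module 20 includes an application scenario acquisition unit 21 and a feature vector acquisition unit 22.
The application scenario acquisition unit 21 is configured to call the preset application scenario baselines, based on the actual network traffic, to divide the actual network traffic and obtain the corresponding preset application scenario.
The feature vector acquisition unit 22 is configured to use the feature extraction algorithm corresponding to the preset application scenario to perform feature extraction and feature vectorization on the actual network traffic and obtain the corresponding actual feature vector.
Preferably, the network traffic monitoring apparatus further includes a baseline generation unit 50.
The baseline generation unit 50 is configured to generate the application scenario baseline corresponding to the current preset application scenario.
Preferably, the baseline generation unit 50 includes a network traffic collection subunit 51, an average value acquisition subunit 52, and a baseline acquisition subunit 53.
The network traffic collection subunit 51 is configured to collect normal network traffic, where the normal network traffic includes at least one preset application scenario and the normal behavior features corresponding to the preset application scenario.
The average value acquisition subunit 52 is configured to perform calculations on all normal behavior features in the same preset application scenario and obtain the corresponding average value and standard deviation.
The baseline acquisition subunit 53 is configured to obtain the application scenario baseline based on the average value and the standard deviation.
The baseline acquisition unit 60 is configured to continue obtaining the baseline corresponding to the next preset application scenario until the baselines of all preset application scenarios have been obtained.
Preferably, the network traffic monitoring apparatus further includes a model library creation unit 70, configured to create the normal traffic model library.
Preferably, the model library creation unit 70 includes a network traffic acquisition subunit 71, a feature vector obtaining subunit 72, and a model library formation subunit 73.
The network traffic acquisition subunit 71 is configured to obtain normal network traffic and divide it based on the preset application scenario baselines to obtain the corresponding preset application scenario.
The feature vector obtaining subunit 72 is configured to perform feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector.
The model library formation subunit 73 is configured to store the preset application scenario in association with the normal feature vector in a database to form the normal traffic model library.
Preferably, the feature vector obtaining subunit 72 includes a feature data acquisition subunit 721 and a feature vector obtaining subunit 722.
The feature data acquisition subunit 721 is configured to perform feature extraction on the normal network traffic to obtain scenario feature data.
The feature vector obtaining subunit 722 is configured to perform feature vectorization on the scenario feature data by matrix calculation to obtain the corresponding normal feature vector.
Preferably, the abnormal proportion statistics module 40 includes a total initialization unit 41, an abnormal traffic set processing unit 42, a traffic total increment unit 43, and an abnormal proportion acquisition unit 44.
The total initialization unit 41 is configured to initialize the abnormal total and the traffic total.
The abnormal traffic set processing unit 42 is configured to add 1 to both the abnormal total and the traffic total if the actual feature vector corresponding to the preset application scenario is an abnormal traffic set.
The traffic total increment unit 43 is configured to add 1 to the traffic total if the actual feature vector corresponding to the preset application scenario is not an abnormal traffic set.
The abnormal proportion acquisition unit 44 is configured to divide the abnormal total by the traffic total to obtain the abnormal proportion.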
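Embodiment 3
This embodiment provides one or more non-volatile readable storage media storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to carry out the network traffic monitoring method in Embodiment 1; to avoid repetition, the details are not repeated here. Alternatively, when executed by one or more processors, the computer readable instructions cause the one or more processors to implement the functions of the modules/units of the network traffic monitoring apparatus in Embodiment 2; to avoid repetition, the details are not repeated here.
The computer readable storage medium may include any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal, and the like.
Embodiment 4
FIG. 9 is a schematic diagram of a computer device provided by an embodiment of the present application. As shown in FIG. 9, the computer device 80 of this embodiment includes a processor 81, a memory 82, and computer readable instructions 83 stored in the memory 82 and executable on the processor 81. When executing the computer readable instructions 83, the processor 81 implements the steps of the network traffic monitoring method in Embodiment 1, such as steps S10 to S40 shown in FIG. 1. Alternatively, when executing the computer readable instructions 83, the processor 81 implements the functions of the modules in the apparatus embodiments above, such as the functions of the network traffic acquisition module 10, the feature vector obtaining module 20, the feature vector correspondence module 30, and the abnormal proportion statistics module 40 shown in FIG. 8.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practice, the functions may be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The embodiments above are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to these embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in them can still be modified, or some of their technical features replaced by equivalents; such modifications or replacements do not take the essence of the corresponding technical solutions outside the spirit and scope of the technical solutions of the embodiments of the present application, and all fall within the scope of protection of the present application.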

Claims (20)

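  1. A network traffic monitoring method, characterized by comprising:
    obtaining actual network traffic, and obtaining, based on the actual network traffic, at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario;
    querying a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario;
    if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
    counting the abnormal proportion corresponding to the abnormal traffic sets, and if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.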
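  2. The network traffic monitoring method according to claim 1, characterized in that obtaining, based on the actual network traffic, at least one corresponding preset application scenario and actual feature vector comprises:
    based on the actual network traffic, calling a preset application scenario baseline to divide the actual network traffic and obtain the corresponding preset application scenario;
    based on the actual network traffic, using a feature extraction algorithm corresponding to the preset application scenario to perform feature extraction and feature vectorization on the actual network traffic and obtain the corresponding actual feature vector.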
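  3. The network traffic monitoring method according to claim 1, characterized in that, before the step of obtaining actual network traffic, the method further comprises: generating the application scenario baseline corresponding to the current preset application scenario;
    wherein generating the application scenario baseline corresponding to the current preset application scenario comprises:
    collecting normal network traffic, the normal network traffic comprising at least one preset application scenario and normal behavior features corresponding to the preset application scenario;
    performing calculations on all normal behavior features in the same preset application scenario to obtain the corresponding average value and standard deviation;
    obtaining the application scenario baseline based on the average value and the standard deviation;
    continuing to obtain the preset application scenario baseline corresponding to the next preset application scenario until all preset application scenario baselines have been obtained.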
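  4. The network traffic monitoring method according to claim 1, characterized in that, before the step of obtaining actual network traffic, the method further comprises: creating a normal traffic model library;
    wherein creating the normal traffic model library comprises:
    obtaining normal network traffic, and dividing the normal network traffic based on a preset application scenario baseline to obtain the corresponding preset application scenario;
    performing feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector;
    storing the preset application scenario in association with the normal feature vector in a database to form the normal traffic model library.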
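  5. The network traffic monitoring method according to claim 4, characterized in that performing feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector comprises:
    performing feature extraction on the normal network traffic to obtain scenario feature data;
    performing feature vectorization on the scenario feature data by matrix calculation to obtain the corresponding normal feature vector.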
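  6. The network traffic monitoring method according to claim 1, characterized in that counting the abnormal proportion corresponding to the abnormal traffic sets comprises:
    initializing an abnormal total and a traffic total;
    if the actual feature vector corresponding to the preset application scenario is an abnormal traffic set, adding 1 to both the abnormal total and the traffic total;
    if the actual feature vector corresponding to the preset application scenario is not an abnormal traffic set, adding 1 to the traffic total;
    dividing the abnormal total by the traffic total to obtain the abnormal proportion.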
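  7. A network traffic monitoring apparatus, characterized by comprising:
    a network traffic acquisition module, configured to obtain actual network traffic and obtain, based on the actual network traffic, at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario;
    a feature vector obtaining module, configured to query a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario;
    a feature vector correspondence module, configured to determine that, if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
    an abnormal proportion statistics module, configured to count the abnormal proportion corresponding to the abnormal traffic sets, wherein, if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.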
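  8. The network traffic monitoring apparatus according to claim 7, characterized in that the feature vector obtaining module comprises:
    an application scenario acquisition unit, configured to call, based on the actual network traffic, a preset application scenario baseline to divide the actual network traffic and obtain the corresponding preset application scenario;
    a feature vector acquisition unit, configured to use, based on the actual network traffic, a feature extraction algorithm corresponding to the preset application scenario to perform feature extraction and feature vectorization on the actual network traffic and obtain the corresponding actual feature vector.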
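  9. A computer device, comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer readable instructions:
    obtaining actual network traffic, and obtaining, based on the actual network traffic, at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario;
    querying a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario;
    if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
    counting the abnormal proportion corresponding to the abnormal traffic sets, and if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.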
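  10. The computer device according to claim 9, characterized in that obtaining, based on the actual network traffic, at least one corresponding preset application scenario and actual feature vector comprises:
    based on the actual network traffic, calling a preset application scenario baseline to divide the actual network traffic and obtain the corresponding preset application scenario;
    based on the actual network traffic, using a feature extraction algorithm corresponding to the preset application scenario to perform feature extraction and feature vectorization on the actual network traffic and obtain the corresponding actual feature vector.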
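  11. The computer device according to claim 9, characterized in that, before the step of obtaining actual network traffic, the processor further implements the following step when executing the computer readable instructions: generating the application scenario baseline corresponding to the current preset application scenario;
    wherein generating the application scenario baseline corresponding to the current preset application scenario comprises:
    collecting normal network traffic, the normal network traffic comprising at least one preset application scenario and normal behavior features corresponding to the preset application scenario;
    performing calculations on all normal behavior features in the same preset application scenario to obtain the corresponding average value and standard deviation;
    obtaining the application scenario baseline based on the average value and the standard deviation;
    continuing to obtain the preset application scenario baseline corresponding to the next preset application scenario until all preset application scenario baselines have been obtained.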
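  12. The computer device according to claim 9, characterized in that, before the step of obtaining actual network traffic, the processor further implements the following step when executing the computer readable instructions: creating a normal traffic model library;
    wherein creating the normal traffic model library comprises:
    obtaining normal network traffic, and dividing the normal network traffic based on a preset application scenario baseline to obtain the corresponding preset application scenario;
    performing feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector;
    storing the preset application scenario in association with the normal feature vector in a database to form the normal traffic model library.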
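  13. The computer device according to claim 12, characterized in that performing feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector comprises:
    performing feature extraction on the normal network traffic to obtain scenario feature data;
    performing feature vectorization on the scenario feature data by matrix calculation to obtain the corresponding normal feature vector.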
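  14. The computer device according to claim 9, characterized in that counting the abnormal proportion corresponding to the abnormal traffic sets comprises:
    initializing an abnormal total and a traffic total;
    if the actual feature vector corresponding to the preset application scenario is an abnormal traffic set, adding 1 to both the abnormal total and the traffic total;
    if the actual feature vector corresponding to the preset application scenario is not an abnormal traffic set, adding 1 to the traffic total;
    dividing the abnormal total by the traffic total to obtain the abnormal proportion.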
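  15. One or more non-volatile readable storage media storing computer readable instructions, characterized in that the computer readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining actual network traffic, and obtaining, based on the actual network traffic, at least one corresponding preset application scenario and an actual feature vector corresponding to the preset application scenario;
    querying a preset normal traffic model library based on the at least one preset application scenario to obtain a normal feature vector corresponding to each preset application scenario;
    if the intersection of the actual feature vector and the normal feature vector corresponding to the same preset application scenario is smaller than a first threshold, the actual feature vector corresponding to the preset application scenario is an abnormal traffic set;
    counting the abnormal proportion corresponding to the abnormal traffic sets, and if the abnormal proportion is greater than a second threshold, the actual network traffic is abnormal network traffic.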
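  16. The non-volatile readable storage medium according to claim 15, characterized in that obtaining, based on the actual network traffic, at least one corresponding preset application scenario and actual feature vector comprises:
    based on the actual network traffic, calling a preset application scenario baseline to divide the actual network traffic and obtain the corresponding preset application scenario;
    based on the actual network traffic, using a feature extraction algorithm corresponding to the preset application scenario to perform feature extraction and feature vectorization on the actual network traffic and obtain the corresponding actual feature vector.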
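  17. The non-volatile readable storage medium according to claim 15, characterized in that, before the step of obtaining actual network traffic, the computer readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following step: generating the application scenario baseline corresponding to the current preset application scenario;
    wherein generating the application scenario baseline corresponding to the current preset application scenario comprises:
    collecting normal network traffic, the normal network traffic comprising at least one preset application scenario and normal behavior features corresponding to the preset application scenario;
    performing calculations on all normal behavior features in the same preset application scenario to obtain the corresponding average value and standard deviation;
    obtaining the application scenario baseline based on the average value and the standard deviation;
    continuing to obtain the preset application scenario baseline corresponding to the next preset application scenario until all preset application scenario baselines have been obtained.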
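  18. The non-volatile readable storage medium according to claim 15, characterized in that, before the step of obtaining actual network traffic, the computer readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following step: creating a normal traffic model library;
    wherein creating the normal traffic model library comprises:
    obtaining normal network traffic, and dividing the normal network traffic based on a preset application scenario baseline to obtain the corresponding preset application scenario;
    performing feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector;
    storing the preset application scenario in association with the normal feature vector in a database to form the normal traffic model library.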
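  19. The non-volatile readable storage medium according to claim 18, characterized in that performing feature extraction and feature vectorization on the normal network traffic to obtain the corresponding normal feature vector comprises:
    performing feature extraction on the normal network traffic to obtain scenario feature data;
    performing feature vectorization on the scenario feature data by matrix calculation to obtain the corresponding normal feature vector.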
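  20. The non-volatile readable storage medium according to claim 15, characterized in that counting the abnormal proportion corresponding to the abnormal traffic sets comprises:
    initializing an abnormal total and a traffic total;
    if the actual feature vector corresponding to the preset application scenario is an abnormal traffic set, adding 1 to both the abnormal total and the traffic total;
    if the actual feature vector corresponding to the preset application scenario is not an abnormal traffic set, adding 1 to the traffic total;
    dividing the abnormal total by the traffic total to obtain the abnormal proportion.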
PCT/CN2018/092654 2018-03-22 2018-06-25 网络流量监测方法、装置、计算机设备及存储介质 WO2019178968A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810239414.8 2018-03-22
CN201810239414.8A CN108650218B (zh) 2018-03-22 2018-03-22 网络流量监测方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2019178968A1 true WO2019178968A1 (zh) 2019-09-26

Family

ID=63744586

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/092654 WO2019178968A1 (zh) 2018-03-22 2018-06-25 网络流量监测方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN108650218B (zh)
WO (1) WO2019178968A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291226A (zh) * 2020-10-23 2021-01-29 新华三信息安全技术有限公司 一种网络流量的异常检测方法及装置

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109450672B (zh) * 2018-10-22 2020-09-18 网宿科技股份有限公司 一种识别带宽需求突发的方法和装置
CN109951491A (zh) * 2019-03-28 2019-06-28 腾讯科技(深圳)有限公司 网络攻击检测方法、装置、设备及存储介质
CN110445808A (zh) * 2019-08-26 2019-11-12 杭州迪普科技股份有限公司 异常流量攻击防护方法、装置、电子设备
CN111682975B (zh) * 2020-04-24 2023-05-16 视联动力信息技术股份有限公司 网络状态预测方法、装置、电子设备及存储介质
CN112202771B (zh) * 2020-09-29 2022-10-14 中移(杭州)信息技术有限公司 网络流量检测方法、系统、电子设备和存储介质
CN112367292B (zh) * 2020-10-10 2021-09-03 浙江大学 一种基于深度字典学习的加密流量异常检测方法
CN112019574B (zh) * 2020-10-22 2021-01-29 腾讯科技(深圳)有限公司 异常网络数据检测方法、装置、计算机设备和存储介质
CN112380771B (zh) * 2020-11-17 2023-04-07 甘肃省祁连山水源涵养林研究院 一种土壤侵蚀评估方法、装置及服务器
CN112615738B (zh) * 2020-12-09 2023-02-28 四川迅游网络科技股份有限公司 一种基于流量特征的网络加速方法
CN112994978B (zh) * 2021-02-25 2023-01-24 网宿科技股份有限公司 一种网络流量监测方法及装置
CN117061322A (zh) * 2023-09-27 2023-11-14 广东云百科技有限公司 物联网流量池管理方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
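CN101252482A (zh) * 2008-04-07 2008-08-27 华为技术有限公司 Network traffic anomaly detection method and device
CN101651568A (zh) * 2009-07-01 2010-02-17 青岛农业大学 Network traffic prediction and anomaly detection method
CN105227548A (zh) * 2015-09-14 2016-01-06 中国人民解放军国防科学技术大学 Abnormal traffic screening method based on a steady-state model of an office local area network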

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
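FR2884953B1 (fr) * 2005-04-22 2007-07-06 Thales Sa Onboard method and device, for aircraft, for runway incursion alerting
CN102111312B (zh) * 2011-03-28 2013-05-01 钱叶魁 Network anomaly detection method based on multi-scale principal component analysis
CN105553998B (zh) * 2015-12-23 2019-02-01 中国电子科技集团公司第三十研究所 Network attack anomaly detection method
CN105915532B (zh) * 2016-05-23 2019-01-04 北京网康科技有限公司 Method and device for identifying compromised hosts
CN107370732B (zh) * 2017-07-14 2021-08-17 成都信息工程大学 System for discovering abnormal behavior in industrial control systems based on neural networks and optimal recommendation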

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291226A (zh) * 2020-10-23 2021-01-29 新华三信息安全技术有限公司 Method and device for detecting anomalies in network traffic
CN112291226B (zh) * 2020-10-23 2022-05-27 新华三信息安全技术有限公司 Method and device for detecting anomalies in network traffic

Also Published As

Publication number Publication date
CN108650218A (zh) 2018-10-12
CN108650218B (zh) 2019-10-08

Similar Documents

Publication Publication Date Title
WO2019178968A1 (zh) Network traffic monitoring method, apparatus, computer equipment, and storage medium
US9386028B2 (en) System and method for malware detection using multidimensional feature clustering
Santos et al. Machine learning algorithms to detect DDoS attacks in SDN
US11310162B2 (en) System and method for classifying network traffic
US9900344B2 (en) Identifying a potential DDOS attack using statistical analysis
US20170134401A1 (en) System and method for detecting abnormal traffic behavior using infinite decaying clusters
US8997227B1 (en) Attack traffic signature generation using statistical pattern recognition
Simmross-Wattenberg et al. Anomaly detection in network traffic based on statistical inference and α-stable modeling
US10296739B2 (en) Event correlation based on confidence factor
US10061922B2 (en) System and method for malware detection
US20200169582A1 (en) Identifying a potential ddos attack using statistical analysis
Zeng et al. Flow context and host behavior based shadowsocks’s traffic identification
CA3135360A1 (en) Graph stream mining pipeline for efficient subgraph detection
WO2021133791A1 (en) Method for network traffic analysis
Kiran et al. Detecting anomalous packets in network transfers: investigations using PCA, autoencoder and isolation forest in TCP
Li et al. A general framework of trojan communication detection based on network traces
Zhang et al. Design of a novel network intrusion detection system for drone communications
Barrionuevo et al. An anomaly detection model in a lan using k-nn and high performance computing techniques
Easttom et al. An enhanced view of incidence functions for applying graph theory to modeling network intrusions
Gray et al. High performance network metadata extraction using P4 for ML-based intrusion detection systems
Jabbar et al. BotDetectorFW: an optimized botnet detection framework based on five features-distance measures supported by comparisons of four machine learning classifiers using CICIDS2017 dataset
Fernandes et al. Digital signature to help network management using principal component analysis and K-means clustering
David et al. A parallel approach to PCA based malicious activity detection in distributed honeypot data
Issariyapat et al. Anomaly detection in IP networks with principal component analysis
Saber et al. Implementation and Performance Evaluation of Intrusion Detection Systems under high-speed networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18911220

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.01.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18911220

Country of ref document: EP

Kind code of ref document: A1