CN117439827A - Network flow big data analysis method - Google Patents

Info

Publication number
CN117439827A
Authority
CN
China
Prior art keywords
data point
data
value
amplitude
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311776561.6A
Other languages
Chinese (zh)
Other versions
CN117439827B (en
Inventor
邹翠
罗锦
熊昀暄
周翔
张翠云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pla Army Infantry Academy
Original Assignee
Pla Army Infantry Academy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pla Army Infantry Academy filed Critical Pla Army Infantry Academy
Priority to CN202311776561.6A priority Critical patent/CN117439827B/en
Publication of CN117439827A publication Critical patent/CN117439827A/en
Application granted granted Critical
Publication of CN117439827B publication Critical patent/CN117439827B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention relates to the technical field of network traffic data analysis, and in particular to a network traffic big data analysis method. First, flow data to be measured is obtained. The amplitudes of the data points in this data, and the frequency distribution of those amplitudes, are analyzed to obtain a discrete value; the differences in amplitude change between data points are analyzed to obtain a mutation value; and the two are combined into an abnormal value, which is used to screen out initial abnormal data points. Because network traffic also shows sudden numerical changes and fluctuations during normal use, but these changes are periodic, a characteristic coefficient is then obtained for each initial abnormal data point and combined with the change in time to screen the final abnormal data points. The final abnormal data points have high abnormal values and exclude misjudgments caused by normal use, which improves the accuracy of capturing and extracting abnormal data points.

Description

Network flow big data analysis method
Technical Field
The invention relates to the technical field of network traffic data analysis, in particular to a network traffic big data analysis method.
Background
Network traffic big data refers to large-scale network traffic data generated, collected and stored in a network environment. With the popularization of the Internet and the acceleration of digital transformation, the scale and complexity of network traffic continue to grow, producing a vast amount of traffic data. When analyzing network big data, detecting abnormal data helps enterprises and organizations monitor network performance, security threats and similar conditions.
Network traffic monitoring generally aims to screen and extract abnormal data. In the prior art, abnormal outlier data is identified by setting a data interval, which enables analysis and monitoring of the bandwidth utilization rate. However, the data values set in this way are relatively fixed and cannot handle multivariate anomalies, and in real scenarios whether the bandwidth utilization rate is abnormal is not determined by whether outlier data exists. The prior art therefore cannot accurately extract and capture abnormal data points in a specific scenario according to the specific characteristics of the data changes.
Disclosure of Invention
The prior art generally analyzes abnormal data using relatively fixed data values and cannot handle multivariate anomalies, so it cannot accurately extract and capture abnormal data points in a specific scenario according to the specific characteristics of the data changes. To solve this technical problem, the invention provides a network flow big data analysis method, which adopts the following specific technical scheme:
acquiring time sequence data of the flow bandwidth utilization rate of a target time period, and performing time sequence decomposition and seasonal term removal on the time sequence data to obtain flow data to be measured;
obtaining a discrete value corresponding to each data point according to the amplitude of the data point in the flow data to be measured and the frequency distribution of the amplitude; obtaining a mutation value corresponding to each data point according to the difference of amplitude change conditions among the data points in the flow data to be detected; obtaining an abnormal value of each data point according to the discrete value and the abrupt change value corresponding to each data point, and screening initial abnormal data points according to the abnormal values;
obtaining characteristic coefficients of each initial abnormal data point according to the amplitude values and the amplitude change conditions of the initial abnormal data points; obtaining a comparison data point corresponding to each initial abnormal data point according to the characteristic coefficient of the initial abnormal data point; and periodically analyzing according to the time change condition between the initial abnormal data point and the corresponding contrast data point to obtain a final abnormal data point.
Further, the discrete value acquisition method includes:
acquiring the total number of data points in the flow data to be detected, acquiring the frequency of the occurrence of the amplitude of each data point in the flow data to be detected, and taking the ratio of the frequency of the occurrence of the amplitude of each data point to the total number of data points as the frequency duty ratio corresponding to the data points;
taking the product of the frequency duty ratio corresponding to each data point and the amplitude as the amplitude characteristic value of the data point, and taking the average value of the amplitude characteristic values of all the data points in the flow data to be measured as an amplitude average value;
normalizing the difference between the amplitude of each data point and the average value of the amplitude as the discrete value corresponding to each data point.
Further, the method for obtaining the mutation value comprises the following steps:
acquiring the slope of the amplitude of each data point in the flow data to be detected, and acquiring a slope change value corresponding to each data point according to the slope corresponding to the adjacent data point;
taking two data points closest to each data point as adjacent data points of each data point, and taking the maximum value in slope change values of the two adjacent data points as a slope characteristic value;
and normalizing the difference between the slope change value corresponding to each data point and the slope characteristic value to obtain a mutation value corresponding to each data point.
Further, the method for acquiring the outlier includes:
multiplying a discrete value corresponding to each data point with a preset first weight to obtain a first product;
multiplying the mutation value corresponding to each data point with a preset second weight to obtain a second product;
adding the first product and the second product corresponding to each data point to obtain the outlier of each data point.
Further, the method for obtaining the characteristic coefficient comprises the following steps:
normalizing the amplitude of each initial abnormal data point, and multiplying the normalized amplitude by a preset first parameter to obtain a first coefficient;
normalizing the slope change value corresponding to each initial abnormal data point, and multiplying the slope change value by a preset second parameter to obtain a second coefficient;
and adding the first coefficient and the second coefficient corresponding to each initial abnormal data point to obtain the characteristic coefficient of each initial abnormal data point.
Further, the method for acquiring the comparison data points comprises the following steps:
sorting all initial abnormal data points in time order to obtain an arrangement sequence, and sequentially taking each initial abnormal data point in the arrangement sequence as a data point to be measured;
setting a preset range according to the characteristic coefficient of the data point to be measured; and taking each initial abnormal data point that follows the data point to be measured in the arrangement sequence and whose characteristic coefficient lies within the preset range as a comparison data point of the data point to be measured.
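The selection of comparison data points can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say how the preset range is derived from the characteristic coefficient, so the sketch assumes a symmetric band of half-width `tolerance` around it. The function name `comparison_points`, the `tolerance` parameter and the sample values are hypothetical.

```python
def comparison_points(times, coeffs, tolerance=0.05):
    """Map each initial abnormal point to the later points with a similar coefficient.

    times[i] and coeffs[i] are the time and characteristic coefficient of the
    i-th initial abnormal data point. The band |c_j - c_i| <= tolerance is an
    assumed form of the "preset range".
    """
    # Sort point indices by time to obtain the arrangement sequence.
    order = sorted(range(len(times)), key=lambda i: times[i])
    result = {}
    for pos, i in enumerate(order):
        # Only points after the point under test are candidates.
        result[i] = [j for j in order[pos + 1:]
                     if abs(coeffs[j] - coeffs[i]) <= tolerance]
    return result


matches = comparison_points(times=[0, 10, 20, 30],
                            coeffs=[0.80, 0.50, 0.82, 0.79])
```

Here point 0 is matched with points 2 and 3, point 2 with point 3, and point 1 has no comparison data point.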
Further, the periodically analyzing according to the time change condition between the initial abnormal data point and the corresponding contrast data point to obtain a final abnormal data point includes:
calculating each initial abnormal data point and any two corresponding comparison data points based on a periodicity evaluation model to obtain a periodicity index, wherein when the periodicity index is smaller than a preset judgment threshold value, the initial abnormal data point and the two comparison data points participating in calculation are final abnormal data points; when the number of the comparison data points of the initial abnormal data points is smaller than or equal to 1, the initial abnormal data points and the corresponding comparison data points are final abnormal data points.
Further, the periodic assessment model includes:
wherein the model involves the periodicity index, the time count of the initial abnormal data point, the time counts of two comparison data points corresponding to that initial abnormal data point, and an exponential function with the natural constant e as its base.
Further, the method for acquiring the initial abnormal data point comprises the following steps:
and taking the data point with the abnormal value being greater than or equal to a preset abnormal threshold value as the initial abnormal data point.
Further, the preset abnormality threshold is set to 0.7.
The invention has the following beneficial effects:
Because the prior art cannot accurately extract and capture abnormal data points according to how the data changes in a specific scenario, the invention first screens abnormal data points by analyzing the changes of the data points, and then screens them further according to the influence the scenario may have, which ensures the accuracy of abnormal data point extraction. First, time sequence data of the flow bandwidth utilization rate in a target time period is obtained, and time sequence decomposition and seasonal term removal are performed on it to obtain the flow data to be measured; this removes the interference of the seasonal term and improves the accuracy of the subsequent processing. Under normal conditions the values of flow data change steadily and gently, whereas abnormal flow data shows sudden numerical changes and strong fluctuation. The discrete condition can therefore be obtained by analyzing the amplitudes of the data points in the flow data to be measured and the frequency distribution of those amplitudes, and the mutation condition can be obtained by analyzing the differences in amplitude change between data points; the two are combined into the abnormal value of each data point, and initial abnormal data points are screened out based on these abnormal values. The analysis then takes the usage scenario into account: in normal use, the start-up and running of a large number of system background programs can also cause sudden numerical changes and strong fluctuation in the network flow data, but these changes have a certain periodicity. A characteristic coefficient, obtained from the amplitude and the amplitude change of each initial abnormal data point, is used to represent the data characteristics of that point. The comparison data points corresponding to each initial abnormal data point are then selected based on the characteristic coefficient, and the change in time between each initial abnormal data point and its comparison data points is analyzed to obtain the final abnormal data points. The final abnormal data points have high abnormal values and exclude misjudgments that normal use might cause, so they are more distinctive, more accurate and more representative, which improves the accuracy of identifying and extracting abnormal data points.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for analyzing network traffic big data according to an embodiment of the present invention.
Description of the embodiments
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to specific embodiments, structures, features and effects of a network traffic big data analysis method according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the network traffic big data analysis method provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a method flowchart of a network traffic big data analysis method according to an embodiment of the invention is shown, and the method includes the following steps:
Step S1: acquire time sequence data of the flow bandwidth utilization rate of the target time period, and perform time sequence decomposition and seasonal term removal on the time sequence data to obtain flow data to be measured.
Network traffic big data refers to large-scale network traffic data generated, collected and stored in a network environment. With the popularization of the Internet and the acceleration of digital transformation, the scale and complexity of network traffic continue to grow, producing a huge volume of traffic data; when analyzing network big data, detecting abnormal data helps enterprises and organizations monitor network performance, security threats and similar conditions. First, time sequence data of the flow bandwidth utilization rate of a target time period is obtained. The horizontal axis is the time of each data point, in seconds or minutes; the vertical axis is the value of the flow bandwidth utilization rate, in Mbps.
A seasonal pattern exists in this scenario and causes a certain cyclic variation in the data, and this normal seasonal fluctuation can mask abnormal data points and interfere with their detection. The time sequence data is therefore subjected to time sequence decomposition and seasonal term removal to obtain the flow data to be measured. Removing the seasonal term from the time sequence data effectively reduces the interference it causes, so that abnormal data stands out more clearly from normal data, which improves the accuracy of identifying and capturing outlier data points later. It should be noted that time sequence decomposition and seasonal term removal are operations well known to those skilled in the art and are not described herein. The time sequence data of the flow bandwidth utilization rate can be collected by, for example, web crawlers, an API interface, a data warehouse or real-time collection; the implementer can select and adjust the method according to the specific implementation scenario, which is not limited herein.
Thus, the flow data to be measured is obtained by performing time sequence decomposition and seasonal term removal on the time sequence data of the flow bandwidth utilization rate.
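The patent leaves the decomposition method to the implementer. As a minimal sketch, assuming a known seasonal period and a purely additive seasonal term, the seasonal component can be estimated as the mean of each phase of the period and subtracted from the series. The function name `remove_seasonal`, the `period` argument and the sample values are illustrative assumptions, not part of the patent.

```python
from statistics import mean


def remove_seasonal(series, period):
    """Subtract a per-phase mean (a simple additive seasonal term) from a series.

    Assumes len(series) >= period so that every phase has at least one sample.
    """
    # Seasonal estimate for phase p: mean of samples p, p + period, p + 2*period, ...
    seasonal = [mean(series[p::period]) for p in range(period)]
    return [x - seasonal[i % period] for i, x in enumerate(series)]


# Bandwidth utilization with a period-3 pattern and one spike in the last sample.
usage = [10, 20, 30, 10, 20, 30, 10, 20, 90]
residual = remove_seasonal(usage, period=3)
```

After subtraction the regular pattern flattens to zero and the spike remains as the largest residual. A production system would more likely use an established decomposition routine; this sketch only shows the idea.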
Step S2: obtaining a discrete value corresponding to each data point according to the amplitude value and the frequency distribution of the amplitude value of the data points in the flow data to be measured; obtaining a mutation value corresponding to each data point according to the difference of amplitude change conditions among the data points in the flow data to be measured; and obtaining an abnormal value of each data point according to the discrete value and the abrupt change value corresponding to each data point, and screening initial abnormal data points according to the abnormal values.
In general, the bandwidth utilization rate of network traffic changes steadily and gently. When an emergency occurs, such as a network attack or virus propagation, the bandwidth utilization rate becomes abnormal: the value rises sharply for a short duration, which causes the data to fluctuate. Abnormal bandwidth utilization values caused by network attacks or blocking events are also highly random. These data characteristics allow a preliminary evaluation and analysis of abnormal data.
Based on the above analysis, abnormal data points are generally discrete in amplitude and show a high degree of mutation, so the discrete value corresponding to each data point can be obtained from the amplitude differences among the data points in the flow data to be measured.
Preferably, the method for acquiring the discrete value in one embodiment of the present invention includes:
The total number of data points in the flow data to be measured is obtained and recorded as $N$. The frequency with which the amplitude of each data point occurs in the flow data to be measured is counted and recorded as $f_i$. The ratio of this frequency to the total number of data points is taken as the frequency duty ratio corresponding to the data point, namely $f_i/N$.
The product of the frequency duty ratio corresponding to each data point and its amplitude is taken as the amplitude characteristic value of the data point, and the average of the amplitude characteristic values of all data points in the flow data to be measured is taken as the amplitude mean. Finally, the difference between the amplitude of each data point and the amplitude mean is normalized to obtain the discrete value corresponding to the data point. Taking the $i$-th data point as an example, the formula model of the discrete value may specifically be, for example:
$$D_i = 1 - \exp\left(-\left|x_i - \frac{1}{N}\sum_{j=1}^{N}\frac{f_j}{N}\,x_j\right|\right)$$
where $D_i$ denotes the discrete value corresponding to the $i$-th data point, $x_i$ denotes the amplitude of the $i$-th data point, $x_j$ denotes the amplitude of the $j$-th data point, $N$ denotes the total number of data points, $f_j$ denotes the frequency with which the amplitude of the $j$-th data point occurs, and $\exp$ denotes the exponential function with the natural constant $e$ as its base.
In the formula model of the discrete value, the frequency duty ratio $f_j/N$ corresponding to a data point serves as the weight of its amplitude and is multiplied by that amplitude to obtain the amplitude characteristic value $\frac{f_j}{N}x_j$. The amplitude mean $\frac{1}{N}\sum_{j=1}^{N}\frac{f_j}{N}x_j$ is then obtained from the amplitude characteristic values of all data points, which reduces the deviation and influence of abnormal values on the final amplitude mean. The difference between the amplitude of each data point and the amplitude mean, $\left|x_i - \frac{1}{N}\sum_{j=1}^{N}\frac{f_j}{N}x_j\right|$, is then normalized. The smaller this difference, the smaller the discrete value given by the formula model, meaning that the amplitude of the current data point is less discrete.
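The discrete-value computation described above can be sketched in a few lines. Two points are assumptions of this sketch rather than statements of the patent: the normalization is taken to be `1 - exp(-|difference|)`, which grows with the difference and stays in [0, 1), and frequencies are counted by exact equality, so amplitudes are assumed to be rounded or otherwise discretized beforehand. The function name `discrete_values` is hypothetical.

```python
import math
from collections import Counter


def discrete_values(amplitudes):
    """Return one discrete value per data point, following the steps in the text."""
    n = len(amplitudes)
    freq = Counter(amplitudes)  # how often each amplitude value occurs
    # Amplitude characteristic value: frequency duty ratio (f / N) times amplitude.
    char_vals = [freq[x] / n * x for x in amplitudes]
    # Amplitude mean: average of the characteristic values over all data points.
    amp_mean = sum(char_vals) / n
    # Assumed normalization: 1 - exp(-|x - mean|).
    return [1 - math.exp(-abs(x - amp_mean)) for x in amplitudes]


d = discrete_values([2, 2, 2, 10])
```

In this example the rare amplitude 10 lies much further from the frequency-weighted mean than the common amplitude 2, so it receives the larger discrete value.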
And then, obtaining a mutation value corresponding to each data point according to the difference of amplitude change conditions among the data points in the flow data to be measured.
Preferably, the method for obtaining the mutation value in one embodiment of the present invention includes:
because the slope of the data can reflect the change condition of the data to a certain extent, the slope of the amplitude of each data point in the flow data to be detected is firstly obtained, and then the slope change value corresponding to each data point is obtained according to the slope corresponding to the adjacent data point.
The two data points closest to each data point are taken as its adjacent data points, and the maximum of the slope change values of the two adjacent data points is taken as the slope characteristic value, recorded as $\Delta k_i^{\max}$.
Finally, the difference between the slope change value of each data point and the corresponding slope characteristic value is normalized and taken as the mutation value corresponding to the data point. Taking the $i$-th data point as an example, the formula model of the mutation value may specifically be:
$$M_i = 1 - \exp\left(-\left|\Delta k_i - \Delta k_i^{\max}\right|\right)$$
where $M_i$ denotes the mutation value corresponding to the $i$-th data point, $\Delta k_i$ denotes the slope change value corresponding to the $i$-th data point, $\Delta k_i^{\max}$ denotes the slope characteristic value corresponding to the $i$-th data point, and $\exp$ denotes the exponential function with the natural constant $e$ as its base.
In the formula model of the mutation value, the difference between the slope change value of each data point and the corresponding slope characteristic value, $\left|\Delta k_i - \Delta k_i^{\max}\right|$, represents the difference between the slope change value of the data point and those of its adjacent data points. The smaller this difference, the less the amplitude of the data point differs from the amplitudes of its adjacent data points, so the mutation value given by the formula model is smaller, reflecting that the amplitude of the data point changes less abruptly. It should be noted that in this embodiment of the present invention the slope and slope change value of the last data point in the flow data to be measured are taken as equal to those of the previous data point; in other embodiments of the present invention the last data point may be left out, in which case it is no longer considered in the subsequent analysis.
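The mutation-value steps can be sketched as below. The text does not fix several details, so the sketch makes labeled assumptions: samples are evenly spaced, the slope is the forward difference of the amplitudes, the slope change value is the absolute difference of consecutive slopes, the two adjacent points of an endpoint are its two nearest points, and the normalization is `1 - exp(-|difference|)`. The function name `mutation_values` is hypothetical.

```python
import math


def mutation_values(amplitudes):
    """Return one mutation value per data point (needs at least 3 points)."""
    n = len(amplitudes)
    if n < 3:
        raise ValueError("need at least 3 data points")
    # Assumed slope: forward difference with unit time step.
    slopes = [amplitudes[i + 1] - amplitudes[i] for i in range(n - 1)]
    slopes.append(slopes[-1])    # last point reuses the previous slope
    # Assumed slope change value: absolute difference of consecutive slopes.
    changes = [abs(slopes[i + 1] - slopes[i]) for i in range(n - 1)]
    changes.append(changes[-1])  # last point reuses the previous slope change
    result = []
    for i in range(n):
        if i == 0:
            nb = (1, 2)          # two nearest points at the left edge
        elif i == n - 1:
            nb = (n - 3, n - 2)  # two nearest points at the right edge
        else:
            nb = (i - 1, i + 1)
        feature = max(changes[j] for j in nb)  # slope characteristic value
        result.append(1 - math.exp(-abs(changes[i] - feature)))
    return result


m = mutation_values([1, 1, 1, 9, 1, 1, 1])
```

With forward differences, the largest slope change falls on the point just before the spike at index 3, and points far from the spike get a mutation value of zero. A different slope convention would shift these positions.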
The discrete value and the abrupt value corresponding to each data point in the flow data to be measured can be obtained based on the discrete value formula model and the abrupt value formula model, and then the discrete value and the abrupt value can be combined to obtain the abnormal value of each data point.
Preferably, the method for acquiring the outlier in one embodiment of the present invention includes:
The discrete value corresponding to each data point is multiplied by a preset first weight to obtain a first product; the mutation value corresponding to each data point is multiplied by a preset second weight to obtain a second product; finally, the first product and the second product corresponding to each data point are added to give the abnormal value of the data point. Taking the $i$-th data point as an example, the formula model of the abnormal value is:
$$Y_i = w_1 \cdot D_i + w_2 \cdot M_i$$
where $Y_i$ denotes the abnormal value of the $i$-th data point, $D_i$ denotes the discrete value corresponding to the $i$-th data point, $M_i$ denotes the mutation value corresponding to the $i$-th data point, $w_1$ denotes the preset first weight, and $w_2$ denotes the preset second weight.
In the formula model of the abnormal value, the preset first weight and the preset second weight are fixed values. When both the discrete value and the mutation value corresponding to a data point are larger, the abnormal value of the data point is larger, meaning that the data point is more abnormal and more likely to be an initial abnormal data point. It should be noted that in this embodiment of the present invention the preset first weight is set to 0.6 and the preset second weight is set to 0.4; the implementer can adjust the specific values according to the implementation scenario, which is not limited herein.
The abnormal value of each data point can be obtained by the above method, and the data points can then be screened according to the magnitude of their abnormal values to obtain the initial abnormal data points.
Preferably, the method for acquiring initial abnormal data points in one embodiment of the present invention includes:
Based on the above analysis, the greater the abnormal value, the more abnormal the data point. A preset abnormal threshold is therefore set, and each data point whose abnormal value is greater than or equal to the preset abnormal threshold is taken as an initial abnormal data point. In this embodiment the preset abnormal threshold is set to 0.7; in other embodiments of the present invention it can be adjusted according to the implementation scenario, which is not limited herein.
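The weighted combination and the threshold screening can be illustrated with a short sketch that uses the embodiment's values (weights 0.6 and 0.4, threshold 0.7) as defaults. The function names and the sample discrete and mutation values are hypothetical.

```python
def abnormal_value(discrete, mutation, w1=0.6, w2=0.4):
    """Weighted sum of a point's discrete value and mutation value."""
    return w1 * discrete + w2 * mutation


def screen_initial(abnormal_values, threshold=0.7):
    """Indices of data points whose abnormal value reaches the threshold."""
    return [i for i, y in enumerate(abnormal_values) if y >= threshold]


# (discrete value, mutation value) pairs for three illustrative data points.
pairs = [(0.9, 0.5), (0.3, 0.2), (1.0, 0.9)]
values = [abnormal_value(d, m) for d, m in pairs]
initial = screen_initial(values)
```

For the first pair the abnormal value is 0.6 × 0.9 + 0.4 × 0.5 = 0.74, which reaches the 0.7 threshold, so that point is kept as an initial abnormal data point; the second pair gives 0.26 and is discarded.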
Therefore, the initial abnormal data point can be initially screened based on the change characteristics of the data amplitude, and the subsequent analysis can be continuously carried out on the initial abnormal data point, so that the accuracy of acquiring the abnormal data point is improved.
Step S3: obtaining a characteristic coefficient of each initial abnormal data point according to the amplitude value and the amplitude change condition of the initial abnormal data point; obtaining a comparison data point corresponding to each initial abnormal data point according to the characteristic coefficient of the initial abnormal data point; and periodically analyzing according to the time change condition between the initial abnormal data point and the corresponding contrast data point to obtain a final abnormal data point.
Under normal use, the start-up and running of a large number of system background programs can also cause sudden changes in the flow bandwidth utilization rate, but these do not indicate an abnormality, so the initial abnormal data points alone cannot accurately reflect the abnormal conditions in the flow data to be measured. Analysis of the data change characteristics shows that in an emergency, such as a network attack or virus propagation, the abnormal bandwidth utilization rate not only rises sharply for a short duration but is also highly random; that is, it has no periodic pattern. By contrast, under normal use, sudden changes in the bandwidth utilization rate caused by behaviors such as usage habits show a strongly periodic pattern. This characteristic can be used to further judge the initial abnormal data points obtained.
The characteristic coefficient of each initial abnormal data point is obtained firstly based on the amplitude value and the amplitude change condition of the initial abnormal data point.
Preferably, the method for acquiring the characteristic coefficient in one embodiment of the present invention includes:
normalizing the amplitude of each initial abnormal data point, and multiplying the normalized amplitude by a preset first parameter to obtain a first coefficient; and normalizing the slope change value corresponding to each initial abnormal data point, and multiplying the slope change value by a preset second parameter to obtain a second coefficient.
And finally, adding the first coefficient and the second coefficient corresponding to each initial abnormal data point to be used as the characteristic coefficient of each initial abnormal data point. The formula model of the characteristic coefficient is as follows:
$$T_i = \alpha \cdot \mathrm{Norm}(A_i) + \beta \cdot \mathrm{Norm}(K_i)$$
wherein $T_i$ denotes the characteristic coefficient of the $i$-th initial abnormal data point, $A_i$ denotes the amplitude of the $i$-th initial abnormal data point, $K_i$ denotes the slope change value corresponding to the $i$-th initial abnormal data point, $\alpha$ denotes the preset first parameter, $\beta$ denotes the preset second parameter, and $\mathrm{Norm}(\cdot)$ denotes the normalization function.
In the formula model of the characteristic coefficient, the amplitude change of an initial abnormal data point is represented by the slope change value corresponding to that data point. The characteristic coefficient is thus obtained from both the amplitude and the amplitude change of the initial abnormal data point, and it characterizes each initial abnormal data point so that the final abnormal data points can be screened later. It should be noted that the characteristic coefficient in this embodiment of the present invention is essentially a weighted sum. The amplitude and the slope change value are considered equally important for characterizing an initial abnormal data point, so the preset first parameter and the preset second parameter are equal and both are set to 0.5; the implementer can adjust the specific values according to the implementation scenario, which is not limited herein.
Because periodicity is mainly reflected in the time intervals between data points, the comparison data points corresponding to each initial abnormal data point need to be obtained.
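As an illustration of the weighted sum described above, the following Python sketch computes a characteristic coefficient for each initial abnormal data point. The specification does not say which normalization function is used, so min-max normalization is an assumption here; the function names and the sample data are likewise hypothetical.

```python
from typing import Sequence


def min_max_normalize(values: Sequence[float]) -> list[float]:
    """Min-max normalization to [0, 1] (an assumed choice of normalization)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0 to avoid dividing by zero.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def feature_coefficients(amplitudes: Sequence[float],
                         slope_changes: Sequence[float],
                         first_param: float = 0.5,
                         second_param: float = 0.5) -> list[float]:
    """Weighted sum of the normalized amplitude and normalized slope change value."""
    if len(amplitudes) != len(slope_changes):
        raise ValueError("amplitudes and slope_changes must have the same length")
    norm_amp = min_max_normalize(amplitudes)
    norm_slope = min_max_normalize(slope_changes)
    return [first_param * a + second_param * k
            for a, k in zip(norm_amp, norm_slope)]


# Hypothetical amplitudes and slope change values of three initial abnormal points.
coeffs = feature_coefficients([0.80, 0.95, 0.90], [0.10, 0.40, 0.25])
print([round(c, 3) for c in coeffs])
```

With both parameters equal to 0.5, the coefficient stays in the range 0 to 1, which is what makes the fixed tolerance used later for comparison data points meaningful.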
Preferably, the method for acquiring the comparison data point in one embodiment of the present invention includes:
First, all initial abnormal data points are sorted in time order to obtain an arrangement sequence, and each initial abnormal data point in the arrangement sequence is taken in turn as the data point to be measured.
Then, a preset range is set according to the characteristic coefficient of the data point to be measured. Every initial abnormal data point that follows the data point to be measured in the arrangement sequence and whose characteristic coefficient lies within the preset range is a comparison data point of the data point to be measured. The comparison data points are screened by characteristic coefficient so that the subsequent periodicity comparison is made between initial abnormal data points with similar characteristics, which makes the final abnormal data points more accurate and more representative. It should be noted that the preset range is set to the characteristic coefficient of the data point to be measured plus or minus 0.2; for example, if the characteristic coefficient of the data point to be measured is 0.6, the corresponding preset range is 0.4 to 0.8. The implementer can adjust the specific range according to the implementation scenario, which is not limited herein.
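The selection of comparison data points can be sketched in Python as follows. This is only an illustration under stated assumptions: the bounds of the preset range are treated as inclusive, the function name `comparison_points` is hypothetical, and the sample times and coefficients are invented for the example.

```python
from typing import Sequence


def comparison_points(times: Sequence[float],
                      coefficients: Sequence[float],
                      tolerance: float = 0.2) -> dict[int, list[int]]:
    """Map each initial abnormal point to its later points with a similar coefficient.

    Points are visited in time order. A later point qualifies when its
    characteristic coefficient lies within +/- tolerance of the coefficient of
    the point being measured (inclusive bounds are an assumption).
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    result: dict[int, list[int]] = {}
    for pos, i in enumerate(order):
        low = coefficients[i] - tolerance
        high = coefficients[i] + tolerance
        result[i] = [j for j in order[pos + 1:] if low <= coefficients[j] <= high]
    return result


# Hypothetical initial abnormal data points: times and characteristic coefficients.
times = [10, 20, 30, 40]
coeffs = [0.6, 0.75, 0.9, 0.45]
print(comparison_points(times, coeffs))
# {0: [1, 3], 1: [2], 2: [], 3: []}
```

For the first point, with coefficient 0.6, the preset range is 0.4 to 0.8, so the later points with coefficients 0.75 and 0.45 qualify and the one with 0.9 does not.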
With this method, the comparison data points of each initial abnormal data point are obtained, and periodicity analysis can then be performed according to the time change between each initial abnormal data point and its corresponding comparison data points to obtain the final abnormal data points.
Preferably, in one embodiment of the present invention, performing periodicity analysis according to the time change between the initial abnormal data point and the corresponding comparison data points to obtain the final abnormal data points includes:
calculating a periodicity index for each initial abnormal data point and any two of its corresponding comparison data points based on the periodicity evaluation model, wherein when the periodicity index is smaller than a preset judgment threshold, the initial abnormal data point and the two comparison data points participating in the calculation are final abnormal data points; and when the number of comparison data points of an initial abnormal data point is less than or equal to 1, the initial abnormal data point and its corresponding comparison data point, if any, are final abnormal data points. In this embodiment of the present invention, the preset judgment threshold is set to 0.8; the implementer can adjust the specific value according to the implementation scenario, which is not limited herein.
Preferably, the periodicity evaluation model in one embodiment of the present invention includes:
$$P = \exp\left(-\left|\frac{t_a - t_i}{t_b - t_a} - 1\right|\right)$$
wherein $P$ denotes the periodicity index, $t_i$ denotes the time of initial abnormal data point $i$, $t_a$ denotes the time of comparison data point $a$ corresponding to initial abnormal data point $i$, $t_b$ denotes the time of comparison data point $b$ corresponding to initial abnormal data point $i$, and $\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base.
In the periodicity evaluation model, let $t_i$ be the time of the initial abnormal data point and $t_a$, $t_b$ the times of its two comparison data points. The closer the ratio of the time differences between the data points, $(t_a - t_i)/(t_b - t_a)$, is to 1, the closer $\left|\frac{t_a - t_i}{t_b - t_a} - 1\right|$ is to 0 and the more periodic the three initial abnormal data points are. This quantity is therefore negatively mapped and normalized by taking $\exp(-\,\cdot\,)$ of it, which completes the logical relation correction and yields the periodicity index $P$. The larger the value of $P$, the higher the probability that the three initial abnormal data points are abnormal phenomena arising under normal use; conversely, the smaller the value of $P$, the lower that probability, that is, the higher the probability that the three initial abnormal data points are final abnormal data points.
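The behavior of the periodicity index can be checked numerically with a short Python sketch. It assumes that the ratio is formed as the first time interval divided by the second and that the three times are strictly increasing; the function name `periodicity_index` and the sample times are hypothetical.

```python
import math


def periodicity_index(t_i: float, t_a: float, t_b: float) -> float:
    """exp(-|ratio - 1|), where ratio = (t_a - t_i) / (t_b - t_a).

    t_i is the time of the initial abnormal data point; t_a and t_b are the
    times of two of its comparison data points, assumed to satisfy
    t_i < t_a < t_b.
    """
    first_gap = t_a - t_i
    second_gap = t_b - t_a
    if first_gap <= 0 or second_gap <= 0:
        raise ValueError("times must be strictly increasing")
    return math.exp(-abs(first_gap / second_gap - 1.0))


# Equal intervals: the ratio is 1, so the index reaches its maximum of 1.
print(periodicity_index(0, 60, 120))   # 1.0
# Unequal intervals: ratio 0.25, index exp(-0.75), below the 0.8 threshold.
print(round(periodicity_index(0, 60, 300), 4))
```

Under this reading, an index of at least 0.8 corresponds to a ratio within roughly 0.22 of 1, since $-\ln 0.8 \approx 0.223$.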
Through step S3, the initial abnormal data points are further screened to obtain the final abnormal data points. The final abnormal data points exhibit discrete, abrupt changes and exclude periodic behavior, so they are more distinctive, more accurate and more representative, which enables accurate identification and extraction of abnormal network flow data.
In summary, after the time sequence data of the flow bandwidth utilization rate in the target time period is obtained, time series decomposition and de-seasonalization are performed on the time sequence data, which effectively reduces the interference caused by seasonality. Because flow data changes relatively steadily and gently under normal conditions, whereas under abnormal conditions it changes abruptly and fluctuates strongly, a discrete value representing the degree of dispersion is obtained by analyzing the amplitudes of the data points in the flow data to be detected and the frequency distribution of those amplitudes, a mutation value representing the degree of abrupt change is obtained by analyzing the differences in amplitude change between data points, and the two values are combined into the abnormal value of each data point. Initial abnormal data points are then preliminarily screened out based on the abnormal values. Furthermore, under normal use, the starting and running of a large number of system background programs, the usage habits of users and the like also cause abrupt changes and fluctuations in network flow, but these have certain periodic characteristics. A characteristic coefficient is therefore obtained by analyzing the amplitude and the amplitude change of each initial abnormal data point, and it is combined with the time change between initial abnormal data points to screen out the final abnormal data points. The final abnormal data points have high abnormal values and are free from the misjudgments caused by normal use, so they are more distinctive, more accurate and more representative, which enables accurate identification and extraction of abnormal network flow data.
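The final screening of step S3 can be sketched in Python as follows. This is a simplified illustration, not the patented procedure itself: where the text evaluates "any two" comparison data points, the sketch evaluates only the two earliest comparison data points of each point, and the form of the periodicity index (first interval divided by second interval) is an assumption. All names and sample data are hypothetical.

```python
import math
from typing import Mapping, Sequence


def periodicity_index(t_i: float, t_a: float, t_b: float) -> float:
    """exp(-|(t_a - t_i) / (t_b - t_a) - 1|), assuming t_i < t_a < t_b."""
    return math.exp(-abs((t_a - t_i) / (t_b - t_a) - 1.0))


def final_abnormal_points(times: Sequence[float],
                          comparisons: Mapping[int, Sequence[int]],
                          threshold: float = 0.8) -> list[int]:
    """Return the sorted indices of the final abnormal data points."""
    final: set[int] = set()
    for i, comps in comparisons.items():
        if len(comps) <= 1:
            # At most one comparison point: keep the point and that comparison point.
            final.add(i)
            final.update(comps)
            continue
        # Simplification: use only the two earliest comparison points.
        a, b = sorted(comps, key=lambda j: times[j])[:2]
        if periodicity_index(times[i], times[a], times[b]) < threshold:
            final.update((i, a, b))
    return sorted(final)


# Hypothetical times of four initial abnormal points and their comparison points.
times = [0, 60, 120, 500]
comparisons = {0: [1, 2, 3], 1: [2, 3], 2: [3], 3: []}
print(final_abnormal_points(times, comparisons))  # [1, 2, 3]
```

In this example, point 0 is not added by its own check because it is evenly spaced with points 1 and 2. Points near the end of the sequence have at most one comparison data point, so the stated rule always retains them.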
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

Claims (10)

1. A method for analyzing big data of network traffic, the method comprising:
acquiring time sequence data of the flow bandwidth utilization rate of a target time period, and performing time series decomposition and de-seasonalization on the time sequence data to acquire flow data to be detected;
obtaining a discrete value corresponding to each data point according to the amplitude of the data point in the flow data to be measured and the frequency distribution of the amplitude; obtaining a mutation value corresponding to each data point according to the difference of amplitude change conditions among the data points in the flow data to be detected; obtaining an abnormal value of each data point according to the discrete value and the abrupt change value corresponding to each data point, and screening initial abnormal data points according to the abnormal values;
obtaining a characteristic coefficient of each initial abnormal data point according to the amplitude and the amplitude change of the initial abnormal data point; obtaining comparison data points corresponding to each initial abnormal data point according to the characteristic coefficients of the initial abnormal data points; and performing periodicity analysis according to the time change between each initial abnormal data point and its corresponding comparison data points to obtain final abnormal data points.
2. The method for analyzing network traffic big data according to claim 1, wherein the method for acquiring the discrete value comprises:
acquiring the total number of data points in the flow data to be detected, acquiring the frequency of the occurrence of the amplitude of each data point in the flow data to be detected, and taking the ratio of the frequency of the occurrence of the amplitude of each data point to the total number of data points as the frequency duty ratio corresponding to the data points;
taking the product of the frequency duty ratio corresponding to each data point and the amplitude as the amplitude characteristic value of the data point, and taking the average value of the amplitude characteristic values of all the data points in the flow data to be measured as an amplitude average value;
normalizing the difference between the amplitude of each data point and the average value of the amplitude as the discrete value corresponding to each data point.
3. The method for analyzing network traffic big data according to claim 1, wherein the method for acquiring the mutation value comprises:
acquiring the slope of the amplitude of each data point in the flow data to be detected, and acquiring a slope change value corresponding to each data point according to the slope corresponding to the adjacent data point;
taking two data points closest to each data point as adjacent data points of each data point, and taking the maximum value in slope change values of the two adjacent data points as a slope characteristic value;
and normalizing the difference between the slope change value corresponding to each data point and the slope characteristic value to obtain a mutation value corresponding to each data point.
4. The network traffic big data analysis method according to claim 1, wherein the obtaining method of the outlier includes:
multiplying a discrete value corresponding to each data point with a preset first weight to obtain a first product;
multiplying the mutation value corresponding to each data point with a preset second weight to obtain a second product;
adding the first product and the second product corresponding to each data point to obtain the outlier of each data point.
5. A network traffic big data analysis method according to claim 3, wherein the method for obtaining the characteristic coefficient comprises:
normalizing the amplitude of each initial abnormal data point, and multiplying the normalized amplitude by a preset first parameter to obtain a first coefficient;
normalizing the slope change value corresponding to each initial abnormal data point, and multiplying the slope change value by a preset second parameter to obtain a second coefficient;
and adding the first coefficient and the second coefficient corresponding to each initial abnormal data point to obtain the characteristic coefficient of each initial abnormal data point.
6. The method for analyzing network traffic big data according to claim 1, wherein the method for acquiring the comparison data point comprises:
sequencing all initial abnormal data points based on time sequence to obtain an arrangement sequence, and sequentially taking each initial abnormal data point in the arrangement sequence as a data point to be tested;
setting a preset range according to the characteristic coefficient of the data point to be measured; and taking each initial abnormal data point that is subsequent to the data point to be measured and whose characteristic coefficient is within the preset range as a comparison data point of the data point to be measured.
7. The method for analyzing network traffic big data according to claim 1, wherein performing periodicity analysis according to the time change between the initial abnormal data point and the corresponding comparison data points to obtain the final abnormal data points comprises:
calculating each initial abnormal data point and any two corresponding comparison data points based on a periodicity evaluation model to obtain a periodicity index, wherein when the periodicity index is smaller than a preset judgment threshold value, the initial abnormal data point and the two comparison data points participating in calculation are final abnormal data points; when the number of the comparison data points of the initial abnormal data points is smaller than or equal to 1, the initial abnormal data points and the corresponding comparison data points are final abnormal data points.
8. The network traffic big data analysis method according to claim 7, wherein the periodicity evaluation model comprises:
$$P = \exp\left(-\left|\frac{t_a - t_i}{t_b - t_a} - 1\right|\right)$$
wherein $P$ denotes the periodicity index, $t_i$ denotes the time of initial abnormal data point $i$, $t_a$ denotes the time of comparison data point $a$ corresponding to initial abnormal data point $i$, $t_b$ denotes the time of comparison data point $b$ corresponding to initial abnormal data point $i$, and $\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base.
9. The method for analyzing network traffic big data according to claim 1, wherein the method for acquiring initial abnormal data points comprises:
and taking the data point with the abnormal value being greater than or equal to a preset abnormal threshold value as the initial abnormal data point.
10. The network traffic big data analysis method according to claim 9, wherein the preset anomaly threshold value is set to 0.7.
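The computations recited in claims 2 to 4 can be illustrated with the following Python sketch. Several details are assumptions, because the claims do not fix them: min-max normalization, absolute differences, slopes taken as backward differences over unit time steps, and equal weights of 0.5. All function names and the sample series are hypothetical.

```python
from collections import Counter
from typing import Sequence


def min_max_normalize(values: Sequence[float]) -> list[float]:
    """Min-max normalization to [0, 1] (assumed normalization function)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def discrete_values(amplitudes: Sequence[float]) -> list[float]:
    """Claim 2: normalized distance of each amplitude from the amplitude mean."""
    n = len(amplitudes)
    counts = Counter(amplitudes)
    # Amplitude characteristic value = frequency proportion * amplitude.
    char_values = [counts[a] / n * a for a in amplitudes]
    amplitude_mean = sum(char_values) / n
    return min_max_normalize([abs(a - amplitude_mean) for a in amplitudes])


def slope_change_values(amplitudes: Sequence[float]) -> list[float]:
    """Slope change per point from backward differences (assumed definition)."""
    n = len(amplitudes)
    slopes = [0.0] + [amplitudes[i] - amplitudes[i - 1] for i in range(1, n)]
    return [0.0] + [abs(slopes[i] - slopes[i - 1]) for i in range(1, n)]


def mutation_values(amplitudes: Sequence[float]) -> list[float]:
    """Claim 3: normalized difference from the larger neighbouring slope change."""
    changes = slope_change_values(amplitudes)
    n = len(changes)
    diffs = []
    for i in range(n):
        neighbours = [changes[j] for j in (i - 1, i + 1) if 0 <= j < n]
        slope_char = max(neighbours) if neighbours else 0.0
        diffs.append(abs(changes[i] - slope_char))
    return min_max_normalize(diffs)


def abnormal_values(amplitudes: Sequence[float],
                    first_weight: float = 0.5,
                    second_weight: float = 0.5) -> list[float]:
    """Claim 4: weighted sum of the discrete value and the mutation value."""
    return [first_weight * d + second_weight * m
            for d, m in zip(discrete_values(amplitudes),
                            mutation_values(amplitudes))]


# Hypothetical utilization series with a single spike at index 3.
series = [0.2, 0.2, 0.2, 0.9, 0.2, 0.2]
print([round(v, 2) for v in abnormal_values(series)])
```

With this series, only the spike at index 3 reaches the preset abnormal threshold of 0.7 recited in claim 10; its neighbours receive a nonzero mutation value but stay at about 0.5.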
CN202311776561.6A 2023-12-22 2023-12-22 Network flow big data analysis method Active CN117439827B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311776561.6A CN117439827B (en) 2023-12-22 2023-12-22 Network flow big data analysis method

Publications (2)

Publication Number Publication Date
CN117439827A true CN117439827A (en) 2024-01-23
CN117439827B CN117439827B (en) 2024-03-08

Family

ID=89555786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311776561.6A Active CN117439827B (en) 2023-12-22 2023-12-22 Network flow big data analysis method

Country Status (1)

Country Link
CN (1) CN117439827B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100030544A1 (en) * 2008-07-31 2010-02-04 Mazu Networks, Inc. Detecting Outliers in Network Traffic Time Series
CN109951491A (en) * 2019-03-28 2019-06-28 腾讯科技(深圳)有限公司 Network attack detecting method, device, equipment and storage medium
CN112380044A (en) * 2020-12-04 2021-02-19 腾讯科技(深圳)有限公司 Data anomaly detection method and device, computer equipment and storage medium
CN116723138A (en) * 2023-08-10 2023-09-08 杭银消费金融股份有限公司 Abnormal flow monitoring method and system based on flow probe dyeing
CN116738302A (en) * 2022-03-09 2023-09-12 诺佐米网络公司 Method for detecting anomalies in time-series data generated by an infrastructure device in a network
CN116846627A (en) * 2023-06-29 2023-10-03 中国大唐集团科学技术研究总院有限公司 Network security protection method and system based on flow analysis
US11792215B1 (en) * 2018-12-04 2023-10-17 Amazon Technologies, Inc. Anomaly detection system for a data monitoring service
US11792214B1 (en) * 2022-11-18 2023-10-17 Arctic Wolf Networks, Inc. Methods and apparatus for monitoring network events for intrusion detection
CN117241306A (en) * 2023-11-10 2023-12-15 深圳市银尔达电子有限公司 Real-time monitoring method for abnormal flow data of 4G network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
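吕军晖; 周刚; 金毅: "An adaptive network anomaly detection algorithm based on time series", Journal of Beijing University of Aeronautics and Astronautics, no. 05, 15 May 2009 (2009-05-15) *
解初; 王建东; 韩邦磊; 王振: "Extraction of multivariate similar time series based on trend feature clustering", Science Technology and Engineering, no. 07, 8 March 2020 (2020-03-08) *
陆春光; 叶方彬; 赵羚; 姜驰; 董伟: "Outlier detection algorithm for electric power big data based on density peak clustering", Science Technology and Engineering, no. 02, 18 January 2020 (2020-01-18) *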

Also Published As

Publication number Publication date
CN117439827B (en) 2024-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant