CN116055182B

CN116055182B - Network node anomaly identification method based on access request path analysis

Info

Publication number: CN116055182B
Application number: CN202310042311.3A
Authority: CN
Inventors: 米存照
Original assignee: Beijing Telixin Electronics Technology Co ltd
Current assignee: Beijing Telixin Electronics Technology Co ltd
Priority date: 2023-01-28
Filing date: 2023-01-28
Publication date: 2023-06-06
Anticipated expiration: 2043-01-28
Also published as: CN116055182A

Abstract

The invention relates to the technical field of data processing, and in particular to a network node anomaly identification method based on access request path analysis. The method comprises the following steps:

- processing the observation state sequence of a request access path to be detected in a state observation space to obtain a local trend sequence;
- determining a two-dimensional clustering space from the observation state sequence and the local trend sequence, and determining the INFLO value of each data point in the two-dimensional clustering space;
- determining the clustering influence factor of each data point, and using the clustering influence factors to determine an optimized clustering objective function;
- obtaining the clusters corresponding to the request access path to be detected based on the optimized clustering objective function, and then identifying the abnormal state of the network node.

The invention improves the accuracy of network node abnormal state identification. It overcomes the missed detections that existing methods suffer when L-DDoS attack information is lost, and it can be applied to network node anomaly identification in data center networks.

Description

Network node anomaly identification method based on access request path analysis

Technical Field

The invention relates to the technical field of data processing, in particular to a network node anomaly identification method based on access request path analysis.

Background

Data flows in a data center network are characterized by delay, diversity, synchronization and the like. This makes them vulnerable to covert low-rate distributed denial of service attacks (L-DDoS, where DDoS stands for Distributed Denial of Service). The attack sources of an L-DDoS attack are distributed, and an attacker can use many attack sources to degrade the service experience of a server or link. When the attack rate is low, the distribution of source IP (Internet Protocol) addresses changes only slightly. When the attack rate is high, the distribution of source IP addresses changes drastically, which increases the information content of the source IP addresses. Therefore, to reduce the impact of L-DDoS attacks on data flows, network nodes must be checked for anomalies, i.e., the network nodes that launch L-DDoS attacks must be identified.

Although the attack rate of an L-DDoS attack is low, the IP distribution under an L-DDoS attack still changes. The attack mode gradually evolves from a low-frequency, low-rate attack state to a high-frequency, high-rate attack state, i.e., the attack frequency changes from low to high, from slow to fast.

The existing method for identifying network nodes that launch L-DDoS attacks uses a hidden Markov model (HMM), as follows:

- L-DDoS attack states of different rates are treated as hidden states, using the correspondence between the HMM's hidden state sequence and its feature observation sequence.
- Preprocessed data features are used as the feature observation sequence. These features are the Raney entropy features of the nodes initiating network access and of the requested nodes.
- A probability model describes the likelihood of an attack, Raney entropy is used to extract data features, and the transition relations between states are used to adaptively detect the network nodes launching L-DDoS attacks.

However, when the HMM is used to detect network node abnormal states, this method performs dimension-reduction clustering on one-dimensional data, which easily loses L-DDoS attack information. As a result, L-DDoS attacks are identified inaccurately, missed detections occur, and the accuracy of network node anomaly identification is reduced.

Disclosure of Invention

In order to solve the technical problem of low accuracy of network node anomaly identification, the invention aims to provide a network node anomaly identification method based on access request path analysis.

The invention provides a network node anomaly identification method based on access request path analysis, which comprises the following steps:

acquiring a Raney entropy characteristic value time sequence of a request access path to be detected in a state observation space, and taking the Raney entropy characteristic value time sequence as an observation state sequence;

according to the difference between the Raney entropy characteristic values in the observation state sequence, determining each differential value in the differential sequence, and taking the differential sequence as a local trend sequence;

determining a two-dimensional clustering space according to the observation state sequence and the local trend sequence, and further determining an INFLO value of each data point in the two-dimensional clustering space;

determining a clustering influence factor corresponding to each data point according to the INFLO value of each data point;

determining an optimized clustering objective function according to the position and the clustering influence factor of each data point in the two-dimensional clustering space, and clustering the data points in the two-dimensional clustering space with the optimized clustering objective function to obtain the clusters corresponding to the request access path to be detected;

and determining the abnormal state of the network node of the access path to be detected according to the observation state sequence, each cluster and the pre-constructed and trained hidden Markov model.

Further, determining a cluster influence factor corresponding to each data point according to the INFLO value of each data point comprises:

calculating the absolute value of the difference between the INFLO value of each data point and the INFLO value of the center of the cluster to which the data point belongs, and taking the normalized absolute difference as the clustering influence factor of that data point.

Further, the calculation formula of the optimized clustering objective function is as follows:

$$J=\sum_{c=1}^{k}\sum_{i=1}^{n} w_{i}\, d\left(x_{i}, v_{c}\right)$$

wherein:

- J is the optimized clustering objective function;
- k is the number of cluster centers in the two-dimensional clustering space, and c is the index of each cluster center;
- n is the number of data points in the two-dimensional clustering space, and i is the index of each data point;
- $w_{i}$ is the clustering influence factor of the i-th data point;
- d is the Euclidean distance function;
- $x_{i}$ is the position of the i-th data point in the two-dimensional clustering space;
- $v_{c}$ is the position of the center of the c-th cluster in the two-dimensional clustering space;
- $d\left(x_{i}, v_{c}\right)$ is the Euclidean distance between the i-th data point and the center of the c-th cluster.
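The optimized clustering objective sums, over the data points, each point's clustering influence factor multiplied by its Euclidean distance to a cluster center. It can be evaluated with the minimal Python sketch below.

Assumptions:
- As in standard K-means, each point contributes only its distance to the center of the cluster it is currently assigned to.
- The function name clustering_objective and the sample values are hypothetical.

```python
import math

def clustering_objective(points, labels, centers, weights):
    """Evaluate the sum of w_i * d(x_i, v_c) for a hard cluster assignment.

    Assumption (not stated in the source): point i contributes only the
    Euclidean distance to the center of its assigned cluster labels[i].
    """
    total = 0.0
    for x, c, w in zip(points, labels, weights):
        total += w * math.dist(x, centers[c])  # weighted Euclidean distance
    return total

# Hypothetical (differential value, Raney entropy) points and factors.
points = [(0.0, 1.0), (0.1, 1.2), (2.0, 3.0)]
labels = [0, 0, 1]
centers = [(0.05, 1.1), (2.0, 3.0)]
weights = [0.5, 1.0, 0.0]
print(clustering_objective(points, labels, centers, weights))
```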

Further, determining a two-dimensional clustering space according to the observation state sequence and the local trend sequence, including:

discarding the first Raney entropy characteristic value in the observation state sequence; assigning sequence numbers to the remaining Raney entropy characteristic values of the observation state sequence and to the differential values of the local trend sequence; and pairing the Raney entropy characteristic value and the differential value that share the same sequence number into two-dimensional data, thereby obtaining the two-dimensional clustering space.

Further, determining the abnormal state of the network node of the access path to be detected according to the observation state sequence, each cluster and the pre-constructed and trained hidden Markov model, including:

numbering each cluster; labeling each Raney entropy characteristic value in the observation state sequence with the number of the cluster to which it belongs; inputting the resulting sequence of cluster numbers as model input data into the pre-constructed and trained hidden Markov model; and outputting the network node abnormal state of the request access path to be detected, wherein the network node abnormal state includes normal access and L-DDoS attack.
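The final step decodes the cluster-number sequence with the trained hidden Markov model. The minimal Python sketch below uses log-space Viterbi decoding, which is one standard way to output the most likely hidden state sequence; the source does not name its decoding procedure.

Assumptions:
- There are two hidden states: 0 = normal access and 1 = L-DDoS attack.
- The observations are three cluster numbers, 0 to 2.
- The probabilities in START, TRANS and EMIT are hypothetical placeholders, not trained values.

```python
import math

# Hypothetical HMM parameters: hidden states 0 = normal access, 1 = L-DDoS attack;
# observations are cluster numbers 0..2. Real values would come from training.
START = [0.9, 0.1]
TRANS = [[0.95, 0.05],
         [0.10, 0.90]]
EMIT = [[0.80, 0.15, 0.05],
        [0.10, 0.30, 0.60]]

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path of a discrete HMM (log space)."""
    n_states = len(start_p)
    score = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        step_back, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + math.log(trans_p[r][s]))
            step_back.append(best)
            score.append(prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o]))
        back.append(step_back)
    # Backtrack from the most likely final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for step_back in reversed(back):
        state = step_back[state]
        path.append(state)
    return path[::-1]

cluster_numbers = [0, 0, 1, 2, 2, 2]
print(viterbi(cluster_numbers, START, TRANS, EMIT))
```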

Further, determining each differential value in the differential sequence according to the differences between the Raney entropy characteristic values in the observation state sequence comprises:

calculating the difference between each pair of adjacent (previous and next) Raney entropy characteristic values in the observation state sequence, taking each such difference as a differential value, and taking the sequence formed by the differential values as the differential sequence.

The invention has the following beneficial effects:

The invention provides a network node anomaly identification method based on access request path analysis, with the following effects.

First, the observation state sequence of the request access path to be detected in the state observation space is processed to obtain a local trend sequence. The local trend sequence represents the trend information of the data points, and this information helps correct the state-space dimension reduction of the observation state sequence. The partition of the observation state sequence therefore preserves more trend-change information. As a result, the hidden Markov model can perceive more state transition information, its detection performance on L-DDoS attacks improves, and the accuracy of network node anomaly identification is enhanced.

Second, to improve the partition precision of the observation state sequence, a two-dimensional clustering space is determined from the observation state sequence and the local trend sequence. In this space, the state transitions in the observation state sequence can describe transitions of the attack state, which improves the accuracy of L-DDoS attack identification.

Third, to capture the distribution of the data points in the two-dimensional clustering space, the INFLO value of each data point is obtained, which increases the reference value of the clustering influence factor. The clustering influence factor of each data point is determined from its INFLO value. It lets the subsequent clustering take into account how frequently the data point occurs in the observation state sequence. This facilitates optimizing the clustering objective function and further improves the partition precision of the observation state sequence.

Finally, an optimized clustering objective function is determined from the position and the clustering influence factor of each data point in the two-dimensional clustering space. Clustering-based dimension reduction is then performed, and the network node abnormal state of the request access path to be detected is obtained. The optimized clustering objective function accounts for three factors: trend information, the Raney entropy of source IP addresses, and the occurrence frequency of observed states. Because of this, the state transition information after clustering-based dimension reduction better matches the data characteristics of L-DDoS attacks, which improves the accuracy of network node anomaly identification. The method is mainly applicable to network node anomaly identification.

Drawings

To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention, and a person skilled in the art can derive other drawings from them without inventive effort.

FIG. 1 is a flow chart of a network node anomaly identification method based on access request path analysis according to the present invention;

FIG. 2 is a schematic diagram of data points with different trend changes in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the distribution of data points in a two-dimensional clustering space according to an embodiment of the present invention.

Detailed Description

To further describe the technical means adopted by the present invention to achieve its intended purpose, and the effects of those means, the specific implementation, structure, features and effects of the technical solution are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The specific scenario targeted by this embodiment may be:

In network node anomaly detection based on a hidden Markov model (HMM), an observation state sequence formed by the Raney entropy of source IP and target IP addresses can be used as training data for the HMM. Raney entropy characteristic values take a very large number of distinct values, for example 1.01, 1.015 and 1.012. Treating each value as a separate observation state therefore makes HMM training computationally expensive and degrades the robustness of the model. To improve detection efficiency and model robustness, the existing approach reduces the dimension of the state observation space with K-means clustering. The cluster number of each value in the observation state sequence is used as its state information, and this state information is used as the input data of the HMM.

Dimension reduction of the state observation space based on the observation state sequence comprises the following steps:

1. The observation state sequence is converted into one-dimensional data.
2. Given a cluster number K, the one-dimensional data are clustered, which divides the observation state sequence into K clusters. The clusters are numbered, and each value in the observation state sequence is labeled with the number of the cluster to which it belongs. The very large state space of the observation state sequence is thus converted into K states, giving a K-state observation state sequence.
3. The HMM is trained with the K-state observation state sequence as training data.

The value of K can be set by the implementer according to the specific characteristics of the one-dimensional data and is not specifically limited here.
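The conventional one-dimensional discretization described above can be sketched as a minimal Python example of plain Lloyd's K-means on the Raney entropy characteristic values. The deterministic quantile initialization, the function name kmeans_1d and the sample values are assumptions for illustration only.

```python
def kmeans_1d(values, k, iters=100):
    """Plain 1-D K-means (Lloyd's algorithm); returns (centers, labels)."""
    s = sorted(values)
    # Assumed deterministic initialisation: k evenly spaced order statistics.
    centers = [s[(len(s) - 1) * j // max(k - 1, 1)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center by absolute difference.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

values = [1.01, 1.015, 1.012, 2.5, 2.52, 2.49]
centers, labels = kmeans_1d(values, k=2)
print(labels)  # cluster numbers used as discrete HMM observation states
```

Only the entropy values enter the distance here. This is why two values with the same magnitude but opposite trends receive the same state.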

When the observation state sequence is converted into one-dimensional data for clustering, only the Raney entropy characteristic values are used, and the trend information of the observation states is ignored. The state transition information of the dimension-reduced observation state sequence is therefore lost, the training result of the HMM is biased, and the accuracy of network node anomaly identification is low. To overcome this defect, this embodiment provides a network node anomaly identification method based on access request path analysis which, as shown in FIG. 1, comprises the following steps:

S1, acquiring a Raney entropy characteristic value time sequence of a request access path to be detected in a state observation space, and taking the Raney entropy characteristic value time sequence as an observation state sequence.

In this embodiment, the abnormal state of a network node on an access request path is detected as follows:

- The network traffic packet information handled by the SDN (Software Defined Network) data center network controller is reconstructed to obtain TCP (Transmission Control Protocol) sessions.
- The basic information of each TCP session is analyzed to obtain the source IP (Internet Protocol) address and target IP address of the traffic.
- A fixed time interval, for example 5 s, is set and used as the time step.
- To capture the access distribution in each time interval, the Raney entropy characteristic value corresponding to the source IP addresses and target IP addresses in that interval is calculated. Each interval thus has one Raney entropy characteristic value, and these values form a Raney entropy characteristic value time sequence.

The Raney entropy characteristic value represents the distribution state of the access path. This time sequence is therefore used as the observation state sequence of the request access path to be detected in the state observation space, and its Raney entropy characteristic values are real-time monitoring data. Obtaining the Raney entropy characteristic value is prior art, is not within the scope of the present invention, and is not described in detail here.

In summary, the network traffic packets transmitted in the SDN data center network are analyzed and a fixed time interval is set. The Raney entropy characteristic value of each interval is then obtained, and these values form the observation state sequence, i.e., a Raney entropy characteristic value time sequence whose time step is the fixed time interval.
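Obtaining the Raney entropy characteristic value is treated above as prior art. The minimal Python sketch below shows one plausible way to build the per-interval time sequence.

Assumptions:
- "Raney entropy" denotes the Rényi entropy of order α, i.e., log(Σ p^α) / (1 − α) with the natural logarithm. The source does not give the formula.
- The order α = 2 is hypothetical.
- The distribution is taken over (source IP, target IP) pairs, which is also an assumption.
- The function names and sample packets are hypothetical.

```python
import math
from collections import Counter

def renyi_entropy(counts, alpha=2.0):
    """Rényi entropy of order alpha (natural log) of a count distribution."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if alpha == 1.0:
        # The alpha -> 1 limit is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)

def entropy_time_series(packets, interval=5.0, alpha=2.0):
    """One entropy value per fixed interval over (source IP, target IP) pairs.

    packets: iterable of (timestamp_seconds, src_ip, dst_ip) tuples.
    Intervals without any packets are simply skipped in this sketch.
    """
    windows = {}
    for ts, src, dst in packets:
        windows.setdefault(int(ts // interval), Counter())[(src, dst)] += 1
    return [renyi_entropy(list(windows[w].values()), alpha) for w in sorted(windows)]

packets = [(0.4, "10.0.0.1", "10.0.1.9"), (1.2, "10.0.0.2", "10.0.1.9"),
           (6.0, "10.0.0.1", "10.0.1.9")]
print(entropy_time_series(packets))
```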

S2, determining each differential value in the differential sequence according to the difference between the Raney entropy characteristic values in the observation state sequence, and taking the differential sequence as a local trend sequence.

In this embodiment, to improve the accuracy of network node anomaly identification, the trend information of each data point is taken into account in the subsequent dimension reduction of the state observation space. The differential sequence corresponding to the observation state sequence is therefore determined from its Raney entropy characteristic values, and local trend analysis of the observation state sequence is performed on the basis of this differential sequence. Specifically, the difference between each pair of adjacent (previous and next) Raney entropy characteristic values in the observation state sequence is calculated and taken as a differential value, and the sequence of differential values is taken as the differential sequence. Since the differential values characterize the local trends of the observation state sequence, the differential sequence is taken as the local trend sequence.

It should be noted that the HMM can be used to determine the hidden state information of the real-time monitoring data of the access request path; this hidden state information is the abnormal state of the network node. The real-time monitoring data must use the same trend extraction as the historical monitoring data. For this reason, when the local trend is analyzed, each value in the observation state sequence may refer only to its previous value. A differential sequence satisfies this constraint, and each of its differential values serves as local trend information that is compatible with the local trend calculation on real-time monitoring data.

It should also be noted that each Raney entropy characteristic value in the observation state sequence has its own trend information within the sequence, and some values share the same trend information. If both the value and the trend information are considered in the subsequent clustering-based dimension reduction, each cluster can, as far as possible, contain data points with similar trends, while data points with markedly different trends fall into different clusters. Considering trend information also lets the state transitions in the observation state sequence better explain the transitions of attack states, which improves the detection of L-DDoS attacks.
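The requirement that each value refer only to its predecessor can be illustrated with a minimal Python sketch. A batch differencing function for historical data and a streaming version for real-time data produce the same local trend values. The sign convention (later value minus earlier value) and the names diff_batch and OnlineDiff are assumptions; the source does not fix the sign.

```python
def diff_batch(seq):
    """Local trend sequence for historical data: each value minus its predecessor."""
    return [b - a for a, b in zip(seq, seq[1:])]

class OnlineDiff:
    """Streaming differencer for real-time data: uses only the previous value."""

    def __init__(self):
        self.prev = None

    def push(self, value):
        # Returns None for the very first value, which has no predecessor.
        d = None if self.prev is None else value - self.prev
        self.prev = value
        return d

history = [1.00, 1.05, 1.02, 1.10]
stream = OnlineDiff()
online = [d for d in (stream.push(v) for v in history) if d is not None]
print(online == diff_batch(history))
```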

S3, determining a two-dimensional clustering space according to the observation state sequence and the local trend sequence, and further determining an INFLO value of each data point in the two-dimensional clustering space, wherein the steps comprise:

S31, determining a two-dimensional clustering space according to the observation state sequence and the local trend sequence.

In this embodiment, each differential value in the local trend sequence is obtained from the difference between a value and its preceding value. The local trend sequence therefore has one element fewer than the observation state sequence, so the first Raney entropy characteristic value in the observation state sequence must be discarded. Sequence numbers are then assigned to the remaining Raney entropy characteristic values of the observation state sequence and to the differential values of the local trend sequence. The characteristic value and the differential value at the same sequence number are paired into two-dimensional data, which yields the two-dimensional clustering space. Each two-dimensional datum is a data point in this space; its abscissa is the differential value and its ordinate is the Raney entropy characteristic value.
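The minimal Python sketch below constructs the two-dimensional clustering space from an observation state sequence.

- Assumption: the differential value is the later value minus the earlier one.
- The function name build_cluster_space and the sample sequence are hypothetical.
- In the sample, two observations share the Raney entropy characteristic value 1.2 but differ in trend. One-dimensional clustering cannot tell them apart, whereas here they lie on opposite sides of the abscissa.

```python
def build_cluster_space(observations):
    """Return (differential value, Raney entropy characteristic value) points.

    The first observation has no predecessor, so it is discarded and the
    i-th remaining observation is paired with the i-th differential value.
    """
    diffs = [b - a for a, b in zip(observations, observations[1:])]
    return list(zip(diffs, observations[1:]))

obs = [1.0, 1.2, 1.4, 1.2, 1.0]
for point in build_cluster_space(obs):
    print(point)  # abscissa: trend, ordinate: entropy value
```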

It should be noted that conventional one-dimensional clustering of Raney entropy values considers only numerical information, so data points with different trend changes may be assigned to the same cluster. FIG. 2 is a schematic diagram of data points with different trend changes. This causes the following problems.

- Two data points with different trends but the same Raney entropy characteristic value are judged to be the same observation state during dimension reduction. The HMM then cannot extract the frequency and periodicity of the L-DDoS attack and can judge only from numerical changes. The distinctive feature of an L-DDoS attack, however, is that its attack rate and attack frequency keep increasing, so identification of L-DDoS attacks suffers.
- Because only the distances between values in the data space are considered, data points with close values may be merged into one state, which easily loses state difference information. For example, the Raney entropy characteristic value at a certain time point in the observation state sequence may be rising. K-means clustering considers only the distances between values, so the data points near a local peak of the Raney entropy characteristic value are merged into one state, and the observation state sequence loses part of its state transition information.
- The attack frequency and attack rate of an L-DDoS attack are key features for anomaly detection. Numerical information reflects only the attack rate and ignores the attack frequency, i.e., the periodic trend of the time series data in the observation state sequence, yet the attack frequency plays a key role in accurately detecting L-DDoS attacks.

To improve the accuracy of L-DDoS attack identification, the differential value (local trend information) is added to the subsequent dimension reduction. The local trend information reflects the overall periodic and trend information of the observation state sequence. Clustering is performed on the two-dimensional differential value-Raney entropy features, so data points with similar directions and speeds of change are assigned to the same state. The HMM used to identify L-DDoS attacks therefore judges and analyzes trend and periodic information, and it can extract the period of state changes from the period of trend changes.

S32, determining the INFLO value of each data point in the two-dimensional clustering space according to each data point in the two-dimensional clustering space.

To improve the precision of the subsequent clustering, so that data points in the same distribution state are assigned to one cluster, the local distribution of the data points in the two-dimensional clustering space must be considered. This allows the HMM to perceive the frequency information of the L-DDoS attack and reduces the influence of noise points in the two-dimensional clustering space on the subsequent clustering. In conventional K-means clustering, noise points may be assigned to clusters with a large range. Such clusters are abnormal clusters: they distort the state assignment of real-time observation data points and increase the likelihood of misjudgment by the subsequent HMM.

FIG. 3 is a schematic diagram of the distribution of data points in the two-dimensional clustering space. Three observations follow from it.

First, the data points in cluster C1 of FIG. 3 are uniformly and densely distributed. Their Raney entropy characteristic values and local trends are therefore relatively concentrated, and they occur with high frequency and high density in the observation state sequence. In other words, many data points fall within the value ranges of the Raney entropy characteristic value and differential value corresponding to cluster C1, so the states these data points represent occur many times in the observation state sequence. During HMM detection, transitions out of or into these frequently occurring states can account for the hidden state changes of the request access path to be detected.

Second, data points in sparsely distributed clusters occur infrequently in the observation state sequence, and their Raney entropy characteristic values and differential values span wider ranges. Their data characteristics match the low-rate, low-frequency attack information of the early stage of an L-DDoS attack. The state transition information of data points in sparse clusters therefore better represents the progressive development of the attack.

Third, based on its distribution characteristics, data point P in FIG. 3 should belong to cluster C2, but because of its numerical information it is assigned to cluster C1.
Therefore, the data point characteristics within each subsequently determined cluster should be made consistent, so that the network node identification result is more accurate and the best L-DDoS attack detection effect is achieved. To this end, the cluster to which an outlier data point in the two-dimensional clustering space belongs is judged directly rather than by plain spatial distance calculation. This prevents an overly large cluster range from causing misjudgment of the states of real-time monitoring data.

In the two-dimensional clustering space, conventional K-means clustering can assign data points with similar values and similar local trends to the same cluster. However, K-means that considers only spatial distance can merge data points that are close in space but differ greatly in local density. These are points that occur frequently and points that occur rarely in the observation state sequence, and merging them amplifies the influence of abnormal data points on the cluster division. In an L-DDoS attack, frequently occurring data points are more likely to be normal, while rarely occurring data points are more likely to indicate abnormal network nodes. If data points with very different frequencies are assigned to the same cluster and represented as the same state, the HMM may produce false detections when detecting L-DDoS attacks.

To avoid such false detections, this embodiment uses the INFLO algorithm to calculate the INFLO (influenced outlierness) value of each data point and thereby analyze its local density. The INFLO algorithm is suited to data containing clusters of different densities. The INFLO value measures the local density of each data point in the two-dimensional clustering space from both its nearest neighbors and its reverse nearest neighbors. This helps identify the region of the data space to which a data point belongs and whether the point should be assigned to the current cluster. The implementation of the INFLO algorithm is prior art, is not within the scope of the present invention, and is not described in detail here.
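The INFLO algorithm is treated above as prior art, but a minimal Python sketch may help. It follows the usual formulation:

- The influence space of a point is the union of its k nearest neighbors and its reverse k nearest neighbors.
- The density of a point is the inverse of its k-distance.
- The INFLO value is the average density of the influence space divided by the point's own density. Values well above 1 suggest an outlier.

The brute-force neighbor search, the choice of k, the simplified tie handling and the function names are assumptions for illustration.

```python
import math

def knn(points, k):
    """Brute-force k nearest neighbours of every point (the point itself excluded)."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours

def inflo(points, k=3):
    """INFLO score per point; requires len(points) > k. Ties at the k-distance are ignored."""
    n = len(points)
    nn = knn(points, k)
    # Density = 1 / k-distance, the distance to the k-th nearest neighbour.
    den = [1.0 / max(math.dist(points[i], points[nn[i][-1]]), 1e-12) for i in range(n)]
    # Reverse nearest neighbours: j is in RNN(i) when i is one of j's k nearest neighbours.
    rnn = [set() for _ in range(n)]
    for j in range(n):
        for i in nn[j]:
            rnn[i].add(j)
    scores = []
    for i in range(n):
        influence_space = set(nn[i]) | rnn[i]
        mean_den = sum(den[o] for o in influence_space) / len(influence_space)
        scores.append(mean_den / den[i])
    return scores

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print([round(s, 2) for s in inflo(points, k=2)])
```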

S4, determining a clustering influence factor corresponding to each data point according to the INFLO value of each data point.

In this embodiment, the Euclidean distance in the traditional K-means cluster is weighted by the local density difference between the data point and the cluster center in the clustering iterative process, so that the data points with similar local densities and two-dimensional distances of the differential value-Raney entropy are divided into the same cluster. Specifically, calculating the absolute value of the difference between the INFLO value of each data point and the INFLO value of the cluster center corresponding to the INFLO value, taking the absolute value of the difference after normalization processing as a clustering influence factor corresponding to the data point, namely taking the numerical difference between the data point and the INFLO value of the cluster center point as the weight of the Euclidean distance in the K-means clustering process, wherein the clustering influence factor can ensure that the state division of the data point is more accurate, accords with the detail state change, and the calculation formula of the clustering influence factor can be as follows:

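$$\omega_i = \mathrm{Norm}\left(\left|\mathrm{INFLO}_i - \mathrm{INFLO}_c\right|\right)$$

where $\omega_i$ is the clustering influence factor of the i-th data point in the two-dimensional clustering space, $\mathrm{Norm}(\cdot)$ is the normalization function, $\mathrm{INFLO}_i$ is the INFLO value of the i-th data point in the two-dimensional clustering space, and $\mathrm{INFLO}_c$ is the INFLO value of the center of the c-th cluster in the two-dimensional clustering space.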

In the formula for the clustering influence factor, the normalization can range over all data points contained in the cluster to which the data point belongs in the current clustering iteration. The term $\left|\mathrm{INFLO}_i - \mathrm{INFLO}_c\right|$ represents the local density difference between the two points; if the INFLO values of all data points in a cluster are similar, all data points in that cluster have similar local density.

Calculating the clustering influence factor allows the clustering-based dimension reduction to take into account the local density of the region of the data space that corresponds to each data point's state in the observation state sequence. It prevents data points in sparse regions from being assigned to dense clusters merely because their values are close, and it helps group abnormal outlier data points into the same cluster, preventing the cluster range from becoming too large.
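A minimal sketch of the clustering influence factor, assuming that Norm is min-max normalization over the members of each cluster (the range suggested above) and that the INFLO value of each cluster center is already known. The function `influence_factors` and its arguments are hypothetical names used only for illustration.

```python
import numpy as np

def influence_factors(inflo, labels, center_inflo):
    """w_i = Norm(|INFLO_i - INFLO_c|), with min-max Norm inside each cluster (assumed)."""
    inflo = np.asarray(inflo, dtype=float)
    labels = np.asarray(labels)
    weights = np.zeros_like(inflo)
    for c, c_inflo in enumerate(center_inflo):
        members = labels == c
        if not members.any():
            continue  # empty cluster: nothing to normalize
        # Local density difference between each member and its cluster center.
        diff = np.abs(inflo[members] - c_inflo)
        span = diff.max() - diff.min()
        # Min-max normalization; a cluster with identical differences gets 0.
        weights[members] = (diff - diff.min()) / span if span > 0 else 0.0
    return weights
```

With INFLO values 1.0, 1.5 and 3.0 in a cluster whose center has INFLO 1.0, the absolute differences 0, 0.5 and 2.0 normalize to 0, 0.25 and 1.0.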

To eliminate the influence of local outlier information, this embodiment obtains the INFLO value of each data point with the existing INFLO algorithm. The algorithm determines an influence space from the nearest neighbors and reverse nearest neighbors of a data point and evaluates the point's local density within that influence space, so the INFLO value effectively captures the local distribution characteristics of a data point in the two-dimensional clustering space. It is used to correct the Euclidean distance between two data points during dimension-reduction clustering, which makes it easier to determine the cluster of each data point accurately. With the clustering influence factors, data points that have similar trend information, similar Rényi entropy feature values, and similar occurrence frequencies in the state observation space are grouped into the same cluster, so the data points within each cluster carry more similar state information. The calculation of the INFLO value is prior art, is not within the scope of the present invention, and is not described in detail here.

S5, determining an optimized clustering objective function according to the position and the clustering influence factor of each data point in the two-dimensional clustering space, and clustering the data points in the two-dimensional clustering space with the optimized clustering objective function to obtain the clusters corresponding to the request access path to be detected. This step comprises:

S51, determining an optimized clustering objective function according to the position of each data point in the two-dimensional clustering space and the clustering influence factor.

After the clustering influence factor of each data point in the two-dimensional clustering space is obtained, it must be incorporated into the subsequent K-means clustering iterations: the traditional K-means clustering objective function is modified by adding the influence factor, giving the optimized clustering objective function, calculated as follows:

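$$J = \sum_{c=1}^{k}\sum_{i=1}^{n} \omega_i \, d\left(x_i, v_c\right)$$

where $J$ is the optimized clustering objective function, k is the number of cluster centers and n the number of data points in the two-dimensional clustering space, $\omega_i$ is the clustering influence factor of the i-th data point, i.e., the factor weighting the distance measure between the i-th data point and the c-th cluster center during clustering, d is the Euclidean distance function, $x_i$ is the position of the i-th data point in the two-dimensional clustering space, and $v_c$ is the position of the c-th cluster center. As in standard K-means, each data point contributes only the term for the cluster center to which it is assigned.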

In the optimized clustering objective function, the clustering influence factor $\omega_i$ of the i-th data point corrects the distance measure between the i-th data point and the c-th cluster center during clustering. Because the clustering iterations seek the minimum of the optimized objective function, each data point is assigned to the cluster whose center is nearest under this weighted distance. A larger clustering influence factor indicates that the local densities at the positions of the data point and of the cluster center differ, i.e., that the local data distributions differ considerably; the factor then acts as the weight of the Euclidean distance for that data point. Weighting the Euclidean distance in this way improves the accuracy of the subsequent clustering and, in turn, the accuracy of the network node identification result of the hidden Markov model. When the weight of a data point is greater than 1, the Euclidean distance between the data point and its corresponding cluster center is enlarged, and the data point may no longer be assigned to that center's cluster; in this way, local distribution information influences the clustering result.
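To make the optimized objective concrete, the following sketch evaluates it for a given assignment of points to centers, summing $\omega_i d(x_i, v_c)$ over each point and the center it is assigned to. It follows the text in using the plain Euclidean distance rather than the squared distance of textbook K-means. The function name `weighted_objective` is hypothetical.

```python
import numpy as np

def weighted_objective(points, labels, centers, weights):
    """J = sum over points i of w_i * d(x_i, v_c), where c is i's assigned cluster."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Euclidean (not squared) distance from each point to its assigned center.
    d = np.linalg.norm(points - centers[np.asarray(labels)], axis=1)
    return float(np.sum(np.asarray(weights, dtype=float) * d))
```

For instance, the points (0, 0) and (3, 4) assigned to a center at the origin with factors 1 and 0.5 give J = 1 × 0 + 0.5 × 5 = 2.5; with all factors equal to 1 the function reduces to the unweighted sum of Euclidean distances.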

S52, clustering the data points in the two-dimensional clustering space with the optimized clustering objective function to obtain the clusters corresponding to the request access path to be detected.

First, note that different Rényi entropy feature values form different observation states, and whether the request access path to be detected is normal access or an L-DDoS attack can be inferred from the transitions between observation states. However, the observation state sequence of the request access path to be detected contains many Rényi entropy feature values, which would form an oversized state observation space. An oversized observation space lowers the efficiency of network node anomaly identification, increases the computational load, and prevents an accurate identification result. The observation state space therefore needs feature dimension reduction, in which similar observation states are identified as a single observation state: K-means clustering converts the n-dimensional state space into a K-dimensional state space, where n is the number of Rényi entropy feature values.

In this embodiment, the data points in the two-dimensional clustering space are clustered with the K-means method using the optimized clustering objective function, giving the clusters that correspond to the request access path to be detected. The implementation of K-means clustering is prior art, is not within the scope of the present invention, and is not described in detail herein. Clustering with the optimized objective function groups data points with similar local density in the two-dimensional clustering space into the same cluster, and these data points, whose Rényi entropy feature values and differential values are also similar, are treated as the same state.

At this point, the clusters, which incorporate both local trend information and numerical information, have been obtained, with the clustering process corrected by the clustering influence factors of the data points.
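The clustering step can be sketched as a Lloyd-style K-means loop whose assignment step uses INFLO-weighted Euclidean distances. Several details are assumptions made only for this illustration and are not specified in the document: the weight is computed for each pair of point and candidate center, the INFLO value of a center is taken as the mean INFLO of its current members, the normalized INFLO difference is offset by 1 (so a zero difference leaves the plain Euclidean distance unchanged instead of collapsing it to zero, and any larger difference enlarges it), and centers are updated as ordinary member means. The function name and parameters are hypothetical.

```python
import numpy as np

def inflo_weighted_kmeans(points, inflo, init_idx, n_iter=20):
    """Lloyd-style K-means with assignment cost w_ic * d(x_i, v_c) (sketch).

    Assumed (not from the patent): INFLO of a center = mean INFLO of its
    members; w_ic = 1 + |INFLO_i - INFLO_c| / max|INFLO_i - INFLO_c|;
    centers updated as plain member means.
    """
    x = np.asarray(points, dtype=float)
    inflo = np.asarray(inflo, dtype=float)
    init_idx = np.asarray(init_idx)
    centers = x[init_idx].copy()
    center_inflo = inflo[init_idx].copy()
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        # Euclidean distance from every point to every center, shape (n, k).
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        # Local density difference between every point and every center.
        diff = np.abs(inflo[:, None] - center_inflo[None, :])
        scale = diff.max()
        weights = 1.0 + (diff / scale if scale > 0 else 0.0)
        new_labels = np.argmin(weights * dist, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(len(centers)):
            members = labels == c
            if members.any():  # an empty cluster keeps its previous center
                centers[c] = x[members].mean(axis=0)
                center_inflo[c] = inflo[members].mean()
    return labels, centers
```

With two groups of three points around (0, 0) and (10, 10) and the first point of each group as a seed, the loop settles on labels [0, 0, 0, 1, 1, 1]. Because the weights in this variant lie between 1 and 2, they can only move a point to a center whose plain distance is less than twice that of its nearest center, which keeps the density correction from overriding clear geometric separation.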

S6, determining the abnormal state of the network node of the access path to be detected according to the observation state sequence, each cluster and the pre-constructed and trained hidden Markov model.

In this embodiment, to identify network node anomalies, each cluster is assigned a number, and each Rényi entropy feature value in the observation state sequence is labeled with the number of the cluster to which it belongs. The resulting sequence of cluster numbers is used as model input data and fed into the pre-constructed and trained hidden Markov model, which outputs the network node abnormal state of the request access path to be detected; the possible states are normal access and L-DDoS attack.
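The identification step can be illustrated with Viterbi decoding over the sequence of cluster numbers. This sketch assumes that the hidden states are the two network node states named above and that decoding is done with the Viterbi algorithm, which the document does not specify. The start, transition, and emission probabilities are hypothetical placeholders standing in for the trained model's parameters.

```python
import numpy as np

# Hidden states assumed for illustration: 0 = normal access, 1 = L-DDoS attack.
STATES = ("normal access", "L-DDoS attack")

# Placeholder parameters (hypothetical, NOT trained values); observations are
# cluster numbers 0, 1, 2 produced by the clustering step.
START_P = [0.9, 0.1]
TRANS_P = [[0.9, 0.1],
           [0.2, 0.8]]
EMIT_P = [[0.7, 0.2, 0.1],   # normal access mostly emits cluster 0
          [0.1, 0.2, 0.7]]   # L-DDoS attack mostly emits cluster 2


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state label for each observation."""
    obs = np.asarray(obs)
    log_trans = np.log(trans_p)
    log_emit = np.log(emit_p)
    n_states = len(start_p)
    # Best log-probability of any path ending in each state at time 0.
    score = np.log(start_p) + log_emit[:, obs[0]]
    back = np.zeros((len(obs), n_states), dtype=int)
    for t in range(1, len(obs)):
        # cand[i, j]: best path ending in state i at t-1, then moving to j.
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_states)] + log_emit[:, obs[t]]
    # Trace the back-pointers from the best final state.
    path = [int(np.argmax(score))]
    for t in range(len(obs) - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]
```

Under these placeholder parameters, the cluster sequence [0, 0, 0] decodes as normal access at every step and [2, 2, 2, 2] as an L-DDoS attack at every step. Working in log probabilities avoids numerical underflow on long observation sequences.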

Note that the training data of the hidden Markov model can be a plurality of observation state sequences after clustering-based dimension reduction, i.e., sequences of the cluster numbers of the Rényi entropy feature values. Such training data reflect changes between different observation states, so the hidden Markov model captures more information about network node abnormal states during state transitions, which improves its recognition accuracy for L-DDoS attacks. For accurate detection of network node anomalies, the time interval of the observation state sequences used in training must match the time interval used in real-time monitoring; the interval can be 5 s. A hidden Markov model is a statistical model that describes a Markov process with hidden, unknown parameters: the hidden parameters are determined from the observable parameters and then used for further analysis, such as pattern recognition. The training process of the hidden Markov model is prior art, is not within the scope of the present invention, and is not described in detail herein.
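The training itself is left to prior art. If training sequences with known hidden states are available (for example, windows labeled as normal traffic or L-DDoS traffic), the parameters can be estimated simply by counting, as in this sketch; with unlabeled sequences, the Baum-Welch (expectation-maximization) algorithm is the usual choice instead. The function name `estimate_hmm`, the add-alpha smoothing, and the assumption of labeled data are illustrative and do not come from the document. Whichever method is used, the training sequences should use the same time interval as the sequences monitored in real time.

```python
import numpy as np

def estimate_hmm(obs_seqs, state_seqs, n_states, n_symbols, alpha=1.0):
    """Maximum-likelihood HMM parameters from labeled sequences (add-alpha smoothed)."""
    start = np.full(n_states, alpha)
    trans = np.full((n_states, n_states), alpha)
    emit = np.full((n_states, n_symbols), alpha)
    for obs, states in zip(obs_seqs, state_seqs):
        start[states[0]] += 1
        # Count state-to-state transitions within the sequence.
        for s_prev, s_next in zip(states[:-1], states[1:]):
            trans[s_prev, s_next] += 1
        # Count which cluster number each hidden state emitted.
        for o, s in zip(obs, states):
            emit[s, o] += 1
    # Normalize counts into probability distributions (each row sums to 1).
    return (start / start.sum(),
            trans / trans.sum(axis=1, keepdims=True),
            emit / emit.sum(axis=1, keepdims=True))
```

The add-alpha term keeps probabilities nonzero for transitions or cluster numbers never seen in training; without it, such events would have probability 0 and a log-probability of minus infinity during decoding.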

Thus, the network node anomaly identification based on access request path analysis is realized.

This embodiment provides a network node anomaly identification method based on access request path analysis. An observation state sequence is obtained by processing the network traffic packets of an SDN data center network, and the input data of the hidden Markov model is obtained by clustering-based dimension reduction of the observation state sequence, thereby realizing network node anomaly identification. The clustering-based dimension reduction considers the numerical information of the Rényi entropy feature values, the trend information of the differential values, and the local distribution information of the data points; adding the measure of local trend information allows data points with similar Rényi entropy feature values and similar trend information to be treated as the same state. Considering these three aspects of the observation state effectively improves the precision of the state division of the observation state sequence after clustering-based dimension reduction, which in turn improves the detection precision of the pre-constructed and trained hidden Markov model, so that network node abnormal states are detected accurately through access request path analysis.

The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims

1. The network node anomaly identification method based on the access request path analysis is characterized by comprising the following steps of:

determining the abnormal state of the network node of the access path to be detected according to the observation state sequence, each cluster and a pre-constructed and trained hidden Markov model;

determining a clustering influence factor corresponding to each data point according to the INFLO value of each data point, wherein determining the clustering influence factor comprises:

calculating the absolute value of the difference between the INFLO value of each data point and the INFLO value of the cluster center corresponding to the data point, and taking the normalized absolute difference as the clustering influence factor corresponding to the data point;

the calculation formula of the optimized clustering objective function is as follows:

$$J = \sum_{c=1}^{k}\sum_{i=1}^{n} \omega_i \, d\left(x_i, v_c\right)$$

wherein $J$ is the optimized clustering objective function, k is the number of all cluster centers in the two-dimensional clustering space, c is the serial number of each cluster center in the two-dimensional clustering space, n is the number of all data points in the two-dimensional clustering space, i is the serial number of each data point in the two-dimensional clustering space, $\omega_i$ is the clustering influence factor of the i-th data point in the two-dimensional clustering space, d is the Euclidean distance function, $x_i$ is the position of the i-th data point in the two-dimensional clustering space, $v_c$ is the position of the c-th cluster center, and $d(x_i, v_c)$ is the Euclidean distance between the i-th data point and the c-th cluster center in the two-dimensional clustering space;

determining a two-dimensional clustering space according to the observation state sequence and the local trend sequence, wherein the two-dimensional clustering space comprises the following steps:

2. The method for identifying network node anomalies based on access request path analysis according to claim 1, wherein determining the network node abnormal state of the request access path to be detected according to the observation state sequence, the clusters, and the pre-constructed and trained hidden Markov model comprises:

3. The method according to claim 1, wherein determining each difference value in the difference sequence based on the differences between the Rényi entropy feature values in the observation state sequence comprises: