CN113852629B

CN113852629B - Network connection abnormity identification method based on natural neighbor self-adaptive weighted kernel density and computer storage medium

Info

Publication number: CN113852629B
Application number: CN202111121169.9A
Authority: CN
Inventors: 隆华; 熊忠阳; 张玉芳
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2021-09-24
Filing date: 2021-09-24
Publication date: 2022-10-28
Anticipated expiration: 2041-09-24
Also published as: CN113852629A

Abstract

The invention provides a network connection anomaly identification method based on natural neighbor adaptive weighted kernel density, and a computer storage medium. The method comprises the following steps: preprocessing the data; iterating adaptively to obtain the natural neighbor set of each preprocessed data object; computing the adaptive bandwidth coefficient and weight of each data object from its natural neighbor set; calculating, from the adaptive bandwidth coefficient and the weight, either the adaptive weighted kernel density, the degree of outlier and the outlier threshold of each data object, or the upper bound of its degree of outlier; and marking as abnormal either the n data objects with the largest degrees of outlier or all data objects in the network connection record parameters whose degree of outlier exceeds the outlier threshold, where n is a positive integer, thereby completing the network connection anomaly identification. The upper bound of the degree of outlier suggests a way to prune data when detecting anomalies in large-scale data, and abnormal data can be extracted without parameters when the number of abnormal data objects is unknown.

Description

Network connection abnormity identification method based on natural neighbor self-adaptive weighted kernel density and computer storage medium

Technical Field

The invention relates to the field of data mining, in particular to a network connection abnormity identification method based on natural neighbor self-adaptive weighted kernel density and a computer storage medium.

Background

With the rapid development of data mining technologies, people attend not only to the overall trend of data objects but increasingly to the behavior of the few data objects that deviate from most others, which is the task of anomaly detection. Anomaly detection is one of the most important tasks in data mining and is widely applied: in fraud detection, log data are analyzed to find misuse or suspicious fraudulent behavior; in medicine, it is used to identify abnormal cells or tumors; other scenarios include data leakage prevention, finding abnormal energy consumption, and detecting counterfeit documents.

The spread of internet technology across industries has brought great convenience to daily life, but it has also brought network security problems. Abnormal network connections are increasingly common and can cause serious information security problems such as unexpected page redirects, slow page loading, and even leakage of personal privacy, so identifying abnormal network connections is very important.

The existing anomaly detection algorithms can be mainly classified into the following categories:

Based on the distribution model: distribution-based methods typically assume that a data set follows a certain distribution, and then build a model based on that distribution to detect anomalous objects. This type of approach performs well when there is sufficient data and the data distribution is known. However, the data sets produced by most applications do not exhibit an ideal mathematical distribution, and it is difficult to estimate the distribution of high-dimensional data. Therefore, the distribution-based approach is only applicable when the data distribution is known or the data dimensionality is low.

Based on clustering: a cluster-based anomaly detection algorithm divides data into clusters according to similarities between the data and then defines an anomalous object as a data object that is not in any cluster or is far from the center of the nearest cluster. However, the performance of such methods depends mainly on the clustering algorithm used, and the outlier data is often just a byproduct of the clustering. Such methods may be ineffective if the anomalous data is assigned to a large cluster by the clustering algorithm.

Based on the neighbors: the neighbor-based method allows test data to determine its properties through a set of found neighbors, which may be "global" or "local". Neighbor-based techniques can be divided into two categories, distance-based and density-based, where distance-based methods use the distance between data as a measure of anomaly detection, without requiring the data itself to satisfy a particular distribution; density-based methods typically work to find the density of the data, and then combine the neighboring set to find the degree of outlier of the data, which is often of a "local" nature. Both distance-based and density-based methods face the problem of selecting the nearest neighbor number k, the selection of k will affect the performance of the algorithm, and meanwhile, the definition of density in the density-based method directly affects the accuracy of the algorithm.

Disclosure of Invention

In order to overcome the defects in the prior art, the present invention provides a method for identifying network connection anomalies based on a natural neighbor adaptive weighted kernel density and a computer storage medium.

In order to achieve the above object, the present invention provides a network connection anomaly identification method based on natural neighbor adaptive weighted kernel density, which includes the following steps:

carrying out data preprocessing on the network connection recording parameters;

self-adaptive iteration is carried out to obtain a natural neighbor set of each preprocessed data;

solving the self-adaptive bandwidth coefficient and weight of each data according to the natural neighbor set of each data;

calculating the self-adaptive weighted kernel density, the outlier and the outlier threshold of each data according to the self-adaptive bandwidth coefficient and the weight, or calculating the upper bound of the outlier of each data;

and marking the data with the maximum n outliers or all the data larger than the threshold value of the outliers in the network connection record parameters as abnormal data to finish the network connection abnormality identification, wherein n is a positive integer.

The method uses an adaptive bandwidth coefficient and an adaptive weight so that the density estimate of the data is more accurate and robust; pruning data quickly by means of the upper bound of the degree of outlier suggests an approach to anomaly detection on large-scale data; and by using the adaptive weighted kernel density, the degree of outlier and the outlier threshold, abnormal data can be extracted without parameters when the number of abnormal data objects is unknown.

A preferred scheme of the network connection anomaly identification method is as follows: the natural neighbor set of each data is generated by the following steps:

(1) Constructing a KD tree for the preprocessed data set;

(2) Traversing the data set in the KD tree, searching k neighbors of each data and putting the k neighbors into a corresponding neighbor set NN, and updating an inverse neighbor set RNN of the data regarded as the k neighbors, wherein k is a positive integer with an initial value of 1;

(3) If any data in the data set has an empty reverse neighbor set, or the number of data whose reverse neighbor set is empty changes between two adjacent iterations, adding 1 to k and executing step (2);

if every data in the data set has at least one reverse neighbor, or the number of data whose reverse neighbor set is empty does not change between two adjacent iterations, the data set is considered stable, k is no longer increased, and step (4) is executed;

(4) Taking the intersection of the neighbor set NN and the reverse neighbor set RNN of each data to obtain the natural neighbor set NaN of each data.

The natural neighbor set of each data is solved by adopting an iterative mode, and compared with k neighbor, a neighbor parameter k is not required to be given, so that the defect that the performance difference of the algorithm is large due to different k values is avoided, and the algorithm has stability.
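Steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a brute-force distance sort stands in for the KD tree to keep the example dependency-free, iteration stops under the stability condition (no data with an empty reverse neighbor set, or an unchanged count of such data), and the function name `natural_neighbors` and the tuple-based point format are hypothetical.

```python
import math

def natural_neighbors(points):
    """Return (k, NN, RNN, NaN) for a list of coordinate tuples."""
    n = len(points)
    # Sort all other points by distance once (brute force instead of a KD tree);
    # order[i][r - 1] is then the r-th nearest neighbor of point i.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    nn = [[] for _ in range(n)]      # NN: neighbors, nearest first
    rnn = [set() for _ in range(n)]  # RNN: points that list i as a neighbor
    prev_empty = None
    k = 0
    while k < n - 1:
        k += 1
        for i in range(n):
            j = order[i][k - 1]
            nn[i].append(j)
            rnn[j].add(i)
        empty = sum(1 for s in rnn if not s)
        # Stable state: nobody lacks a reverse neighbor, or the count stopped changing.
        if empty == 0 or empty == prev_empty:
            break
        prev_empty = empty
    # NaN(i) = NN(i) intersected with RNN(i), kept in increasing-distance order.
    nan = [[j for j in nn[i] if j in rnn[i]] for i in range(n)]
    return k, nn, rnn, nan
```

In this sketch an isolated point can end up with an empty natural neighbor set, so any later density computation has to decide how to treat that case.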

A preferred scheme of the network connection anomaly identification method is as follows: the adaptive bandwidth coefficient of data object p is calculated as $h_p = h \cdot \mathrm{dist}(p, q)$, where h is a fixed bandwidth coefficient, dist is a distance function, and data object q is the neighbor in the natural neighbor set of data object p that is farthest from data object p.

The adaptive weight of data object p is calculated as follows: computing the cost cost(p, x) for data object p and data x to reach each other, $\mathrm{cost}(p, x) = \min\{\, r \mid x \in \mathrm{NaN}_r(p) \wedge p \in \mathrm{NaN}_r(x) \,\}$, where data x is any data in the natural neighbor set NaN(p) of data object p, $\mathrm{NaN}_r(p)$ refers to the data in the natural neighbor set of data object p that is r-th nearest to data object p, and $\mathrm{NaN}_r(x)$ refers to the data in the natural neighbor set of data x that is r-th nearest to data x;

and calculating the average cost of the data object p and all the data in the natural neighbor set NaN (p) which can reach each other, thereby obtaining the self-adaptive weight (p) of the data object p.

The adaptive bandwidth coefficient and the adaptive weight are adopted, so that the density estimation of the data is more accurate and robust.

A preferred scheme of the network connection anomaly identification method is as follows: the adaptive weighted kernel density AKDE(p) of data object p is calculated as:

wherein weight (p) is the self-adaptive weight of the data object p, KDE (p) is the kernel density estimation of the data object, and the calculation formula is as follows:

where |NaN(p)| is the number of data in the natural neighbor set of data object p, d is the dimensionality of data object p, $h_p$ is the adaptive bandwidth coefficient of data object p, dist is a distance function, and data object q is the neighbor in the natural neighbor set of data object p that is farthest from data object p.

The formula for the degree of outlier KOF (p) of data object p is:

where |NaN(p)| is the number of data in the natural neighbor set of data object p, AKDE(p) is the adaptive weighted kernel density of data object p, and AKDE(q) is the adaptive weighted kernel density of data object q.

The outlier threshold calculation steps are as follows:

firstly, the calculated degrees of outlier are sorted in non-decreasing order, and the change rate $\mathrm{KOF}_{var(i,j)}$ of the degree of outlier is calculated:

where i and j are the subscripts of two adjacent data objects;

the outlier threshold $\mathrm{KOF}_{threshold}$ is then calculated from the change rates as $\mathrm{KOF}_{threshold} = \mathrm{mean}(\mathrm{KOF}_{var}) + \omega \cdot \mathrm{std}(\mathrm{KOF}_{var})$, where $\mathrm{mean}(\mathrm{KOF}_{var})$ is the mean of the change rates of the degree of outlier, $\mathrm{std}(\mathrm{KOF}_{var})$ is their standard deviation, and ω is the adjustment coefficient.

The upper bound of the degree of outlier of data object p is calculated by the following steps:

computing the upper bound $\mathrm{AKDE}_{max}(p)$ of the adaptive weighted kernel density of data object p:

where data object o is the data closest to data object p in the natural neighbor set of data object p;

computing the lower bound $\mathrm{AKDE}_{min}(p)$ of the adaptive weighted kernel density of data object p:

where data object q is the data farthest from p in the natural neighbor set of data object p;

computing the upper bound UBKOF(p) of the degree of outlier of data object p:

where NaN(p) is the natural neighbor set of data object p, |NaN(p)| is the number of data in the natural neighbor set of data object p, $\mathrm{AKDE}_{min}(p)$ is the lower bound of the adaptive weighted kernel density of data object p, $\mathrm{AKDE}_{max}(x)$ is the upper bound of the adaptive weighted kernel density of data x in the natural neighbor set of data object p, and KOF(p) is the degree of outlier of data object p.

A preferred scheme of the network connection anomaly identification method is as follows: the n data with the largest degrees of outlier in the network connection record parameters are selected by the following steps:

(1) Randomly selecting n data and constructing a min-heap from their degrees of outlier, the degree of outlier at the top of the heap being denoted KOF(top);

(2) Traversing the remaining data in the dataset:

for a data object p, if the upper bound of the degree of outlier UBKOF (p) of the data object p is less than the top of heap degree of outlier KOF (top), continuing to perform step (2); otherwise, executing the step (3); after the data traversal is finished, executing the step (5);

(3) Calculating an outlier KOF (p) of the data object p, and if KOF (p) is less than KOF (top), performing step (2); otherwise, executing step (4).

(4) Popping the heap top element, putting the value of KOF (p) into the heap, and updating the minimum value of the degree of outlier in the heap to be used as the KOF (top);

(5) And outputting data corresponding to the n outliers in the heap.

The calculation of the top-n problem is accelerated, and the data with the maximum n outliers in the network connection recording parameters can be quickly selected.
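Steps (1) to (5) can be sketched with Python's `heapq` module as below. This is an illustrative sketch, not the patented implementation: the callables `kof` and `ubkof` are hypothetical stand-ins for the degree of outlier and its upper bound, and the heap is seeded with the first n data rather than a random sample so that the example is reproducible.

```python
import heapq

def top_n_outliers(ids, n, kof, ubkof):
    """Return [(KOF, id), ...] for the n largest degrees of outlier, largest first."""
    ids = list(ids)
    # Step (1): seed a min-heap with n data; heap[0] holds KOF(top), the smallest kept value.
    heap = [(kof(p), p) for p in ids[:n]]
    heapq.heapify(heap)
    for p in ids[n:]:                         # step (2): traverse the remaining data
        if ubkof(p) < heap[0][0]:
            continue                          # bound below KOF(top): skip the exact computation
        score = kof(p)                        # step (3): exact degree of outlier
        if score < heap[0][0]:
            continue
        heapq.heapreplace(heap, (score, p))   # step (4): pop the top, push p
    return sorted(heap, reverse=True)         # step (5): output the n retained data
```

The saving comes from step (2): whenever the cheap bound already falls below KOF(top), the exact degree of outlier is never evaluated for that data object.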

The application also provides a computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction enables a processor to execute the operation corresponding to the network connection abnormity identification method based on the natural neighbor self-adaptive weighted kernel density.

The beneficial effects of the invention are as follows: the adaptive weight makes the density estimate of the data more accurate; adjusting the adaptive bandwidth coefficient in the kernel density estimate yields a density estimate more robust than that of the LOF algorithm, and the degree of outlier (relative density) assigned to abnormal data in sparser regions is larger than under the LOF algorithm; meanwhile, the calculation of the top-n problem is accelerated, and a statistical method allows abnormal data to be identified when the number of abnormal data objects is unknown.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a data set diagram of network connection record parameters in an embodiment;

FIG. 3 is a graph of data set outliers and outlier thresholds in an embodiment;

FIG. 4 is a diagram of anomaly data extracted for the top-n problem.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention.

In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.

As shown in fig. 1, the present invention provides an embodiment of a network connection anomaly identification method based on a natural neighbor adaptive weighted kernel density, which is described in detail below.

The network connection record parameters are obtained first, as shown in fig. 2. They fall into four main categories: basic connection features, connection content features, time-based network traffic statistics, and host-based network traffic statistics, 41 items in total. Sample data are shown in Table 1:

TABLE 1

The acquired data set of network connection record parameters is then preprocessed. In this embodiment, preprocessing comprises removing duplicate network connection records, deleting records with an illegal format, and selecting the four attributes {service, duration, src_bytes, dst_bytes} as basic attributes, with service used as the label; text values are then replaced with numerical values, the numerical values are normalized, and the labels are one-hot encoded.

Example of data parameters after preprocessing:

duration	src_bytes	dst_bytes	labels
-2.302585092994046	10.906691489914584	9.025708147644988	1
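The preprocessing steps can be sketched as follows. This is a hedged illustration only: the text does not state the normalization formula, so the log scaling `log(v + eps)` with `eps = 0.1` is an assumption (chosen because the sample duration value above matches ln(0.1) to the printed precision), and the function name `preprocess`, the record layout and the validity checks are hypothetical.

```python
import math

def preprocess(records, eps=0.1):
    """records: iterable of (service, duration, src_bytes, dst_bytes) tuples."""
    seen, rows, labels = set(), [], []
    for rec in records:
        if len(rec) != 4:
            continue                          # illegal format: wrong number of fields
        service, *nums = rec
        try:
            nums = tuple(float(v) for v in nums)
        except (TypeError, ValueError):
            continue                          # illegal format: non-numeric field
        if any(not math.isfinite(v) or v < 0 for v in nums):
            continue                          # illegal value
        if (service, nums) in seen:
            continue                          # duplicate record
        seen.add((service, nums))
        # Assumed normalization: log scaling with a small offset so zeros stay finite.
        rows.append([math.log(v + eps) for v in nums])
        labels.append(service)
    classes = sorted(set(labels))
    one_hot = [[int(s == c) for c in classes] for s in labels]  # one-hot label encoding
    return rows, one_hot, classes
```

A different normalization (for example min-max scaling) would slot into the same place without changing the rest of the pipeline.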

After data preprocessing, adaptive iteration is carried out to obtain the natural neighbor set of each data.

Define NaN(x) as the natural neighbor set of data x; RNN(x) as the reverse neighbor set of data x, which contains the data having x as a neighbor; and NN(x) as the neighbor set of data x.

In this embodiment, the step of generating the natural neighbor set is as follows:

(1) Initializing parameters, and constructing a KD tree for the data set;

(2) Traversing the data set in the KD tree, searching k neighbors of each data, putting the k neighbors into a corresponding neighbor set NN, and updating an inverse neighbor set RNN of the data regarded as the k neighbors, wherein k is a positive integer with an initial value of 1;

After the natural neighbor set of each data is obtained, the adaptive bandwidth coefficient and the weight of each data are calculated from it. The specific steps are as follows:

For a data object p whose natural neighbor set is NaN(p), the adaptive bandwidth coefficient $h_p$ of p is calculated as $h_p = h \cdot \mathrm{dist}(p, q)$, where h is a fixed bandwidth coefficient and dist is a distance function; this embodiment preferably, but not exclusively, uses the Euclidean distance. Data object q is the neighbor farthest from p in the natural neighbor set of p, and it can be read off directly from the natural neighbor set already obtained. By the definition of kernel density, the denser the region in which data object p lies, the smaller dist(p, q), the smaller the resulting adaptive bandwidth coefficient and the larger the kernel density estimate, and vice versa.

The adaptive weight weight(p) of data object p is calculated as

$$\mathrm{weight}(p) = \frac{1}{|\mathrm{NaN}(p)|} \sum_{x \in \mathrm{NaN}(p)} \mathrm{cost}(p, x)$$

where |NaN(p)| is the number of natural neighbors of data object p and cost(p, x) is the cost for data p and data x to reach each other; that is, the adaptive weight of data object p is the average cost for p and the data in its natural neighbor set to reach each other. The cost function is $\mathrm{cost}(p, x) = \min\{\, r \mid x \in \mathrm{NaN}_r(p) \wedge p \in \mathrm{NaN}_r(x) \,\}$, where $\mathrm{NaN}_r(p)$ refers to the data in the natural neighbor set of data object p that is r-th nearest to data object p, and $\mathrm{NaN}_r(x)$ refers to the data in the natural neighbor set of data x that is r-th nearest to data x.

As can be seen from the calculation of the adaptive weights, if a data object p is in a sparse region, the cost that p and the data in its natural neighbor set can reach each other is large, and vice versa.
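The bandwidth and weight computations can be sketched as below. The sketch makes two assumptions that the text does not spell out: the cost is read as the smallest r for which each of the two data lies among the other's r nearest natural neighbors (so it equals the larger of the two mutual ranks), and data with an empty natural neighbor set yield `None`. The function names and the input layout (natural neighbor lists ordered nearest first, with mutual membership) are hypothetical.

```python
import math

def adaptive_bandwidth(points, nan, h=1.0):
    """h_p = h * dist(p, q), with q the farthest natural neighbor of p."""
    return [
        h * math.dist(points[p], points[nbrs[-1]]) if nbrs else None
        for p, nbrs in enumerate(nan)
    ]

def adaptive_weight(nan):
    """weight(p) = average of cost(p, x) over x in NaN(p)."""
    # rank[p][x] is the 1-based position of x among the natural neighbors of p.
    rank = [{x: r for r, x in enumerate(nbrs, start=1)} for nbrs in nan]
    weights = []
    for p, nbrs in enumerate(nan):
        if not nbrs:
            weights.append(None)   # no natural neighbors: weight left undefined here
            continue
        # Assumed reading of cost(p, x): the larger of the two mutual ranks.
        costs = [max(rank[p][x], rank[x][p]) for x in nbrs]
        weights.append(sum(costs) / len(costs))
    return weights

# Three points on a line; point 2 sits farther out than the others.
points = [(0.0,), (1.0,), (3.0,)]
nan = [[1], [0, 2], [1]]
print(adaptive_bandwidth(points, nan, h=0.5))
print(adaptive_weight(nan))
```

In this toy input the point in the sparser region receives the largest weight, in line with the remark above about sparse regions.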

And after the self-adaptive bandwidth coefficient and the weight of each data are obtained, calculating the self-adaptive weighted kernel density, the outlier, the upper bound of the outlier and/or the threshold of the outlier of each data according to the self-adaptive bandwidth coefficient and the weight in different application scenes.

For a data object p, the adaptive weighted kernel density AKDE (p) is calculated by the formula:

wherein weight (p) is the adaptive weight of the data object p, and the larger the value of weight (p) is, the smaller the value of the adaptive weighted kernel density is; KDE (p) is the kernel density estimation of the data object, and the calculation formula is as follows:

wherein |NaN(p)| is the number of data in the natural neighbor set of data object p; d is the dimensionality of data object p, i.e. the number of attributes of the data, determined by the data in the acquired data set; and $h_p$ is the adaptive bandwidth coefficient of data object p. Data object q is the neighbor in the natural neighbor set of data object p that is farthest from p, i.e. q belongs to the natural neighbor set of p and its distance to p is the largest among all data in that set.

The formula for the degree of outlier KOF(p) of data object p is:

wherein | NaN (p) | is the number of data in the natural neighbor set of the data object p, and AKDE (p) is the adaptive weighted kernel density of the data object, and it can be known from the calculation formula that if the data object p is an abnormal object, its KOF value is larger.
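How the density and the degree of outlier fit together can be illustrated with the sketch below. The formulas themselves are not reproduced in this text, so everything specific here is an assumption made for illustration: a Gaussian kernel, AKDE(p) taken as KDE(p) divided by weight(p) (consistent with a larger weight giving a smaller density), and KOF(p) taken as the average density of the natural neighbors divided by the density of p. The function names are hypothetical, and every data object is assumed to have at least one natural neighbor.

```python
import math

def akde(points, nan, bandwidth, weight):
    """Assumed form: Gaussian KDE over natural neighbors, divided by the adaptive weight."""
    d = len(points[0])
    norm = (2 * math.pi) ** (-d / 2)          # Gaussian kernel normalizing constant
    densities = []
    for p, nbrs in enumerate(nan):
        hp = bandwidth[p]
        kde = sum(
            norm * math.exp(-0.5 * (math.dist(points[p], points[x]) / hp) ** 2) / hp ** d
            for x in nbrs
        ) / len(nbrs)
        densities.append(kde / weight[p])     # larger weight -> smaller density
    return densities

def kof(nan, density):
    """Assumed form: mean density of the natural neighbors relative to the own density."""
    return [
        sum(density[x] for x in nbrs) / (len(nbrs) * density[p])
        for p, nbrs in enumerate(nan)
    ]

points = [(0.0,), (1.0,), (3.0,)]
nan = [[1], [0, 2], [1]]
dens = akde(points, nan, bandwidth=[1.0, 2.0, 2.0], weight=[1.0, 1.5, 2.0])
scores = kof(nan, dens)
print(scores.index(max(scores)))  # index of the data with the largest relative score
```

With these assumed forms, a data object whose neighbors are much denser than itself receives a score well above 1, which matches the statement that abnormal objects have larger KOF values.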

The upper bound of outliers for data object p is calculated as follows:

First, the upper and lower bounds of the adaptive weighted kernel density of data object p are calculated from the nearest and farthest neighbors in the natural neighbor set of p. Because the natural neighbor set is built in order of increasing distance, the nearest and farthest neighbors of data p can be obtained in O(1) time;

the upper bound of the adaptive weighted kernel density, $\mathrm{AKDE}_{max}(p)$, is:

where data object o is the data closest to p in the natural neighbor set of data object p.

The lower bound of the adaptive weighted kernel density, $\mathrm{AKDE}_{min}(p)$, is:

where data object q is the data in the natural neighbor set of data object p that is farthest from p.

The upper bound UBKOF(p) of the degree of outlier of data object p can be calculated from the upper and lower bounds of the adaptive weighted kernel density as follows:

where |NaN(p)| is the number of data in the natural neighbor set of data object p, $\mathrm{AKDE}_{min}(p)$ is the lower bound of the adaptive weighted kernel density of data object p, and $\mathrm{AKDE}_{max}(x)$ is the upper bound of the adaptive weighted kernel density of data x in the natural neighbor set of data object p.
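Why such a bound allows safe pruning can be illustrated with a short derivation. It rests on assumptions that this text does not state explicitly: that the degree of outlier is an LOF-style ratio of the neighbors' average density to the object's own density, that UBKOF(p) has the analogous form built from the bounds, and that the bounds satisfy $\mathrm{AKDE}_{min}(p) \le \mathrm{AKDE}(p)$ and $\mathrm{AKDE}(x) \le \mathrm{AKDE}_{max}(x)$.

```latex
\mathrm{KOF}(p)
  = \frac{\sum_{x \in \mathrm{NaN}(p)} \mathrm{AKDE}(x)}
         {|\mathrm{NaN}(p)| \cdot \mathrm{AKDE}(p)}
  \;\le\;
    \frac{\sum_{x \in \mathrm{NaN}(p)} \mathrm{AKDE}_{max}(x)}
         {|\mathrm{NaN}(p)| \cdot \mathrm{AKDE}_{min}(p)}
  = \mathrm{UBKOF}(p)
```

Under these assumptions, a data object whose UBKOF(p) is below the current n-th largest degree of outlier cannot enter the top n, so its exact KOF(p) need not be computed.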

The outlier threshold is calculated as follows:

First, the calculated degrees of outlier are sorted in non-decreasing order, and the change rate $\mathrm{KOF}_{var(i,j)}$ of the degree of outlier is calculated as follows:

where i and j are the subscripts of two adjacent data objects. The outlier threshold $\mathrm{KOF}_{threshold}$ is then calculated from the change rates as $\mathrm{KOF}_{threshold} = \mathrm{mean}(\mathrm{KOF}_{var}) + \omega \cdot \mathrm{std}(\mathrm{KOF}_{var})$, where $\mathrm{mean}(\mathrm{KOF}_{var})$ is the mean of the change rates of the degree of outlier, $\mathrm{std}(\mathrm{KOF}_{var})$ is their standard deviation, and ω is an adjustment coefficient with value range [0, 3]; ω = 2.5 is preferred in this embodiment.

As can be seen from fig. 3, the obtained outlier threshold can accurately distinguish between normal data and abnormal data in the data set.
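The threshold arithmetic can be sketched as follows. The change-rate formula is not reproduced in this text, so the sketch takes the change rates as an input rather than guessing them; the use of the population standard deviation and the function names `outlier_threshold` and `flag_anomalies` are assumptions made for the example.

```python
import statistics

def outlier_threshold(change_rates, omega=2.5):
    """KOF_threshold = mean(KOF_var) + omega * std(KOF_var)."""
    if not 0 <= omega <= 3:
        raise ValueError("omega is expected to lie in [0, 3]")
    # Population standard deviation is an assumption; the text only says "std".
    return statistics.fmean(change_rates) + omega * statistics.pstdev(change_rates)

def flag_anomalies(kof_values, threshold):
    """Indices of data whose degree of outlier exceeds the threshold."""
    return [i for i, v in enumerate(kof_values) if v > threshold]

# Toy change rates with mean 5 and population standard deviation 2.
rates = [2, 4, 4, 4, 5, 5, 7, 9]
thr = outlier_threshold(rates)          # 5 + 2.5 * 2
print(thr, flag_anomalies([0.9, 1.1, 12.0], thr))
```

Raising ω makes the rule more conservative, since fewer data exceed the threshold.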

And finally, outputting the n data with the maximum degree of outlier or all the data larger than the threshold value of the degree of outlier, thereby extracting the outlier.

The following description will take specific application scenarios as examples.

The top-n problem: the n data with the largest degrees of outlier need to be acquired, and these n data may include both normal and abnormal data. In this scenario, where the number n is specified, the upper bound of the degree of outlier is used to prune data quickly.

The algorithm is as follows:

(1) Randomly selecting n data, calculating their degrees of outlier, and constructing a min-heap from them; the degree of outlier at the top of the heap, denoted KOF(top), is the minimum in the heap.

(2) Traversing the remaining data in the dataset:

for a data object p, calculating the upper bound UBKOF(p) of its degree of outlier from its adaptive bandwidth coefficient $h_p$, its adaptive weight weight(p), and the nearest and farthest neighbors in its natural neighbor set NaN(p); if UBKOF(p) is smaller than KOF(top), continuing with step (2); otherwise, executing step (3); after the data traversal is finished, executing step (5);

(3) Calculating the degree of outlier KOF (p) of p, if KOF (p) is smaller than KOF (top), performing step (2); otherwise, executing step (4).

(4) Popping the heap top element, putting the value of KOF (p) into the heap, and updating KOF (top);

(5) And outputting data corresponding to the n outliers in the heap.

Fig. 4 shows the 43 data with the largest degrees of outlier output for the top-n problem on the data set of this embodiment. Comparing fig. 2 and fig. 4 shows that the upper bound of the degree of outlier allows these data to be obtained accurately and quickly.

The automatic abnormal data extraction problem: in this application scenario the abnormal data must be identified automatically, and the algorithm is as follows:

(1) Traverse all data in the dataset:

for a data object p, calculating the adaptive weighted kernel density AKDE(p) from its adaptive bandwidth coefficient $h_p$, its adaptive weight weight(p) and its natural neighbor set NaN(p); once the densities of all data objects are available, calculating the degree of outlier KOF(p) from NaN(p);

(2) Calculating the outlier threshold $\mathrm{KOF}_{threshold}$, traversing all the data in the data set again, and marking the data whose degree of outlier is larger than the outlier threshold as abnormal data.

Fig. 3 shows the degree of outlier of all data in the entire exemplary data set and the degree of outlier threshold obtained by the statistical learning method, and it can be seen from fig. 3 that the obtained degree of outlier threshold can accurately distinguish between normal data and abnormal data in the data set.

The invention applies an outlier upper bound which can be obtained in O (1) time complexity aiming at the top-n problem, thereby accelerating the calculation; on the other hand, by using a statistical method, it is possible to obtain abnormal data without determining the number of abnormal data.

The present application further provides an embodiment of a computer storage medium, where at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform an operation corresponding to the above network connection anomaly identification method based on natural neighbor adaptive weighted kernel density.

In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A network connection anomaly identification method based on natural neighbor adaptive weighted kernel density, characterized by comprising the following steps:

carrying out data preprocessing on the network connection record parameters;

the natural neighbor set of each data is generated by the following steps:

(1) Constructing a KD tree for the preprocessed data set;

(3) If the reverse neighbor set of the data set is empty or the number of data of which the reverse neighbor set is empty in two adjacent iterations is changed, adding 1 to the k value and executing the step (2);

(4) Solving the intersection of each data neighbor set NN and the reverse neighbor set RNN, wherein the intersection is a natural neighbor set NaN of each data;

specifically, the adaptive bandwidth coefficient of data object p is calculated as $h_p = h \cdot \mathrm{dist}(p, q)$, where h is a fixed bandwidth coefficient, dist is a distance function, and data object q is the neighbor in the natural neighbor set of data object p that is farthest from data object p;

the adaptive weight of data object p is calculated as follows: calculating the cost cost(p, x) for data object p and data x to reach each other, $\mathrm{cost}(p, x) = \min\{\, r \mid x \in \mathrm{NaN}_r(p) \wedge p \in \mathrm{NaN}_r(x) \,\}$, where data x is any data in the natural neighbor set NaN(p) of data object p, $\mathrm{NaN}_r(p)$ refers to the data in the natural neighbor set of data object p that is r-th nearest to data object p, and $\mathrm{NaN}_r(x)$ refers to the data in the natural neighbor set of data x that is r-th nearest to data x;

calculating the average cost of the data object p and all the data in the natural neighbor set NaN (p) which can reach each other to obtain the self-adaptive weight (p) of the data object p;

specifically, the adaptive weighted kernel density AKDE (p) of the data object p is calculated by the formula:

where weight(p) is the adaptive weight of data object p and KDE(p) is the kernel density estimate of the data object, calculated as:

where |NaN(p)| is the number of data in the natural neighbor set of data object p, d is the dimensionality of data object p, $h_p$ is the adaptive bandwidth coefficient of data object p, dist is a distance function, and data object q is the neighbor farthest from data object p in the natural neighbor set of data object p;

the formula for the calculation of the degree of outlier KOF (p) of the data object p is:

wherein |NaN(p)| is the number of data in the natural neighbor set of data object p, AKDE(p) is the adaptive weighted kernel density of data object p, and AKDE(q) is the adaptive weighted kernel density of data object q;

the outlier threshold calculation steps are as follows:

firstly, the calculated degrees of outlier are sorted in non-decreasing order, and the change rate $\mathrm{KOF}_{var(i,j)}$ of the degree of outlier is calculated:

where i and j are the subscripts of two adjacent data objects;

the outlier threshold $\mathrm{KOF}_{threshold}$ is calculated from the change rates as $\mathrm{KOF}_{threshold} = \mathrm{mean}(\mathrm{KOF}_{var}) + \omega \cdot \mathrm{std}(\mathrm{KOF}_{var})$, where $\mathrm{mean}(\mathrm{KOF}_{var})$ is the mean of the change rates of the degree of outlier, $\mathrm{std}(\mathrm{KOF}_{var})$ is their standard deviation, and ω is the adjustment coefficient;

calculating the upper bound $\mathrm{AKDE}_{max}(p)$ of the adaptive weighted kernel density of data object p:

where data object o is the data closest to data object p in the natural neighbor set of data object p;

calculating the lower bound $\mathrm{AKDE}_{min}(p)$ of the adaptive weighted kernel density of data object p:

calculating the upper bound UBKOF(p) of the degree of outlier of data object p:

where NaN(p) is the natural neighbor set of data object p, |NaN(p)| is the number of data in the natural neighbor set of data object p, $\mathrm{AKDE}_{min}(p)$ is the lower bound of the adaptive weighted kernel density of data object p, $\mathrm{AKDE}_{max}(x)$ is the upper bound of the adaptive weighted kernel density of data x in the natural neighbor set of data object p, and KOF(p) is the degree of outlier of data object p;

2. The method for identifying network connection abnormality based on natural neighbor adaptive weighted kernel density as claimed in claim 1, wherein the step of selecting n pieces of data with the largest degree of outlier among the network connection recording parameters comprises:

(1) Randomly selecting n data, and constructing a minimum heap according to the outliers of the n data, wherein the top outlier of the heap is KOF (top);

(2) Traversing the remaining data in the dataset:

(3) Calculating an outlier KOF (p) of the data object p, and if KOF (p) is less than KOF (top), performing step (2); otherwise, executing the step (4);

(5) And outputting data corresponding to the n outliers in the heap.

3. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the method for identifying network connectivity anomalies based on adaptive weighted kernel density of natural neighbors of any one of claims 1-2.