CN117421684B - Abnormal data monitoring and analyzing method based on data mining and neural network - Google Patents
- Publication number: CN117421684B (application CN202311718358.3A)
- Authority: CN (China)
- Prior art keywords: data, abnormal, neural network, model, value
- Legal status: Active (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G06F18/2433: Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
- G06F18/2135: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/26: Discovering frequent patterns
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/08: Learning methods
Abstract
The invention relates to the technical field of abnormal data monitoring, and in particular to a method for monitoring and analyzing abnormal data based on data mining and a neural network. The method comprises the following steps: preprocessing real-time communication data and encoding it; detecting and identifying abnormal communication data with an abnormal data detection method and standardizing the abnormal communication data; accurately identifying and controlling abnormal equipment behavior through anomaly detection with an enhanced weighted random forest algorithm; identifying the classified abnormal data with an enhanced streaming anomaly detection algorithm and a trained neural network model; and detecting and warning of abnormal data in real time with an adaptive method, in which the learning rate is adjusted according to an adaptive gradient adjustment factor. The method offers good effectiveness and accuracy, high efficiency, low energy consumption and a high degree of intelligence.
Description
Technical Field
The invention relates to the technical field of abnormal data monitoring, and in particular to a method for monitoring and analyzing abnormal data based on data mining and a neural network.
Background
Anomaly detection is currently a hot topic in many fields and is widely applied in health care, intelligent transportation, large-scale production systems, network security and others. The targets of anomaly detection differ between fields: in health care it is used to monitor human diseases; in intelligent transportation, to find traffic accidents; in large production systems, to diagnose equipment failures; and in network security, to detect network intrusions. In mobile communication systems, common abnormal data include anomalies in signal quality, dropped-call rate, call completion rate and data transmission rate, base station failures or anomalies, and traffic anomalies.
Existing anomaly detection methods include clustering, random forests, one-class support vector machines and the like: a machine learning algorithm is trained on the data, abnormal data are then detected with the model, and whether an anomaly exists is judged by analyzing the statistical characteristics of the data. Other existing methods, such as ARIMA models and exponential smoothing, treat the data as a time series and judge whether an anomaly exists by analyzing the trend and periodicity of the series. Current communication systems focus mainly on providing high-speed, stable and secure connections that meet the needs of different users and diverse application scenarios, with good scalability and interoperability; at the same time, requirements on mobility, low latency, high reliability and privacy protection keep growing. Because existing anomaly detection methods fail to meet these needs, abnormal data are monitored and analyzed here by combining data mining with neural networks.
In the prior art, little research has applied neural networks and data mining to abnormal data monitoring. Chinese patent application No. 201811522835.8 discloses a mobile communication data traffic abnormality monitoring system comprising a traffic monitoring unit, an information following unit, a traffic analysis unit, a personal database, a processor, a display unit, an event recording unit, a reminding unit and a data confirmation unit. The traffic monitoring unit monitors the communication traffic of the communication equipment, which comprises data traffic and real-time rate information: the data traffic is the total data consumed in the current month to date, and the real-time rate information is the real-time network speed during network access. In this scheme, the traffic monitoring unit monitors the data traffic usage of the mobile device in real time and transmits the communication traffic to the traffic analysis unit, which applies an abnormality analysis step to calculate an instantaneous stable value. A stable difference value is then calculated from the instantaneous stable value and used to judge whether the user's data traffic access is abnormal. However, normal traffic usually makes up the vast majority of the data and abnormal traffic only a small fraction, which causes a label imbalance problem.
Most existing anomaly detection methods train an anomaly detection model with a neural network. Chinese patent application No. 202110397166.1 discloses an abnormal data detection method based on an improved EMD and a neural network model, comprising the following steps: drawing an envelope of the original signal with an envelope function; inputting the enveloped signal into a modified EMD algorithm to extract characteristic variables (IMF components), where the extraction flow is modified by replacing the cubic spline interpolation function with an fminbnd function and the envelope function is an inventcope function; inputting the extracted characteristic variables into a neural network model; and, after three layers of screening by the neural network model and matching against the spectrum of the fault cause, locating the fault point and the fault cause. However, this method suffers from mode mixing, and noise interferes with the sampled signal, so the characteristic variables of the signal cannot be extracted accurately.
The invention aims to solve the technical problems of slow analysis and inaccurate results in the existing detection of abnormal communication data.
To this end, an abnormal data monitoring and analysis method based on data mining and neural networks is proposed.
Disclosure of Invention
The invention aims to provide an abnormal data monitoring and analysis method based on data mining and a neural network, in which real-time communication data are preprocessed and then encoded, abnormal communication data are detected and identified with an abnormal data detection method, and the trend of the real-time communication data is predicted and warned of with data mining and neural network techniques. Anomalies in the standardized data are identified with neural network algorithms, realizing intelligent early warning of abnormal communication data.
To achieve the above purpose, the present invention provides the following technical solution:
an abnormal data monitoring and analysis method based on data mining and a neural network, comprising the following steps:
acquiring a high-dimensional real-time communication data set from the user side, and classifying and labeling it;
preprocessing the classified high-dimensional real-time communication data set;
extracting features from the preprocessed high-dimensional real-time communication data set with a bidirectional recurrent neural network (BiRNN) model;
reducing the dimension of the extracted features with principal component analysis (PCA), and encoding the reduced features with a discrete encoding method to form encoded data;
classifying the encoded data into a normal data set and an abnormal data set with an enhanced weighted random forest algorithm;
inputting the abnormal data set into a convolutional neural network detection model for training;
calculating a local anomaly factor value for each data point by comparing its average density with that of its neighboring points, data points whose local anomaly factor value is less than a certain threshold being anomaly points;
periodically updating the average density of the neighborhood set, and updating the local anomaly factor values of the data points in the neighborhood according to the updated average density;
processing the abnormal data set in real time with an enhanced streaming anomaly detection algorithm and the trained convolutional neural network model, adjusting dynamically to changes in the environment and in the data distribution of the data set, and automatically identifying the anomaly type of the abnormal data;
outputting results and early warnings: detecting the abnormal data set in real time with an adaptive method, judging whether each data sample is abnormal according to a threshold, performing the corresponding early-warning processing, and feeding the warning information back to an early-warning system;
the adaptive method tracks changes in the data in real time and dynamically adjusts the model and its parameters, so that it adapts better to data of different types and distributions, provides more accurate anomaly detection and early-warning results, and achieves an early-warning accuracy above 90%.
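As an illustration of the threshold-based early-warning step, the sketch below keeps a rolling baseline of recent normal anomaly scores and raises a warning when a new score exceeds mean + k·std of that baseline. This is a minimal sketch under stated assumptions, not the patented adaptive method: the class name `RollingThresholdMonitor`, the window length, the warm-up count and the multiplier `k` are all hypothetical choices.

```python
from collections import deque
from statistics import mean, pstdev


class RollingThresholdMonitor:
    """Hypothetical adaptive threshold: flag a score that exceeds
    mean + k * std of a rolling window of recent normal scores."""

    def __init__(self, window=50, k=3.0, warmup=5):
        self.history = deque(maxlen=window)  # recent non-anomalous scores
        self.k = k
        self.warmup = warmup  # scores needed before any judgement is made

    def threshold(self):
        if len(self.history) < self.warmup:
            return float("inf")  # not enough history yet: never warn
        return mean(self.history) + self.k * pstdev(self.history)

    def update(self, score):
        """Return True (raise an early warning) if the score is abnormal."""
        is_abnormal = score > self.threshold()
        if not is_abnormal:
            # Only normal scores update the baseline, so the threshold tracks
            # drift in normal behaviour without absorbing the anomalies.
            self.history.append(score)
        return is_abnormal


if __name__ == "__main__":
    monitor = RollingThresholdMonitor(window=20, k=3.0)
    stream = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 5.0, 1.02]
    print([i for i, s in enumerate(stream) if monitor.update(s)])  # -> [7]
```

Excluding flagged scores from the window is a design choice: it keeps a burst of anomalies from inflating the threshold and masking the next ones.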
Preferably, the data preprocessing specifically comprises:
unique attribute processing: a unique attribute is a feature that uniquely identifies a sample but has no influence on the identification of abnormal data, so it is deleted directly;
missing data processing: locating the missing values and filling them; the index information of the missing values in the feature data is obtained quickly, and the missing values are processed with logical position indexing;
correlated attribute merging: for feature data with obvious correlation, the correlated data are combined into one by a data operation, and non-correlated data are deleted; the merge is performed by adding the two attribute columns, feature 1 and feature 2, to the data, and the deletion is performed after the attributes have been merged;
automatic cleaning: missing values, outliers and duplicate values in the data are identified and processed automatically with automated algorithms and tools, using semi-supervised learning and anomaly detection algorithms; the specific steps are as follows:
loading the data to be cleaned into a suitable computing environment;
missing value processing: missing value detection: detecting missing values in the data with statistical indicators, visualization methods or clustering algorithms; missing value filling: filling the missing values by interpolation according to the characteristics and meaning of the data;
outlier processing: outlier detection: identifying outliers in the data with an outlier detection algorithm; outlier handling: choosing deletion to handle the outliers according to their nature;
duplicate value processing: duplicate detection: detecting duplicate records with the unique identifier of the data or a combination of fields; duplicate handling: keeping the first record, according to business requirements;
verification and evaluation: verifying and evaluating the cleaned data, checking the cleaning effect and comparing it with the original data; the missing value ratio and the outlier ratio are used to evaluate the accuracy and completeness of the cleaning results.
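The cleaning steps above can be sketched in plain Python as follows. It is a minimal illustration assuming records arrive as dictionaries with `None` marking missing values; the function name `clean_records`, linear interpolation for filling and the 1.5 × IQR rule for outlier deletion are assumptions chosen for the example, since the text does not fix a particular interpolation scheme or outlier detector.

```python
from statistics import quantiles


def clean_records(rows, id_field, numeric_fields):
    """Hypothetical cleaning pass over a list of dict records.

    Drops duplicate records (keeping the first), deletes the unique
    identifier attribute, fills missing numeric values (None) by linear
    interpolation, then deletes rows outside the IQR fences.
    Each numeric field needs at least two known values.
    Returns (cleaned_rows, outlier_ratio).
    """
    # Duplicate detection on every field except the identifier; keep first.
    seen, deduped = set(), []
    for row in rows:
        key = tuple(sorted((f, v) for f, v in row.items() if f != id_field))
        if key not in seen:
            seen.add(key)
            deduped.append(row)

    # The unique attribute does not help identify anomalies: delete it.
    cleaned = [{f: v for f, v in r.items() if f != id_field} for r in deduped]

    # Fill gaps by linear interpolation between the nearest known values;
    # leading or trailing gaps copy the nearest known value.
    for f in numeric_fields:
        known = [i for i, r in enumerate(cleaned) if r[f] is not None]
        if not known:
            raise ValueError(f"no known values for field {f!r}")
        for i, r in enumerate(cleaned):
            if r[f] is not None:
                continue
            left = max((j for j in known if j < i), default=None)
            right = min((j for j in known if j > i), default=None)
            if left is None:
                r[f] = cleaned[right][f]
            elif right is None:
                r[f] = cleaned[left][f]
            else:
                w = (i - left) / (right - left)
                r[f] = cleaned[left][f] * (1 - w) + cleaned[right][f] * w

    # IQR rule: delete rows with any value outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR].
    fences = {}
    for f in numeric_fields:
        q1, _, q3 = quantiles([r[f] for r in cleaned], n=4)
        fences[f] = (q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1))
    kept = [r for r in cleaned
            if all(fences[f][0] <= r[f] <= fences[f][1] for f in numeric_fields)]
    outlier_ratio = 1 - len(kept) / len(cleaned) if cleaned else 0.0
    return kept, outlier_ratio
```

The returned outlier ratio is one of the two evaluation figures named above; a missing-value ratio could be computed the same way before the filling step.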
Preferably, encoding the reduced features with a discrete encoding method to form encoded data specifically comprises:
among the related attributes, grouping one or more discrete attributes into one category, and encoding each category separately with the one_hot encoding method.
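A minimal sketch of the discrete encoding, assuming each reduced (for example PCA) feature is first discretized into equal-width bins and each feature is then one-hot encoded separately. The bin count and the helper names `equal_width_bins`, `one_hot` and `encode_features` are hypothetical; the text does not specify how continuous components are discretized.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(indices, n_categories):
    """Encode each category index as a one-hot bit vector."""
    return [[1 if j == i else 0 for j in range(n_categories)] for i in indices]


def encode_features(columns, n_bins=4):
    """Discretize and one-hot encode each feature column separately,
    then concatenate the per-feature codes row by row."""
    encoded_cols = [one_hot(equal_width_bins(col, n_bins), n_bins) for col in columns]
    n_rows = len(columns[0])
    return [sum((col[r] for col in encoded_cols), []) for r in range(n_rows)]


print(encode_features([[0.0, 2.0], [1.0, 1.0]], n_bins=2))
# -> [[1, 0, 1, 0], [0, 1, 1, 0]]
```

Each encoded row is the concatenation of the per-feature codes, so its length is the sum of the per-feature code widths.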
Preferably, the training process of the convolutional neural network detection model comprises the following steps:
initializing the parameters with random values;
inputting the encoded data of the abnormal data set and obtaining an output value by forward propagation through the convolution layers, pooling layers and fully connected layer;
calculating the training error between the output value and the target value for each convolution, pooling and fully connected layer, the target value being the classified abnormal data set;
updating the weights by back-propagation according to the training error;
terminating training when the training error does not change significantly within 100 iterations; if the termination condition is not met, returning to the data input step;
obtaining the trained convolutional neural network model.
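The termination rule (stop when the training error has not changed significantly within 100 iterations) does not depend on the network architecture, so the sketch below shows it on a deliberately tiny stand-in model, a one-variable linear regression trained by gradient descent, rather than on the CNN itself. The tolerance `tol`, the learning rate and the function name are assumptions.

```python
import random


def train_with_plateau_stop(xs, ys, lr=0.05, patience=100, tol=1e-6,
                            max_iter=100_000, seed=0):
    """Gradient descent on a toy linear model y = w*x + b (a stand-in for
    the CNN). Stops once the training error has not improved by more than
    `tol` for `patience` consecutive iterations."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initialisation
    best_err, stale = float("inf"), 0
    n = len(xs)
    for it in range(1, max_iter + 1):
        # Forward pass and mean-squared training error.
        preds = [w * x + b for x in xs]
        err = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # Back-propagation for this one-layer model is just the MSE gradient.
        gw = 2 * sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        gb = 2 * sum(p - y for p, y in zip(preds, ys)) / n
        w, b = w - lr * gw, b - lr * gb
        # Plateau check: count iterations without a significant improvement.
        if best_err - err > tol:
            best_err, stale = err, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return w, b, it
```

Measuring improvement against the best error seen so far, rather than the previous iteration, avoids stopping during a long run of tiny but steady gains.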
Preferably, the number of neurons in the input layer of the convolutional neural network is set according to the number of input abnormal communication data features and the number of coding bits of each feature, and is calculated as:
$$N_{in} = \sum_{i=1}^{M} n_i$$
where $M$ is the number of features of the input abnormal communication data, $n_i$ is the number of coding bits of the $i$-th feature, and $N_{in}$ is the number of neurons in the input layer. The abnormal communication data feature types include frequency anomalies, delay anomalies, abnormal packet frequency, abnormal packet size, signal strength anomalies, abnormal protocol behavior and data integrity anomalies.
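As a worked example of sizing the input layer by summing the coding bits of every feature, the snippet below assigns hypothetical coding widths to the seven abnormal-feature types listed above; the bit counts are illustrative only and do not come from the patent.

```python
# Hypothetical coding widths (bits) for the seven abnormal-feature types.
CODING_BITS = {
    "frequency": 4,
    "delay": 4,
    "packet_rate": 3,
    "packet_size": 3,
    "signal_strength": 4,
    "protocol_behavior": 2,
    "data_integrity": 2,
}


def input_layer_size(coding_bits):
    """Input-layer neurons: one per coding bit, summed over the M features."""
    return sum(coding_bits.values())


print(input_layer_size(CODING_BITS))  # -> 22
```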
Preferably, the encoded data of the abnormal data set are input into a convolutional neural network detection model M1 for training;
the convolutional neural network model M1 comprises 12 convolution layers and 8 pooling layers; the convolution layers use three small 3×3 convolution kernels, and the 8 pooling layers use max pooling;
residual connection modules are added at the 3rd, 4th and 5th convolution layers of model M1, each residual module consisting of two convolution layers with batch normalization and an activation function between them; the output and the input of the 3rd convolution layer of M1 are added and the sum is passed to the 4th convolution layer as its input; the residual module is applied again, adding the output and the input of the 4th convolution layer and passing the sum to the 5th convolution layer, where the residual module is applied once more, yielding convolutional neural network model M2;
inputting the encoded data of the abnormal data set into model M2 and performing convolution, residual connection and pooling operations;
there is 1 intermediate layer, and its number of neurons is ….
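The residual connection can be illustrated with a one-dimensional sketch: two convolutions with a normalization step and ReLU activation in between, whose output is added back to the block input. This is a simplification under stated assumptions: it uses 1-D kernels of length 3 instead of 3×3 kernels, a per-vector standardization as a stand-in for batch normalization, and hypothetical function names.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation (as in CNN layers) with zero 'same' padding.
    Assumes an odd kernel length, so the output length equals len(x)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]


def normalize(x, eps=1e-5):
    """Stand-in for batch normalization: zero mean, unit variance."""
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / len(x)
    return [(v - m) / (var + eps) ** 0.5 for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, k1, k2):
    """y = x + conv2(relu(norm(conv1(x)))): two convolution layers with
    normalization and activation in between, plus the identity shortcut."""
    h = relu(normalize(conv1d_same(x, k1)))
    return [a + b for a, b in zip(x, conv1d_same(h, k2))]
```

Because 'same' padding preserves the length, blocks can be chained, mirroring how the summed output of one layer is fed to the next for layers 3 to 5.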
Preferably, the number of neurons in the output layer equals the number of demand categories; the demand states are encoded with the one_hot encoding method, and the log function is chosen as the output neuron function.
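The text does not state which log function the output neurons use; one common reading, assumed here, is a log-softmax over the output neurons trained with the negative log-likelihood of the one-hot target. The sketch below shows that combination; the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over the output neurons."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def nll_one_hot(logits, one_hot_target):
    """Negative log-likelihood of the one-hot encoded target state."""
    return -sum(t * lp for t, lp in zip(one_hot_target, log_softmax(logits)))


def predict_state(logits):
    """Index of the most probable demand category."""
    return max(range(len(logits)), key=logits.__getitem__)
```

Subtracting the maximum logit before exponentiating leaves the result unchanged mathematically but prevents overflow for large logits.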
Preferably, classification by the enhanced weighted random forest algorithm comprises the following steps:
inputting the encoded data into an enhanced weighted random forest model for training;
performing anomaly detection and classification with the trained enhanced weighted random forest model, separating the encoded data into a normal data set and an abnormal data set;
assigning a weight to each abnormal data sample;
for new unlabeled samples, the trained model performs classification inference with its learned parameters and weights according to the learned rules, assigning each new sample to the normal or the abnormal class; the parameters and weights of the model are learned from the labeled samples by an optimization algorithm during training.
Preferably, labeling the data means dividing the whole data set into a normal data set and an abnormal data set; the encoded data are classified by the enhanced weighted random forest algorithm, whose weight function assigns each decision tree a voting weight $w_i$ according to its imbalance $\rho_i$, where $N$ is the number of decision trees;
given N balanced sub-training sets, training on each of them yields N decision tree classifiers $h_i$, where $i$ is a natural number from 1 to N;
the final classifier is obtained by weighted voting and is expressed as follows:
$$H(x) = \arg\max_{y \in Y} \sum_{i=1}^{N} w_i \, \mathbb{I}\big(h_i(x) = y\big)$$
where $Y$ is the set of class labels (normal or abnormal) and $\mathbb{I}(\cdot)$ is the indicator function;
the resulting classifier $H$ is evaluated on the test set to check its classification effect.
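Because the text describes the weight function only through its ingredients (tree imbalance and voting weight), the sketch below illustrates balanced sub-training sets and weighted voting under explicit assumptions: depth-1 decision stumps stand in for full decision trees, each balanced subset is formed by undersampling the majority class to the minority size, and each tree's voting weight is taken to be its balanced accuracy on the full training set. That weighting is a hypothetical substitute, not the patent's weight function.

```python
import random
from collections import defaultdict


def fit_stump(X, y):
    """Best single-feature threshold split (a depth-1 decision tree)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for hi_label in (0, 1):
                pred = [hi_label if row[f] > t else 1 - hi_label for row in X]
                acc = sum(p == l for p, l in zip(pred, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, hi_label)
    _, f, t, hi_label = best
    return lambda row, f=f, t=t, h=hi_label: h if row[f] > t else 1 - h


def balanced_accuracy(model, X, y):
    """Mean of per-class recalls, insensitive to class imbalance."""
    recalls = []
    for c in (0, 1):
        idx = [i for i, l in enumerate(y) if l == c]
        recalls.append(sum(model(X[i]) == c for i in idx) / len(idx))
    return sum(recalls) / 2


def train_weighted_forest(X, y, n_trees=5, seed=0):
    """Train stumps on balanced sub-training sets (majority class
    undersampled to the minority size); weight each by its balanced
    accuracy on the full training set (hypothetical weighting)."""
    rng = random.Random(seed)
    pos = [i for i, l in enumerate(y) if l == 1]
    neg = [i for i, l in enumerate(y) if l == 0]
    minority, majority = sorted((pos, neg), key=len)
    forest = []
    for _ in range(n_trees):
        idx = minority + rng.sample(majority, len(minority))
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        forest.append((balanced_accuracy(tree, X, y), tree))
    return forest


def predict(forest, row):
    """Weighted vote: the label with the largest summed weight wins."""
    votes = defaultdict(float)
    for weight, tree in forest:
        votes[tree(row)] += weight
    return max(votes, key=votes.get)
```

Undersampling each subset to equal class sizes is one simple way to obtain balanced sub-training sets; the rare abnormal class then carries as much influence on each tree as the normal class.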
Preferably, the abnormal data set is identified with the enhanced streaming anomaly detection algorithm and the trained convolutional neural network model, and the local anomaly factor in the enhanced streaming anomaly detection algorithm is calculated as follows:
for each outlier data point $p$ in the real-time communication data set:
calculate the k nearest neighbors of $p$ and obtain the neighborhood set $N_k(p)$;
calculate the k-reachability distance $\mathrm{reach\text{-}dist}_k(p, o)$ from data point $p$ to data point $o$, which is the maximum of the k-distance $k\text{-}\mathrm{dist}(o)$ of $o$ and the Euclidean distance $d(p, o)$ between $p$ and $o$:
$$\mathrm{reach\text{-}dist}_k(p, o) = \max\{k\text{-}\mathrm{dist}(o),\ d(p, o)\}$$
according to a set distance threshold $\varepsilon$, data points whose distance is less than or equal to the threshold are defined as data points in the neighborhood of the target point; the neighborhood of target point $p$ is expressed as:
$$N_\varepsilon(p) = \{\, q \in D \mid d(p, q) \le \varepsilon \,\}$$
where $N_\varepsilon(p)$ is the set of data points within the neighborhood of target point $p$, $D$ is the data set, $d(p, q)$ is the Euclidean distance between data points $p$ and $q$, and $d(p, q) \le \varepsilon$ indicates that data point $q$ belongs to the neighborhood of $p$. Different k-reachability distances correspond to different distance thresholds, and the threshold $\varepsilon$ is adjusted dynamically according to the k-reachability distance: $\mathrm{reach\text{-}dist}_k(p, o)$ is compared with $\varepsilon$, and a data point whose k-reachability distance is smaller than $\varepsilon$ is marked as an anomaly of the same type; if consecutive outliers occur, $\varepsilon$ is adjusted to improve accuracy, and if no outlier occurs, $\varepsilon$ is adjusted to increase sensitivity;
calculate the local reachability density $\mathrm{lrd}_k(p)$ of $p$ over its neighborhood set $N_k(p)$:
$$\mathrm{lrd}_k(p) = \left( \frac{1}{M} \sum_{o \in N_k(p)} \mathrm{reach\text{-}dist}_k(p, o) \right)^{-1}$$
calculate the local anomaly factor $\mathrm{LOF}_k(p)$ of $p$:
$$\mathrm{LOF}_k(p) = \frac{1}{M} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}$$
where $M$ is the number of data points in $N_k(p)$. By dynamically adjusting the threshold $\varepsilon$, abnormal data points in more neighboring regions are detected.
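The LOF steps can be sketched with the standard definitions (k-distance, reachability distance, local reachability density and local outlier factor), followed by a hypothetical stream-order threshold that is tightened after consecutive flags and relaxed when points are normal. Two assumptions differ from the text and are labeled in the code: the dynamic threshold is applied to the LOF score rather than to the k-reachability distance, and, as in the usual LOF convention, a score well above 1 marks a point as an outlier. The sketch assumes distinct points and ignores ties at the k-distance for brevity.

```python
import math


def k_neighbors(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean; ties by index)."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def lof_scores(points, k=3):
    """Standard local outlier factor of every point (k-distance ties ignored)."""
    n = len(points)
    nbrs = [k_neighbors(points, i, k) for i in range(n)]
    # k-distance(o): distance from o to its k-th nearest neighbour.
    k_dist = [math.dist(points[i], points[nbrs[i][-1]]) for i in range(n)]

    def reach_dist(p, o):
        # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
        return max(k_dist[o], math.dist(points[p], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(p, o) for o in nbrs[p]) for p in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[o] for o in nbrs[p]) / (k * lrd[p]) for p in range(n)]


def flag_stream(scores, threshold=1.5, step=0.1, floor=1.2):
    """Hypothetical dynamic threshold applied to LOF scores in stream order.

    A score above the threshold is flagged (assumption: high LOF = outlier).
    Consecutive flags raise the threshold (fewer false alarms); a normal
    score lowers it toward `floor` (more sensitivity)."""
    flags, prev_flag = [], False
    for s in scores:
        flag = s > threshold
        if flag and prev_flag:
            threshold += step
        elif not flag:
            threshold = max(floor, threshold - step)
        flags.append(flag)
        prev_flag = flag
    return flags


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
print(flag_stream(lof_scores(pts, k=3)))  # -> [False, False, False, False, False, True]
```

A streaming deployment would update the neighbour sets and densities incrementally as points arrive instead of recomputing them over the whole batch as done here.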
Preferably, the adaptive method enables the anomaly detection model to adjust automatically to changes in the data, so as to adapt to new data distributions and pattern changes; in the adaptive method, the parameter $\theta$ is updated as follows:
;
where $\theta_{t+1}$ is the updated value of parameter $\theta$, $\eta$ is the adaptive learning rate, $\gamma_t$ is the adaptive gradient adjustment factor, $m_t$ is the first moment of the gradient (its mean), $v_t$ is the second moment of the gradient (its uncentered variance), $\epsilon$ is a smoothing term, and $g_t$ is the current gradient;
the learning rate is adjusted according to the adaptive gradient adjustment factor, and the model parameters are updated with the adjusted learning rate: when the factor increases, the learning rate decreases, and when the factor decreases, the learning rate increases. The factor affects not only the absolute magnitude of the learning rate but also its rate of change, so the amplitude of learning-rate changes during training is controlled through the rate of change of the factor: a factor above a certain threshold makes the learning rate change more slowly, while a factor below that threshold makes it change faster.
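Because the text describes the update only through its ingredients, the sketch below assumes an Adam-style update (bias-corrected first and second moments, smoothing term ε) whose step size η is divided by an adjustment factor γ_t, which reproduces the stated behavior that a larger factor gives a smaller learning rate. The particular choice γ_t = 1 + |g_t − g_{t−1}| is hypothetical, as are the default hyperparameters and the function name.

```python
import math


def adaptive_minimize(grad, theta0, eta=0.1, beta1=0.9, beta2=0.999,
                      eps=1e-8, steps=500):
    """Adam-style minimisation with the step size divided by an adjustment
    factor (assumed form):
        theta <- theta - (eta / gamma_t) * m_hat / (sqrt(v_hat) + eps),
        gamma_t = 1 + |g_t - g_{t-1}|   (hypothetical choice)
    so a larger factor yields a smaller effective learning rate."""
    theta, m, v, g_prev = theta0, 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)                          # current gradient g_t
        m = beta1 * m + (1 - beta1) * g          # first moment (mean)
        v = beta2 * v + (1 - beta2) * g * g      # second moment
        m_hat = m / (1 - beta1 ** t)             # bias corrections
        v_hat = v / (1 - beta2 ** t)
        gamma = 1.0 + abs(g - g_prev)            # adaptive adjustment factor
        theta -= (eta / gamma) * m_hat / (math.sqrt(v_hat) + eps)
        g_prev = g
    return theta


# Minimise f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
print(round(adaptive_minimize(lambda th: 2 * (th - 3), 0.0), 2))
```

Tying γ_t to the change in the gradient means the step shrinks when the gradient swings sharply (for example near an overshoot) and recovers when the gradient is stable.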
Compared with the prior art, the invention has the following beneficial effects:
1. Classification with the anomaly detection of the enhanced weighted random forest algorithm identifies abnormal data points effectively, improves classification accuracy, captures the characteristics of abnormal data better, and handles large-scale high-dimensional data sets efficiently, meeting the requirements of real-time anomaly detection. The enhanced weighted random forest algorithm lets the model handle differences in sample size between classes better: by giving data samples different weights, the model pays more attention to minority-class samples, which improves classification accuracy. The weighted random forest algorithm also accounts for feature importance, evaluating feature weights to identify the most discriminative features for classification or anomaly detection; this improves the accuracy and efficiency of the model and reduces the influence of irrelevant or redundant features. Random forests are inherently robust to noise and outliers in the data; the weighted improvement further increases the stability of the model and reduces overfitting to abnormal samples, improving overall model performance.
2. The enhanced streaming anomaly detection algorithm predicts and warns of trends in the real-time communication data: it processes high-dimensional data streams in real time, identifies abnormal data immediately, continuously updates the model with new data, and adapts to changes in the data distribution. The algorithm is self-adaptive, automatically learning and adjusting the model as the data distribution changes. Different k-reachability distances correspond to different distance thresholds, and the threshold ε is adjusted dynamically according to the k-reachability distance, which improves the robustness and adaptability of the algorithm and keeps the model efficient during long-term operation. The streaming algorithm uses incremental learning to update the model with new data samples without reprocessing the entire data set, which saves computing resources and improves efficiency. Dynamically adjusting the threshold ε allows abnormal data points in more neighboring regions to be detected.
3. Abnormal data are detected and early warnings are issued by an adaptive method, which automatically adjusts the detection model or rules to the actual data so as to fit different data distributions and anomaly patterns. The learning rate adapts automatically to different gradient conditions during training, giving a better optimization result, and the method adapts to dynamic changes in the data and to new anomaly patterns, improving the robustness and adaptability of the system. Because the anomaly detection algorithm is adjusted dynamically according to the distribution and changes of the actual data, detection accuracy improves: the adaptive method captures the patterns and characteristics of data changes and adjusts the anomaly detection model accordingly, so that abnormal data are identified better. It also analyzes and processes data in real time, detecting abnormal data promptly and issuing early warnings.
Drawings
FIG. 1 is a flow chart of a method for monitoring and analyzing abnormal data based on data mining and neural network according to the present invention;
FIG. 2 is a graph of classification effects of various forms of models of the present invention;
FIG. 3 is a graph comparing weight functions of the present invention;
FIG. 4 is a graph comparing local anomaly factors of the present invention;
FIG. 5 is a graph comparing adaptive learning rates of the adaptive method of the present invention;
FIG. 6 is a graph comparing the early warning accuracy of the adaptive method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. The embodiments described herein are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to fig. 1 to 6, the present invention provides an abnormal data monitoring and analyzing method based on data mining and neural network, and the technical scheme is as follows:
the abnormal data monitoring and analyzing method based on the data mining and the neural network comprises the following steps:
Acquiring a high-dimensional real-time communication data set from the user side and labeling and classifying it; the data set comprises the following 10 dimensions: timestamp, sender and receiver identifiers, communication mode, communication duration, communication quality indicator, bandwidth usage, communication location, data traffic, network topology, and user behavior; users without abnormal communication behavior are removed and the abnormal data samples are retained; the abnormal data include abnormal signal quality, abnormal call-drop rate, abnormal call-completion rate, abnormal data transmission rate, base-station faults or abnormal states, and abnormal traffic;
preprocessing the classified high-dimensional real-time communication data set; the preprocessing comprises automatic cleaning, multi-source data integration, anomaly detection and data repair, unique-attribute deletion, related-attribute merging and missing-value handling, and converts the acquired high-dimensional data into a data set that contains no missing data and only valid features;
the data preprocessing specifically comprises the following steps:
unique attribute processing: a unique attribute is a feature that uniquely identifies a sample but has no bearing on identifying abnormal data, so it is deleted directly;
missing data processing: the missing values are located and then filled; the index positions of the missing values in the feature data are obtained quickly, and the missing values are handled by logical indexing on those positions;
correlated attribute merging: features with an obvious correlation are merged into a single feature by an arithmetic operation, and the redundant data are deleted; for example, two attribute columns, feature 1 and feature 2, are added together into one column, and the deletion is performed after the attributes have been merged.
Automatic cleaning: automated algorithms and tools, namely a semi-supervised learning algorithm and an anomaly detection algorithm, identify and handle missing values, outliers and duplicate values in the data; the specific steps are:
loading the data to be cleaned into a suitable computing environment;
missing-value processing: missing-value detection uses statistical indicators, visualization or a clustering algorithm to detect missing values in the data; missing-value filling uses interpolation chosen according to the characteristics and meaning of the data;
outlier processing: outlier detection uses an outlier detection algorithm to identify outliers in the data; outlier handling deletes outliers, with the deletion method chosen according to the nature of each outlier;
duplicate-value processing: duplicate detection uses a unique identifier of the data, or a combination of fields, to find duplicate records; duplicate handling keeps the first record, according to business requirements;
verification and evaluation: the cleaned data are verified and evaluated against the original data to check the cleaning effect; the missing-value ratio and the outlier ratio are used to assess the accuracy and completeness of the cleaning results.
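The preprocessing and cleaning steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than the patented procedure: records are dictionaries with `None` marking a missing value, the function name `preprocess` and its parameters (identifier field, pair of correlated columns, merged column name) are hypothetical, linear interpolation stands in for the unspecified interpolation method, and outlier deletion is left out.

```python
def preprocess(records, id_field, merge_pair, merged_name):
    """Sketch of the cleaning steps on a list of dict rows (None = missing).

    Returns (cleaned_rows, filled_ratio). Assumes at least one record, that all
    rows share the same keys, and that neither merged column is entirely missing.
    """
    # Duplicate values: keep the first record for each unique identifier.
    seen, rows = set(), []
    for rec in records:
        if rec[id_field] not in seen:
            seen.add(rec[id_field])
            rows.append(dict(rec))
    # Unique attribute: an identifier says nothing about anomalies, so drop it.
    for row in rows:
        del row[id_field]
    n_cols = len(rows[0])
    # Missing values: a boolean mask locates the gaps in each column; each gap
    # is filled by linear interpolation between the nearest known neighbours,
    # and gaps at either end take the nearest known value.
    n_filled = 0
    for col in list(rows[0]):
        vals = [row[col] for row in rows]
        mask = [v is None for v in vals]
        known = [i for i, missing in enumerate(mask) if not missing]
        if not known:
            continue  # nothing to interpolate from; leave the column untouched
        for i, missing in enumerate(mask):
            if not missing:
                continue
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                rows[i][col] = vals[right]
            elif right is None:
                rows[i][col] = vals[left]
            else:
                frac = (i - left) / (right - left)
                rows[i][col] = vals[left] + frac * (vals[right] - vals[left])
            n_filled += 1
    # Correlated attributes: add the two columns into one, then delete them.
    a, b = merge_pair
    for row in rows:
        row[merged_name] = row.pop(a) + row.pop(b)
    # Verification: share of cells that had to be filled.
    return rows, n_filled / (len(rows) * n_cols)


raw = [
    {"id": 1, "duration": 10.0, "f1": 1.0, "f2": 2.0},
    {"id": 2, "duration": None, "f1": 3.0, "f2": 4.0},
    {"id": 2, "duration": 99.0, "f1": 0.0, "f2": 0.0},  # duplicate id, dropped
    {"id": 3, "duration": 30.0, "f1": None, "f2": 6.0},
]
cleaned, ratio = preprocess(raw, "id", ("f1", "f2"), "f1_plus_f2")
```

Duplicates are removed before the identifier is dropped, because the duplicate check relies on that identifier.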
Extracting features from the preprocessed high-dimensional real-time communication data set with a bidirectional recurrent neural network (BiRNN) model, keeping the features suited to anomaly detection, removing data that cannot contain anomalies, and extracting the abnormal data;
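A BiRNN produces, for each time step, the concatenation of a forward and a backward hidden state, so every feature vector reflects both earlier and later context. The sketch below assumes plain tanh (Elman) cells with random, untrained weights and shows only the forward pass; the cell type, hidden size and helper names (`init_birnn`, `run_rnn`, `birnn_features`) are illustrative assumptions, and the step that discards data unlikely to contain anomalies is not shown.

```python
import numpy as np


def init_birnn(input_dim, hidden_dim, seed=0):
    """Random (untrained) weights for a forward and a backward tanh RNN cell."""
    rng = np.random.default_rng(seed)

    def mat(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))

    forward = (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), np.zeros(hidden_dim))
    backward = (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), np.zeros(hidden_dim))
    return forward, backward


def run_rnn(x, cell, reverse=False):
    """Run a simple Elman RNN over x of shape (T, D); returns (T, H) states."""
    W, U, b = cell
    T = x.shape[0]
    states = np.zeros((T, W.shape[0]))
    h = np.zeros(W.shape[0])
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        h = np.tanh(W @ x[t] + U @ h + b)
        states[t] = h
    return states


def birnn_features(x, params):
    """Concatenate forward and backward states: one 2H-dim feature per time step."""
    forward, backward = params
    return np.concatenate([run_rnn(x, forward), run_rnn(x, backward, reverse=True)], axis=1)


# 5 time steps of the 10 communication dimensions (random stand-in data).
x = np.random.default_rng(1).normal(size=(5, 10))
params = init_birnn(input_dim=10, hidden_dim=4)
features = birnn_features(x, params)
print(features.shape)  # (5, 8)
```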
normalizing the abnormal data, reducing the dimensionality of the extracted features of the high-dimensional real-time communication data set with principal component analysis (PCA), and encoding the reduced features with a discrete encoding method to form encoded data;
the step of encoding the reduced features with a discrete encoding method to form encoded data specifically comprises:
among the related attributes, one or more discrete attributes are grouped into one category, and each category is encoded separately with one-hot encoding.
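The normalization, PCA and one-hot steps above can be sketched with NumPy. Assumptions: min-max scaling is used for normalization, PCA is computed by singular value decomposition of the centered data, and the one-hot codes of the grouped discrete attributes are concatenated with the principal components. How continuous components would themselves be discretized is not specified, so the sketch encodes only the categorical attributes. All function names are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def pca_reduce(X, n_components):
    """Project centred data onto its top principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt sorted by singular value
    return Xc @ Vt[:n_components].T


def one_hot(categories):
    """One-hot encode a list of hashable category values (e.g. tuples of attributes)."""
    levels = sorted(set(categories))
    index = {level: i for i, level in enumerate(levels)}
    codes = np.zeros((len(categories), len(levels)), dtype=int)
    for row, value in enumerate(categories):
        codes[row, index[value]] = 1
    return codes, levels


numeric = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [4.0, 8.0, 12.0]])
reduced = pca_reduce(min_max_normalize(numeric), n_components=2)
# Two discrete attributes (communication mode, location) grouped into one category.
codes, levels = one_hot([("voice", "A"), ("data", "B"), ("voice", "A"), ("data", "B")])
encoded = np.hstack([reduced, codes])
```

Grouping attributes as tuples means each distinct combination gets its own bit, which is one reading of "one or more discrete attributes are grouped into one category".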
Classifying the encoded data into a normal data set and an abnormal data set with the enhanced weighted random forest algorithm;
assigning a weight to each abnormal data sample;
for a new, unlabeled sample, the trained model applies its learned parameters and weights to infer the class according to the learned rules, assigning the sample to the normal or the abnormal category; the model's parameters and weights are learned from the labeled samples by an optimization algorithm during training.
Labeling and classifying the data means dividing the whole data set into a normal data set and an abnormal data set; the encoded data are classified by the enhanced weighted random forest algorithm, whose weight function is:
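The text assigns a weight to each sample without fixing the rule at this point. One common heuristic, assumed here, is inverse class frequency, which is also the formula scikit-learn uses for `class_weight="balanced"` (`n_samples / (n_classes * np.bincount(y))`). The helper below is a hypothetical illustration, not the method's own weighting.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by n / (k * count[class]) so that rare classes count more.

    The weights sum to n, and every class receives the same total weight.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


labels = ["normal", "normal", "normal", "abnormal"]
weights = inverse_frequency_weights(labels)
# each "normal" sample gets 4 / (2 * 3), about 0.667; the "abnormal" sample gets 4 / (2 * 1) = 2.0
```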
(1);
where $IR_i$ is the imbalance of the $i$-th decision tree, $N$ is the number of decision trees, and $w_i$ is the voting weight of the $i$-th decision tree;
given $N$ balanced sub-training sets, a decision tree is trained on each of them, yielding $N$ decision tree classifiers $h_i$, where $i$ is a natural number from 1 to $N$;
the final classifier is obtained by weighted voting and is expressed as follows:
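$$H(x) = \arg\max_{Y} \sum_{i=1}^{N} w_i \, \mathbb{I}\big(h_i(x) = Y\big) \quad (2)$$
where $h_i$ is the $i$-th decision tree classifier, $w_i$ is its voting weight, $\mathbb{I}(\cdot)$ is the indicator function, and $Y$ is the class label (the abnormal data set or the normal data set);
the resulting classifier $H(x)$ is evaluated on the test set to measure its classification performance.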
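The two ensemble steps, building N balanced sub-training sets and combining the trees by weighted voting, can be sketched as follows. Assumptions: a two-class problem, balancing by random undersampling of the majority class, and voting weights passed in as given (in the method they come from the weight function (1), which is not reimplemented here). Tree training is replaced by stand-in classifiers, and the helper names are hypothetical.

```python
import random
from collections import defaultdict


def balanced_subsets(samples, labels, n_subsets, seed=0):
    """Build n_subsets balanced training sets for a two-class problem by pairing
    all minority-class samples with an equal-size random draw (without
    replacement) from the majority class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    minority, majority = sorted(by_class, key=lambda c: len(by_class[c]))
    k = len(by_class[minority])
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(by_class[majority], k)
        subsets.append((drawn + by_class[minority], [majority] * k + [minority] * k))
    return subsets


def weighted_vote(classifiers, weights, x):
    """Return the label Y that maximises sum_i w_i * [h_i(x) == Y]."""
    scores = defaultdict(float)
    for h, w in zip(classifiers, weights):
        scores[h(x)] += w
    return max(scores, key=scores.get)


subsets = balanced_subsets(list(range(10)), [0] * 8 + [1] * 2, n_subsets=3)
# Stand-in "trees": constant classifiers with hypothetical voting weights.
trees = [lambda x: "abnormal", lambda x: "normal", lambda x: "normal"]
print(weighted_vote(trees, [0.5, 0.2, 0.2], None))  # abnormal (0.5 > 0.4)
```

Weighted voting lets a single highly weighted tree outvote several low-weight trees, which is the behavior the weight function is meant to control.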
Inputting the abnormal data into a convolutional neural network detection model for training;
the training process of the convolutional neural network detection model comprises:
initializing the parameters with random values;
inputting the encoded data of the abnormal data set and obtaining an output value by forward propagation through the convolutional, pooling and fully connected layers;
calculating the training error between the output value and the target value for each convolutional, pooling and fully connected layer, where the target value is the classified abnormal data set;
updating the weights by back propagation according to the training error;
terminating training when the training error has not changed significantly within 100 iterations; otherwise, returning to the data input step;
obtaining the trained convolutional neural network model.
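The stopping rule above (no significant change in the training error within 100 iterations) can be expressed as a patience-based loop. In this sketch the CNN forward and backward pass is abstracted as a hypothetical `step()` callable that returns the training error, and "significant" is assumed to mean an improvement larger than `min_delta`; the toy one-weight model only exercises the loop.

```python
def train_until_plateau(step, patience=100, min_delta=1e-6, max_iters=100_000):
    """Repeat step() (one forward pass + back-propagation, returning the training
    error) until the best error has not improved by more than min_delta for
    `patience` consecutive iterations. Returns (iterations_run, best_error)."""
    best = float("inf")
    stale = 0
    for it in range(1, max_iters + 1):
        error = step()
        if best - error > min_delta:
            best, stale = error, 0
        else:
            stale += 1
            if stale >= patience:
                return it, best
    return max_iters, best


# Stand-in model: one weight fitted to the target 3.0 by gradient descent.
state = {"w": 0.0}


def step(lr=0.1):
    error = (state["w"] - 3.0) ** 2
    state["w"] -= lr * 2.0 * (state["w"] - 3.0)  # gradient of the squared error
    return error


iterations, best_error = train_until_plateau(step)
```

Comparing against the best error so far, rather than the previous iteration, keeps a noisy error curve from resetting the counter on small random dips.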
The number of input-layer neurons of the convolutional neural network is set from the number of input abnormal communication data features and the number of coding bits of each feature, as follows:
$$N_{\text{in}} = \sum_{i=1}^{M} b_i \quad (3)$$
where $M$ is the number of features of the input abnormal communication data, $b_i$ is the number of coding bits of the $i$-th feature, and $N_{\text{in}}$ is the number of input-layer neurons. The abnormal communication data feature types include frequency anomalies, delay anomalies, abnormal packet frequency, abnormal packet size, signal-strength anomalies, abnormal protocol behavior and data-integrity anomalies.
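The input-layer size is simply the total number of coding bits over all input features. The sketch below computes that sum for two assumed encodings, one-hot (one bit per category) and compact binary; the per-feature category counts are hypothetical values chosen for illustration, not figures from the method.

```python
import math


def coding_bits(n_categories, scheme="one_hot"):
    """Bits needed to encode a feature with n_categories distinct values."""
    if scheme == "one_hot":
        return n_categories                                  # one bit per category
    if scheme == "binary":
        return max(1, math.ceil(math.log2(n_categories)))    # compact binary code
    raise ValueError(f"unknown scheme: {scheme}")


def input_layer_size(category_counts, scheme="one_hot"):
    """N_in = sum of b_i over the M input features."""
    return sum(coding_bits(k, scheme) for k in category_counts)


# Hypothetical category counts for the seven feature types (frequency, delay,
# packet frequency, packet size, signal strength, protocol, integrity).
counts = [3, 4, 2, 5, 3, 6, 2]
print(input_layer_size(counts))            # 25
print(input_layer_size(counts, "binary"))  # 14
```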
And inputting the encoded data in the abnormal data set into a convolutional neural network detection model M1 for training.
The convolutional neural network model M1 comprises 12 convolutional layers and 8 pooling layers; the convolutional layers use three small 3×3 convolution kernels, and all 8 pooling layers use max pooling;
a residual connection module is added at each of the 3rd, 4th and 5th convolutional layers of model M1; each module consists of two convolutional layers with batch normalization and an activation function between them. The output and the input of the 3rd convolutional layer of M1 are added and the sum is passed to the 4th convolutional layer as its input; the residual connection is applied again by adding the output and the input of the 4th convolutional layer and passing the sum to the 5th convolutional layer, where the residual connection is applied once more, giving the convolutional neural network model M2;
inputting the encoded data of the abnormal data set into model M2 and performing convolution, residual connection and pooling operations;
the middle layer is 1 layer, and the neuron number of the middle layer is。
The number of output-layer neurons equals the number of required classes; the required states are encoded with one-hot encoding, and a log function is chosen as the activation function of the output neurons.
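A residual connection adds a block's input to its output, so the block only has to learn a correction F(x). The NumPy sketch below assumes a single-channel 3×3 "same" convolution, batch normalization without learned scale and shift, and the common ResNet ordering (conv, BN, ReLU, conv, BN, then addition and ReLU). The helper names are hypothetical and no training is performed.

```python
import numpy as np


def conv3x3_same(x, kernel):
    """3x3 'same' convolution (cross-correlation, as in most deep-learning
    frameworks) over a single-channel batch x of shape (B, H, W)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps H x W
    out = np.zeros(x.shape)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[:, di:di + h, dj:dj + w]
    return out


def batch_norm(x, eps=1e-5):
    """Normalise over the whole batch (single channel, no learned scale/shift)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, k1, k2):
    """y = ReLU(x + F(x)) with F = conv -> BN -> ReLU -> conv -> BN."""
    f = relu(batch_norm(conv3x3_same(x, k1)))
    f = batch_norm(conv3x3_same(f, k2))
    return relu(x + f)  # the skip connection adds the block's input to its output


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8))
k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
y = residual_block(x, k1, k2)
print(y.shape)  # (2, 8, 8)
```

If the block's second kernel is all zeros, F(x) vanishes and the block reduces to ReLU(x), which is why residual blocks are easy to stack without degrading the signal.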
Calculating the local anomaly factor (LOF) value of each data point by comparing its density with the average density of its neighboring points; data points whose LOF value exceeds a certain threshold are anomaly points;
periodically updating the average density of the neighborhood set, and updating the LOF values of the data points in the neighborhood according to the updated average density;
processing the abnormal data set in real time with the enhanced streaming anomaly detection algorithm and the trained convolutional neural network model, adjusting dynamically to the environment and to changes in the data distribution, and automatically identifying the anomaly type of the abnormal data;
identifying the abnormal data set with the enhanced streaming anomaly detection algorithm and the trained convolutional neural network model, where the local anomaly factor in the enhanced streaming anomaly detection algorithm is calculated as follows:
for each outlier data point $p$ in the real-time communication data set:
calculate the k nearest neighbors of $p$ and obtain the neighborhood set $N_k(p)$;
calculate the k-reachable distance $\text{reach-dist}_k(p, o)$ from data point $p$ to data point $o$, which is the maximum of the k-distance of $o$, $k\text{-dist}(o)$, and the Euclidean distance $d(p, o)$ between $p$ and $o$:
$$\text{reach-dist}_k(p, o) = \max\{\, k\text{-dist}(o),\ d(p, o) \,\}$$
according to the set distance threshold $\varepsilon$, data points whose distance to the target point is less than or equal to the threshold are defined as lying in the neighborhood of the target point $p$, which is expressed as:
$$N_\varepsilon(p) = \{\, q \in D \mid d(p, q) \le \varepsilon \,\}$$
where $N_\varepsilon(p)$ is the set of data points in the neighborhood of the target point $p$; $D$ is the data set; $d(p, q)$ is the Euclidean distance between data points $p$ and $q$; and $d(p, q) \le \varepsilon$ means that data point $q$ lies within the neighborhood of $p$. Different k-reachable distances correspond to different distance thresholds, so $\varepsilon$ is adjusted dynamically according to the k-reachable distance: the k-reachable distance $\text{reach-dist}_k(p, o)$ from $p$ to $o$ is compared with $\varepsilon$, and a data point whose k-reachable distance is smaller than $\varepsilon$ is marked as an anomaly of the same type. If many consecutive outliers occur, $\varepsilon$ is lowered to improve accuracy; if no or few outliers occur, $\varepsilon$ is raised to increase sensitivity;
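calculate the local reachability density of $p$ over its neighborhood set $N_k(p)$, denoted $\text{lrd}_k(p)$:
$$\text{lrd}_k(p) = \frac{M}{\sum_{o \in N_k(p)} \text{reach-dist}_k(p, o)} \quad (4)$$
calculate the local anomaly factor of $p$, $\text{LOF}_k(p)$:
$$\text{LOF}_k(p) = \frac{1}{M} \sum_{o \in N_k(p)} \frac{\text{lrd}_k(o)}{\text{lrd}_k(p)} \quad (5)$$
where $M = |N_k(p)|$ is the number of data points in the neighborhood set. Dynamically adjusting the threshold $\varepsilon$ allows abnormal data points in more neighboring regions to be detected.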
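The k-distance, reachable distance, local reachability density and LOF computations above can be sketched in plain Python. This is a batch illustration, not the streaming implementation: neighbors are found by brute force, ties are broken by index, and anomalies are taken to be points whose LOF exceeds a cut-off. The `adjust_threshold` helper is a hypothetical rendering of the dynamic-threshold idea, applied here to the LOF cut-off rather than to the distance threshold: many flags in a window make the cut-off stricter, almost none make it looser. Its `high`, `low` and `factor` values are assumptions.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean; ties broken
    by index, a simplification of the textbook definition)."""
    ranked = sorted((math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i)
    return [j for _, j in ranked[:k]]


def lof_scores(points, k):
    """LOF_k(p) for every point; assumes no point is repeated more than k times."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    # k-distance of each point: distance to its k-th nearest neighbour.
    k_dist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def reach_dist(p, o):
        return max(k_dist[o], math.dist(points[p], points[o]))

    # Local reachability density: M / sum of reach-distances, M = |N_k(p)|.
    lrd = [len(neigh[p]) / sum(reach_dist(p, o) for o in neigh[p]) for p in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[o] for o in neigh[p]) / (len(neigh[p]) * lrd[p]) for p in range(n)]


def adjust_threshold(threshold, n_flagged, window, high=0.2, low=0.01, factor=1.1):
    """Hypothetical controller: many flags in the last window -> stricter cut-off
    (fewer false alarms); almost none -> looser cut-off (more sensitive)."""
    rate = n_flagged / window
    if rate > high:
        return threshold * factor
    if rate < low:
        return threshold / factor
    return threshold


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = lof_scores(points, k=3)
flagged = [i for i, s in enumerate(scores) if s > 1.5]  # LOF above cut-off = anomaly
```

On this toy data the isolated point (10, 10) gets an LOF of roughly 14, while the clustered points stay close to 1.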
Outputting results and early warnings: the abnormal data set is monitored in real time by the adaptive method; whether a data sample is abnormal is judged against the threshold, the corresponding early-warning processing is performed, and the warning information is fed back to the early-warning system;
the adaptive method tracks changes in the data in real time and dynamically adjusts the model and its parameters, so it suits different types and distributions of data, gives more accurate anomaly detection and early-warning results, and achieves an early-warning accuracy above 90%.
The adaptive method allows the anomaly detection model to adjust automatically as the data change, adapting to new data distributions and pattern changes; the update formula for the model parameters in the adaptive method is as follows:
;(6)
where the quantities involved are the updated value of the parameter, the adaptive learning rate, the adaptive gradient adjustment factor, the first moment of the gradient (its mean), the second moment of the gradient (its variance), a smoothing term, and the current gradient;
the learning rate is adjusted according to the adaptive gradient adjustment factor, and the model parameters are updated with the adjusted learning rate: when the factor increases, the learning rate decreases, and when it decreases, the learning rate increases. The learning rate thus adapts automatically to different gradient conditions during training, giving a better optimization result. The adjustment factor affects not only the absolute size of the learning rate but also its rate of change, so controlling the rate of change of the factor controls how much the learning rate varies during training: a factor above a certain threshold makes the learning rate change more slowly, and a factor below that threshold makes it change faster.
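The sketch below assumes an Adam-style form for update (6), which fits the quantities listed above (first and second gradient moments with bias correction, a smoothing term and an adaptive learning rate). The adaptive gradient adjustment factor is modeled by a hypothetical parameter `gamma` that divides the learning rate, so a larger factor gives a smaller step, as described; this is an illustration, not the method's exact formula.

```python
import math


def adaptive_step(theta, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, gamma=1.0):
    """One Adam-style update of a scalar parameter.

    state holds the step count t, first moment m (mean) and second moment v
    (uncentred variance). gamma is a hypothetical stand-in for the adaptive
    gradient adjustment factor: the effective learning rate is lr / gamma.
    """
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias (offset) correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # eps is the smoothing term that keeps the denominator away from zero.
    return theta - (lr / gamma) * m_hat / (math.sqrt(v_hat) + eps)


def minimise(gamma, steps):
    """Minimise (theta - 3)^2 from theta = 0; returns the final theta."""
    theta, state = 0.0, {"t": 0, "m": 0.0, "v": 0.0}
    for _ in range(steps):
        theta = adaptive_step(theta, 2.0 * (theta - 3.0), state, gamma=gamma)
    return theta
```

Because the first bias-corrected step has magnitude close to `lr / gamma`, doubling `gamma` halves the first move (about 0.1 versus 0.05 here), which shows the inverse relation between the adjustment factor and the learning rate.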
As an embodiment of the present invention, a mobile network operator in a certain region wants to identify problems and potential causes of failures by analyzing abnormal data such as signal-quality anomalies, call-drop-rate anomalies, call-completion-rate anomalies, data-transmission-rate anomalies, base-station faults or abnormal states, and traffic anomalies, so that it can take measures to improve network quality. First, the abnormal data (signal quality, call-drop rate, call-completion rate, data transmission rate, base-station status and traffic data) are collected and preprocessed, including data cleaning, outlier removal and normalization. Exploratory analysis, such as statistical description and visual analysis, is then performed on each kind of abnormal data; for example, line or bar charts are drawn to observe how the abnormal data change over time and how they correlate with other indicators. An anomaly detection algorithm (a random forest algorithm) is used to identify outlying data points, and for each outlier the causes and influencing factors behind it are analyzed further, for example by examining the network equipment, base-station location and weather conditions associated with it. An anomaly detection model is built from the existing data and the characteristics of known anomalies, and features such as signal strength, network load and antenna direction are extracted based on existing knowledge. The specific causes of the anomalies are determined from the model output and feature-importance analysis, and corresponding solutions are proposed. For example, if the failure rate of base stations in a certain area is found to be high, engineering maintenance or additional investment may be needed to improve the stability of the base-station equipment.
The proposed solutions are then implemented as improvement measures, and changes in the data after the improvement are tracked. The improvement is monitored and evaluated; if the results remain unsatisfactory, the solution is optimized further or other potential anomalies are investigated.
As an embodiment of the invention, reference is made to fig. 1, which is a flow chart of the method according to the invention.
Acquiring a high-dimensional real-time communication data set from the user side and labeling and classifying it; removing users without abnormal communication behavior and retaining the abnormal data samples;
preprocessing the classified high-dimensional real-time communication data set; the preprocessing comprises automatic cleaning, multi-source data integration, anomaly detection and data repair, unique-attribute deletion, related-attribute merging and missing-value handling, and converts the acquired high-dimensional data into a data set that contains no missing data and only valid features;
extracting features from the preprocessed high-dimensional real-time communication data set with a bidirectional recurrent neural network (BiRNN) model, keeping the features suited to anomaly detection, removing data that cannot contain anomalies, and extracting the abnormal data;
normalizing the abnormal data, reducing the dimensionality of the extracted features of the high-dimensional real-time communication data set with principal component analysis (PCA), and encoding the reduced features with a discrete encoding method to form encoded data;
classifying the encoded data with the anomaly detection algorithm based on the enhanced weighted random forest, dividing them into a normal data set and an abnormal data set;
inputting the abnormal data into a convolutional neural network detection model for training;
processing the data stream in real time with the enhanced streaming anomaly detection algorithm and the trained convolutional neural network model, adjusting dynamically to the changing data distribution, and automatically identifying the anomaly type of the abnormal data;
outputting results and early warnings: the abnormal data set is monitored and early warnings are issued in real time by the adaptive method, and the warning information is fed back to the early-warning system.
As an embodiment of the present invention, referring to fig. 2, a classification effect diagram of various form models is shown.
The abnormal communication data are processed with a time series construction method and then input into the classifier. Running the traditional model and the various other models yields their classification effects, shown in FIG. 2.
As can be seen from FIG. 2, the time series exponential models in ratio form and in first-relative-value form have the highest recall, while the time series exponential model in differential form has the best accuracy. FIG. 2 also shows that recall and accuracy each follow a single fluctuation pattern. The classification accuracy under different hidden layer structures is compared in Table 1.
As seen from Table 1, hidden layer structures 1, 2, 3 and 4 all show good classification accuracy of 91% or more. After 1000 iterations, the classification accuracy of the second stage reaches 99.22%, the maximum observed. It is therefore concluded that the convolutional neural network model with a 4-hidden-layer structure has good classification accuracy.
Table 1 comparison of classification accuracy under different hidden layer structures
As can be seen from Table 2, the classifier designed in the invention has a classification and identification error rate of 27.51%, i.e. an accuracy of 72.49%. The predicted and actual values in Table 2 are counts and have no physical unit. Experiments show that the error rate is mainly caused by delays in the communication anomaly detection results. The example analysis indicates that the abnormal data monitoring and analyzing method based on data mining and neural network is effective and accurate and has practical value.
Table 2 error rate of neural network classification
As an embodiment of the present invention, refer to fig. 3, a comparison graph of weight functions.
Assume there are 5 decision trees and that different samples are given different weights according to their frequency in the data set or the importance of specific attributes. When the imbalances of the five decision trees are 10, 20, 30, 40 and 50 respectively (N = 5), the voting weights of the five trees follow from equation (1); the same calculation is repeated with the imbalances in reverse order, 50, 40, 30, 20 and 10.
As shown in FIG. 3, the voting weight decreases as the number of decision trees increases and increases as the imbalance of the decision tree decreases. The enhanced weighted random forest weights the samples through this weight function to improve the accuracy and robustness of the model: it pays more attention to important or scarce samples and handles imbalanced data sets effectively, which improves classification accuracy on minority classes or important samples and reduces the risk of misclassification. Weighting the samples also makes the model more robust to noise, outliers and anomalous values in the data, reducing the influence of these interference factors and improving the stability of the model. Finally, by capturing the important characteristics and patterns of the sample distribution, the weighted random forest generalizes better, which reduces overfitting and improves predictions on unseen data.
As an embodiment of the present invention, reference is made to fig. 4, which is a comparison graph of local anomaly factors.
Assume that for each data point its 10-nearest-neighbor neighborhood is determined, the reachable distance from the point to each point in its neighborhood is calculated, and the average of the reciprocals of these reachable distances is then computed. For each data point, the LOF value of each point in its neighborhood is calculated and the average of these LOF values is taken as the LOF value of that data point. A threshold is set based on the calculated LOF values, and a data point whose LOF value exceeds the threshold is judged to be an outlier.
As shown in FIG. 4, the LOF value decreases as the number of nearest neighbors increases and as the reachable distance increases. The LOF algorithm can effectively find and identify outliers in streaming anomaly detection. Because it considers the density and degree of isolation of each data point relative to its neighborhood, it captures local anomaly patterns rather than only global ones. Because it relies on local neighborhood calculations, it is sensitive to changes in the data distribution and can promptly capture new anomaly patterns in the data stream. When calculating average densities and local anomaly factors, the LOF algorithm uses optimization techniques such as approximate calculation and index structures, which reduce computational complexity and improve efficiency.
As an embodiment of the present invention, referring to fig. 5, a comparison chart of adaptive learning rates of an adaptive method is shown.
Assume that the parameters are set to 1, 0.3 to 0.7, 300 and 100, respectively.
As shown in FIG. 5, the update value increases with the smoothing term and decreases as the learning rate increases. The adaptive gradient adjustment factor performs an offset (bias) correction to adjust the learning rate adaptively, and the learning rate is divided by the square root of the corrected moment estimate to update the parameters. This not only adjusts the learning rate adaptively but also accommodates the different update magnitudes and dynamic ranges of different parameters, improving training effectiveness and stability. The adaptive learning rate method therefore adjusts the learning rate dynamically to the current situation, improving the efficiency and performance of the algorithm.
As an embodiment of the present invention, referring to fig. 6, a comparison chart of early warning accuracy of the adaptive method is shown.
As shown in FIG. 6, the early-warning accuracy increases with the learning rate, exceeding 90%, and also increases with the adaptive gradient adjustment factor. An adaptive learning rate can be adjusted dynamically according to the model's performance during training, so the model converges to an optimal solution more quickly. The adaptive gradient adjustment factor adjusts the update magnitude dynamically according to gradient information, so the model parameters move toward the optimum faster. Suitable values of the adaptive learning rate and gradient adjustment factor help the model converge to the optimal solution and make its parameters more accurate and stable, which in turn raises the early-warning accuracy of the anomaly detection and prediction model. Because the adaptive method adjusts the model and parameters to changes and heterogeneity in the data, it adapts better to different anomaly conditions, and this dynamic adaptability improves detection and prediction accuracy and hence early-warning accuracy.
In summary, the abnormal data monitoring and analyzing method based on data mining and neural network comprises the following steps:
acquiring a high-dimensional real-time communication data set from the user side and labeling and classifying it; removing users without abnormal communication behavior and retaining the abnormal data samples;
preprocessing the classified high-dimensional real-time communication data set; the preprocessing comprises automatic cleaning, multi-source data integration, anomaly detection and data repair, unique-attribute deletion, related-attribute merging and missing-value handling, and converts the acquired high-dimensional data into a data set that contains no missing data and only valid features;
the data preprocessing specifically comprises the following steps:
unique attribute processing: the unique attribute refers to the characteristic capable of uniquely identifying the sample to be identified, but has no influence on the identification of abnormal data, and is directly deleted;
missing data processing: finding the position of the missing value; filling the missing value; quickly acquiring index information of a missing value in the characteristic data, and processing the missing value by using a position logic index;
correlation attribute merging: for characteristic data with obvious correlation, combining the correlated data into one data by using a data operation, and deleting non-correlated data; the data is combined by adding two attribute columns feature 1 and feature 2 into the data, and the deleting operation is performed after the attribute combination.
Automatic cleaning: missing values, outliers, and duplicate values in the data are identified and handled automatically by algorithms and tools based on semi-supervised learning and anomaly detection; the specific steps are as follows:
load the data to be cleaned into a suitable computing environment;
missing-value processing: missing-value detection: detect missing values in the data using statistical indicators, visualization methods, or a clustering algorithm; missing-value filling: fill the missing values by interpolation, according to the characteristics and meaning of the data;
outlier processing: outlier detection: identify outliers in the data with an outlier detection algorithm; outlier handling: depending on the nature of the outliers, handle them by deletion;
duplicate-value processing: duplicate detection: detect duplicate records using a unique identifier of the data or a combination of fields; duplicate handling: depending on business requirements, keep the first record and discard the duplicates;
verification and evaluation: verify and evaluate the cleaned data, check the cleaning effect, and compare it with the original data; the missing-value ratio and the outlier ratio are used to evaluate the accuracy and completeness of the cleaning results.
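The cleaning steps above (interpolation, outlier deletion, keep-first de-duplication, and the two evaluation ratios) can be sketched as below. The record layout `(record_id, value)` and the z-score outlier rule with threshold `z_max` are assumptions chosen for illustration; the text does not name a specific outlier detection algorithm.

```python
from statistics import mean, pstdev


def interpolate(values):
    """Fill None entries by linear interpolation between known neighbours.

    Leading or trailing gaps copy the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    if not known:
        return out
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


def auto_clean(records, z_max=3.0):
    """Clean (record_id, value) pairs and report missing/outlier ratios."""
    ids = [r[0] for r in records]
    values = [r[1] for r in records]
    missing_ratio = values.count(None) / len(values)
    filled = interpolate(values)
    mu, sigma = mean(filled), pstdev(filled)
    # Assumed outlier rule: |z-score| above z_max.
    is_outlier = [sigma > 0 and abs(v - mu) / sigma > z_max for v in filled]
    outlier_ratio = sum(is_outlier) / len(filled)
    seen, cleaned = set(), []
    for rid, v, bad in zip(ids, filled, is_outlier):
        if rid in seen:  # duplicate: keep only the first record
            continue
        seen.add(rid)
        if bad:  # outlier handling by deletion
            continue
        cleaned.append((rid, v))
    return cleaned, {"missing_ratio": missing_ratio, "outlier_ratio": outlier_ratio}
```

With `z_max=2.0`, the input `[(1, 1.0), (2, None), (3, 3.0), (3, 3.0), (4, 2.0), (5, 50.0)]` yields `[(1, 1.0), (2, 2.0), (3, 3.0), (4, 2.0)]`, with a missing ratio and an outlier ratio of 1/6 each.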
Extract features from the preprocessed high-dimensional real-time communication data set using a bidirectional recurrent neural network (BiRNN) model, keeping features suitable for anomaly detection, removing data that cannot contain anomalies, and extracting the abnormal data;
normalize the abnormal data, reduce the dimensionality of the extracted features of the high-dimensional real-time communication data set by principal component analysis (PCA), and encode the reduced features with a discrete encoding method to form encoded data;
the step of encoding the dimension-reduced features with a discrete encoding method to form encoded data specifically comprises:
among the related attributes, one or more discrete attributes are grouped into one category, and each category is encoded separately using the one_hot encoding method.
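A minimal sketch of this grouping-then-encoding step, assuming each group of discrete attributes is treated as one tuple-valued category and that category levels are ordered by sorting. The group and attribute names in the example (`link`, `mode`, `net`) are hypothetical.

```python
def one_hot_encode(column):
    """One-hot encode a list of discrete values; returns (vocabulary, vectors)."""
    vocab = sorted(set(column))
    index = {v: i for i, v in enumerate(vocab)}
    vectors = []
    for v in column:
        vec = [0] * len(vocab)
        vec[index[v]] = 1
        vectors.append(vec)
    return vocab, vectors


def encode_groups(rows, groups):
    """Encode each group of discrete attributes separately.

    `groups` maps a group name to the attribute names it contains; the
    attributes of one group are combined into a single tuple-valued category.
    """
    encoded = {}
    for name, attrs in groups.items():
        column = [tuple(r[a] for a in attrs) for r in rows]
        encoded[name] = one_hot_encode(column)
    return encoded
```

For rows with `mode`/`net` values `("voice", "4G")`, `("data", "5G")`, `("voice", "4G")` grouped as `{"link": ["mode", "net"]}`, the vocabulary is `[("data", "5G"), ("voice", "4G")]` and the vectors are `[0, 1]`, `[1, 0]`, `[0, 1]`. The vector length per group is the number of coding bits that group contributes to the network input.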
Apply unified label classification to the encoded data;
classify the encoded data into a normal data set and an abnormal data set using an enhanced weighted random forest algorithm;
assign a weight to each abnormal data sample;
for new, unlabeled samples, the trained model applies its learned parameters and weights to perform classification inference according to the learned rules, assigning each new sample to the normal or the abnormal category; the model's parameters and weights are learned from the labeled samples by an optimization algorithm during training.
Label classification of the data means dividing all data into a normal data set and an abnormal data set; the encoded data are classified by the enhanced weighted random forest algorithm, whose weight function is as follows:
;
where the formula relates the imbalance of each decision tree, the number N of decision trees, and the voting weight of each decision tree;
given N balanced training subsets, one decision tree classifier is trained on each subset, giving N decision tree classifiers indexed by a natural number from 1 to N;
the final classifier is obtained by weighted voting and is expressed as follows:
;
where Y is the abnormal data set;
the resulting final classifier is evaluated on the test set to check its classification performance.
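The balanced-subset training and weighted voting can be sketched as follows. Several assumptions are made here and labelled in the code: balanced subsets are built by undersampling the majority class, each decision tree is replaced by a one-feature threshold rule (a decision stump) to keep the sketch short and dependency-free, and the voting weights are taken as normalised training accuracies, because the exact weight function of the enhanced weighted random forest is not reproduced in this text.

```python
import random


def balanced_subsets(samples, labels, n_subsets, seed=0):
    """Build N balanced training subsets by undersampling the majority class.

    Labels: 1 = abnormal, 0 = normal. Undersampling is an assumed choice.
    """
    rng = random.Random(seed)
    pos = [(x, 1) for x, y in zip(samples, labels) if y == 1]
    neg = [(x, 0) for x, y in zip(samples, labels) if y == 0]
    small, large = sorted((pos, neg), key=len)
    return [small + rng.sample(large, len(small)) for _ in range(n_subsets)]


def train_stump(subset):
    """Fit a one-feature threshold rule (stand-in for a full decision tree).

    Returns (training accuracy, threshold, polarity); polarity +1 predicts
    "abnormal" for x >= threshold, polarity -1 predicts it for x <= threshold.
    """
    best = None
    for t, _ in subset:
        for polarity in (1, -1):
            hits = sum((polarity * (x - t) >= 0) == (y == 1) for x, y in subset)
            acc = hits / len(subset)
            if best is None or acc > best[0]:
                best = (acc, t, polarity)
    return best


def weighted_vote(stumps, weights, x):
    """Final classifier: sign of the weighted sum of the N base votes (+1/-1)."""
    score = sum(w * (1 if p * (x - t) >= 0 else -1)
                for (_, t, p), w in zip(stumps, weights))
    return 1 if score > 0 else 0
```

Usage, with hypothetical weights proportional to each stump's training accuracy: `stumps = [train_stump(s) for s in balanced_subsets(X, y, 5)]`, `weights = [a / sum(s[0] for s in stumps) for a, _, _ in stumps]`, then `weighted_vote(stumps, weights, x)` returns 1 for abnormal and 0 for normal.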
By classifying with an anomaly detection algorithm based on the enhanced weighted random forest, the invention improves the accuracy of the classification model, handles imbalanced data better, and improves the stability and reliability of the overall classification. It also helps reduce dimensionality, redundant features, and the time required for model training and prediction, making it suitable for anomaly detection and classification on large-volume, high-frequency data.
Inputting the abnormal data into a convolutional neural network detection model for training;
The training process of the convolutional neural network detection model comprises the following steps:
initializing parameters with random values;
input the encoded data in the abnormal data set and obtain an output value by forward propagation through the convolution, pooling, and fully connected layers;
compute the training error between the output value and the target value for each of the convolution, pooling, and fully connected layers, where the target value refers to the classified abnormal data set;
update the weights by back-propagation according to the training error;
terminate training when the training error shows no significant change within 100 iterations; if the termination condition is not met, return to the data-input step;
and obtaining a trained convolutional neural network model.
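The termination rule above ("no significant change within 100 iterations") can be expressed as a patience-based loop. In this sketch, `step` is a hypothetical callable that performs one forward and backward pass and returns the training error; the tolerance `tol` that defines a "significant" change and the safety cap `max_iters` are assumed values, not taken from the source.

```python
def train_until_plateau(step, patience=100, tol=1e-4, max_iters=100_000):
    """Call step() until the error has not improved by more than `tol`
    for `patience` consecutive iterations; return the error history."""
    best = float("inf")
    stale = 0
    history = []
    for _ in range(max_iters):
        err = step()  # one forward pass + back-propagation update
        history.append(err)
        if best - err > tol:
            best, stale = err, 0  # significant improvement: reset the counter
        else:
            stale += 1
            if stale >= patience:
                break  # no significant change within `patience` iterations
    return history
```

For a toy one-parameter model whose `step` applies a gradient-descent update to `(w - 2)**2`, the loop stops shortly after the error flattens out rather than running to `max_iters`.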
The number of neurons in the input layer of the convolutional neural network is set according to the number of input abnormal-communication-data features and the number of coding bits of each feature, as shown in the following formula:
$$N_{\text{in}} = \sum_{i=1}^{M} b_i \quad (3)$$
where M is the number of input abnormal-communication-data features, $b_i$ is the number of coding bits of the i-th feature, and $N_{\text{in}}$ is the number of neurons in the input layer; the types of abnormal-communication-data features include frequency anomalies, delay anomalies, abnormal packet frequency, abnormal packet size, signal-strength anomalies, abnormal protocol behavior, and data-integrity anomalies.
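A small sketch of the input-layer sizing, assuming Eq. (3) sums the coding bits of the M input features. The `"binary"` scheme is included only to show how the count changes with the encoding; the text itself specifies one_hot encoding, where a feature with L discrete levels needs L bits.

```python
import math


def coding_bits(levels, scheme="one_hot"):
    """Bits needed to encode one feature with `levels` discrete values."""
    if scheme == "one_hot":
        return levels  # one bit per level
    if scheme == "binary":
        return max(1, math.ceil(math.log2(levels)))
    raise ValueError(f"unknown scheme: {scheme}")


def input_layer_size(levels_per_feature, scheme="one_hot"):
    """Number of input neurons: sum of b_i over the M input features."""
    return sum(coding_bits(n, scheme) for n in levels_per_feature.values())
```

For instance, two features with 3 and 4 levels need 7 input neurons under one_hot encoding and 4 under binary encoding.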
Input the abnormal data set into convolutional neural network detection model M1 for training;
convolutional neural network model M1 comprises 12 convolution layers and 8 pooling layers; the convolution layers use three small 3×3 convolution kernels, and the 8 pooling layers use max pooling;
residual connection modules are added at the 3rd, 4th, and 5th convolution layers of model M1, where one residual connection module consists of two convolution layers with batch normalization and an activation function between them; the output and the input of the 3rd convolution layer of M1 are added and the sum is passed to the 4th convolution layer as its input; the residual connection module is applied again, and the output and the input of the 4th convolution layer are added and the sum is passed to the 5th convolution layer; the residual connection module is applied once more, giving convolutional neural network model M2;
input the encoded data in the abnormal data set into convolutional neural network model M2 and perform convolution, residual connection, and pooling operations;
there is one intermediate layer, and its number of neurons is …
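The residual connection module (two convolutions with batch normalization and an activation in between, plus an identity skip) can be illustrated in one dimension. This is a simplified sketch, not the M2 architecture: it uses single-channel 1-D convolutions with zero "same" padding instead of 3×3 2-D kernels, batch normalization without learned scale and shift, and ReLU as the assumed activation function.

```python
def conv1d_same(x, kernel):
    """1-D convolution (cross-correlation) with zero padding; len(out) == len(x).

    Assumes an odd kernel length.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def batch_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance (no learned scale/shift)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return [(v - mu) / (var + eps) ** 0.5 for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, k1, k2):
    """y = x + F(x), with F = conv -> batch norm -> ReLU -> conv."""
    f = conv1d_same(relu(batch_norm(conv1d_same(x, k1))), k2)
    return [a + b for a, b in zip(x, f)]  # skip connection: add input and output
```

Chaining three calls, `residual_block(residual_block(residual_block(x, ...), ...), ...)`, mirrors the text's sequence of adding each layer's output to its input before passing the sum on. If the second kernel is all zeros, the block reduces to the identity, which is the property that makes residual connections easy to optimize.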
The number of neurons in the output layer equals the number of required classes; the required states are encoded with the one_hot encoding method, and the log function is chosen as the activation function of the output neurons.
Calculate a local outlier factor (LOF) value for each data point by comparing its average density with that of its neighboring points; data points whose LOF value is below a given threshold are anomalous points;
periodically update the average density of the neighborhood set, and update the LOF values of the data points in the neighborhood according to the updated average density;
process the abnormal data set in real time using an enhanced streaming anomaly detection algorithm and the trained convolutional neural network model, adjusting dynamically to changes in the environment and in the data distribution of the data set, and automatically identifying the anomaly type of the abnormal data;
identify the classified abnormal data using the enhanced streaming anomaly detection algorithm and the trained convolutional neural network model, where the local outlier factors in the enhanced streaming anomaly detection algorithm are calculated as follows:
for each outlier data point $p$ in the real-time communication dataset:
compute the k nearest neighbors of $p$ to obtain its neighborhood set $N_k(p)$;
compute the k-reachability distance from data point $p$ to data point $o$, $\text{reach-dist}_k(p,o)$, which is the maximum of the k-distance of $o$, $k\text{-dist}(o)$, and the Euclidean distance $d(p,o)$ between $p$ and $o$:
$$\text{reach-dist}_k(p,o) = \max\{k\text{-dist}(o),\ d(p,o)\}$$
according to a set distance threshold $\varepsilon$, data points whose distance to the target point is less than or equal to the threshold are defined as points in the neighborhood of the target point; the neighborhood of target point $p$ is expressed as:
$$N_\varepsilon(p) = \{\, q \in D \mid d(p,q) \le \varepsilon \,\}$$
where $N_\varepsilon(p)$ is the set of data points in the neighborhood of target point $p$; $D$ is the dataset; $d(p,q)$ is the Euclidean distance between data points $p$ and $q$; and $d(p,q) \le \varepsilon$ means that data point $q$ belongs to the neighborhood of $p$. Different k-reachability distances correspond to different distance thresholds, and $\varepsilon$ is adjusted dynamically according to the k-reachability distance: the k-reachability distance from $p$ to $o$ is compared with $\varepsilon$; if the k-reachability distance of a data point is smaller than $\varepsilon$, the point is marked as an anomaly of the same type; if several consecutive outliers occur, $\varepsilon$ is decreased to improve accuracy; if no or few outliers occur, $\varepsilon$ is increased to improve sensitivity;
compute the local reachability density of $p$ from its neighborhood set $N_k(p)$, denoted $\text{lrd}_k(p)$:
$$\text{lrd}_k(p) = \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p,o) \right)^{-1} \quad (4)$$
compute the local outlier factor of $p$, $\text{LOF}_k(p)$:
$$\text{LOF}_k(p) = \frac{1}{M} \sum_{o \in N_k(p)} \frac{\text{lrd}_k(o)}{\text{lrd}_k(p)} \quad (5)$$
where $M = |N_k(p)|$ is the number of data points in the neighborhood set. Dynamically adjusting the threshold $\varepsilon$ allows abnormal data points in more neighborhoods to be detected.
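The k-distance, reachability distance, local reachability density, and local outlier factor described above can be computed directly as follows. This is a sketch of the standard LOF computation (Euclidean distance, exactly k neighbours, ties broken by index order); the dynamic ε adjustment and the periodic density refresh are not included. In a stream, one simple option is to recompute the scores periodically over a sliding window of recent points.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean, excluding i)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def lof_scores(points, k):
    """Local outlier factor of every point (standard LOF definition)."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance of each point: distance to its k-th nearest neighbour.
    k_dist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def reach_dist(p, o):
        # reach-dist_k(p, o) = max{k-distance(o), d(p, o)}
        return max(k_dist[o], math.dist(points[p], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for p in range(n):
        mean_reach = sum(reach_dist(p, o) for o in neigh[p]) / len(neigh[p])
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[o] / lrd[p] for o in neigh[p]) / len(neigh[p]) for p in range(n)]
```

For a unit square of four points plus a distant point at `(10, 10)` with `k=2`, the four square points score exactly 1.0 and the distant point scores about 13.1. In this standard formulation, values near 1 indicate points as dense as their neighbours, and values well above 1 indicate outliers.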
The invention identifies the classified abnormal data using the enhanced streaming anomaly detection algorithm and the trained model, monitors and analyzes abnormal data in real time, detects newly emerging abnormal data promptly, processes real-time data streams efficiently, and issues timely warnings and responses for abnormal data.
Output results and early warnings: detect anomalies in the abnormal data set in real time with an adaptive method; judge whether each data sample is abnormal according to the threshold, perform the corresponding early-warning processing, and feed the warning information back to the early-warning system;
the adaptive method tracks changes in the data in real time and dynamically adjusts the model and its parameters, so that it adapts better to different types and distributions of data, provides more accurate anomaly detection and early-warning results, and achieves an early-warning accuracy above 90%.
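The threshold judgement and its adaptive adjustment can be sketched as a small stateful filter over a stream of anomaly scores (for example, LOF values). The conventions here are assumptions, not the patented rule: a sample is abnormal when its score exceeds the threshold; after `burst` consecutive alarms the threshold is raised by `step` (fewer, more reliable alarms); after `quiet` consecutive normal samples it is lowered by `step` (higher sensitivity). This mirrors in spirit the rule stated earlier for adjusting the distance threshold ε. The class name `AdaptiveAlarm` and all parameter values are hypothetical.

```python
class AdaptiveAlarm:
    """Threshold-based early warning whose threshold adapts to the alarm rate."""

    def __init__(self, threshold, step=0.1, burst=3, quiet=50, floor=0.0):
        self.threshold = threshold
        self.step = step
        self.burst = burst    # consecutive alarms before tightening
        self.quiet = quiet    # consecutive normal samples before loosening
        self.floor = floor    # lower bound for the threshold
        self._run = 0         # current run of consecutive alarms
        self._calm = 0        # current run of consecutive normal samples

    def update(self, score):
        """Return True (raise a warning) if `score` is abnormal, then adapt."""
        alarm = score > self.threshold
        if alarm:
            self._run += 1
            self._calm = 0
            if self._run >= self.burst:
                self.threshold += self.step  # tighten to improve accuracy
                self._run = 0
        else:
            self._calm += 1
            self._run = 0
            if self._calm >= self.quiet:
                self.threshold = max(self.floor, self.threshold - self.step)
                self._calm = 0  # loosened to improve sensitivity
        return alarm
```

Each `True` returned by `update` would correspond to a warning fed back to the early-warning system.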
The adaptive method allows the anomaly detection model to adjust automatically as the data change and to adapt to new data distributions and pattern changes; the update formula for the model parameter in the adaptive method is as follows:
;(6)
where the formula involves the updated value of the parameter, the adaptive learning rate, the adaptive gradient adjustment factor, the first moment of the gradient (its mean), the second moment of the gradient (its variance), a smoothing term, and the current gradient;
the learning rate is adjusted according to the adaptive gradient adjustment factor, and the model parameters are updated with the adjusted learning rate: when the adaptive gradient adjustment factor increases, the learning rate decreases, and when the factor decreases, the learning rate increases. The adaptive gradient adjustment factor affects not only the absolute magnitude of the learning rate but also its rate of change, so the range over which the learning rate varies during training is controlled through the rate of change of the factor; an adjustment factor above a certain threshold makes the learning rate change more slowly, while one below that threshold makes it change more quickly.
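Because the exact form of Eq. (6) is not recoverable from this text, the sketch below assumes an Adam-style update (bias-corrected first and second moments plus a smoothing term), with the effective learning rate divided by a hypothetical `adjust_factor`. This reproduces only the stated qualitative behaviour: a larger adjustment factor gives a smaller learning rate. It should not be read as the patent's formula.

```python
import math


def new_state():
    """Optimizer state: first moment m, second moment v, step counter t."""
    return {"m": 0.0, "v": 0.0, "t": 0}


def adaptive_update(theta, grad, state, base_lr=0.1, adjust_factor=1.0,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step whose learning rate is scaled by 1 / adjust_factor."""
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad         # first moment (mean)
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad  # mean of squared gradients
    m_hat = state["m"] / (1 - beta1 ** t)                        # bias correction
    v_hat = state["v"] / (1 - beta2 ** t)
    lr = base_lr / adjust_factor  # larger factor -> smaller learning rate
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)         # eps: smoothing term
```

On the first step, the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to ±1, so the parameter moves by roughly `base_lr / adjust_factor`: about 0.1 with the defaults, and about 0.05 with `adjust_factor=2.0`.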
By detecting and warning about abnormal data with an adaptive method, the invention flexibly adjusts the parameters and thresholds of the anomaly detection algorithm to the characteristics and change patterns of the abnormal data, improving detection accuracy. Because detection adjusts dynamically as the abnormal data change, the system adapts to anomalies across different time periods and data distributions, responds to new anomaly patterns in real time, and issues warnings promptly.
The present disclosure may be embodied as a system and/or a computer program product. The computer program product includes a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present disclosure.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.
Claims (6)
1. The abnormal data monitoring and analyzing method based on the data mining and the neural network is characterized by comprising the following steps:
acquiring a high-dimensional real-time communication data set from the user side, and performing classification labeling and preprocessing;
extracting features with a bidirectional recurrent neural network (BiRNN) model, and reducing dimensionality by principal component analysis;
encoding with a discrete encoding method to form encoded data;
classifying the encoded data into a normal data set and an abnormal data set with an enhanced weighted random forest algorithm;
inputting the abnormal data set into a convolutional neural network detection model for training;
calculating a local outlier factor value by comparing the average density of each data point with that of its neighboring points, data points whose local outlier factor value is below a given threshold being anomalous points;
periodically updating the average density of the neighborhood set, and updating the local outlier factor values of the data points in the neighborhood according to the updated average density;
processing the abnormal data set in real time with an enhanced streaming anomaly detection algorithm and the trained convolutional neural network model, adjusting dynamically to changes in the environment and in the data distribution of the data set, and automatically identifying the anomaly type of the abnormal data;
outputting results and early warnings: detecting the abnormal data set in real time with an adaptive method, judging whether each data sample is abnormal according to the threshold, performing the corresponding early-warning processing, and feeding the warning information back to an early-warning system;
the adaptive method tracking changes in the data in real time and dynamically adjusting the model and parameters, so as to adapt better to different types and distributions of data, provide more accurate anomaly detection and early-warning results, and achieve an early-warning accuracy above 90%;
the classification by the enhanced weighted random forest algorithm comprising the steps of:
inputting the encoded data into an enhanced weighted random forest model for training;
performing anomaly detection and classification with the trained enhanced weighted random forest model, separating the encoded data into a normal data set and an abnormal data set;
assigning a weight to each abnormal data sample;
for new, unlabeled samples, the trained model using its learned parameters and weights to perform classification inference according to the learned rules, assigning each new sample to the normal or the abnormal category; the parameters and weights of the model being learned from the labeled samples by an optimization algorithm during training;
the label classification of data referring to dividing all data into a normal data set and an abnormal data set, the encoded data being classified by the enhanced weighted random forest algorithm, whose weight function is as follows:
wherein the formula relates the imbalance of each decision tree, the number N of decision trees, and the voting weight of each decision tree;
given N balanced training subsets, training on them to obtain N decision tree classifiers indexed by a natural number from 1 to N;
the final classifier being obtained by weighted voting and expressed as follows:
wherein Y is the abnormal data set;
the resulting final classifier being evaluated on the test set to check its classification performance;
identifying the abnormal data set with the enhanced streaming anomaly detection algorithm and the trained convolutional neural network model, the local outlier factors in the enhanced streaming anomaly detection algorithm being calculated as follows:
for each outlier data point $p$ in the real-time communication dataset:
computing the k nearest neighbors of $p$ to obtain its neighborhood set $N_k(p)$;
computing the k-reachability distance from data point $p$ to data point $o$, $\text{reach-dist}_k(p,o) = \max\{k\text{-dist}(o),\ d(p,o)\}$, i.e., the maximum of the k-distance of $o$ and the Euclidean distance between $p$ and $o$;
according to the distance threshold $\varepsilon$, defining data points whose distance is less than or equal to the threshold as points in the neighborhood of the target point, the neighborhood of target point $p$ being expressed as $N_\varepsilon(p) = \{\, q \in D \mid d(p,q) \le \varepsilon \,\}$;
wherein $N_\varepsilon(p)$ is the set of data points in the neighborhood of target point $p$; $D$ is the real-time communication dataset; $d(p,q)$ is the Euclidean distance between data points $p$ and $q$; $d(p,q) \le \varepsilon$ indicates that data point $q$ belongs to the neighborhood of $p$; different k-reachability distances correspond to different distance thresholds, and $\varepsilon$ is adjusted dynamically according to the k-reachability distance: the k-reachability distance from $p$ to $o$ is compared with $\varepsilon$; if the k-reachability distance of an outlier data point is smaller than $\varepsilon$, it is marked as an anomaly of the same type; if consecutive outliers occur, $\varepsilon$ is decreased to improve accuracy; if no outlier occurs, $\varepsilon$ is increased to improve sensitivity;
computing the local reachability density of $p$ from its neighborhood set $N_k(p)$, $\text{lrd}_k(p) = \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p,o) \right)^{-1}$;
computing the local outlier factor of $p$, $\text{LOF}_k(p) = \frac{1}{M} \sum_{o \in N_k(p)} \frac{\text{lrd}_k(o)}{\text{lrd}_k(p)}$;
wherein $M = |N_k(p)|$ is the number of data points in the neighborhood set;
the adaptive method enabling the anomaly detection model to adjust automatically as the data change and to adapt to new data distributions and pattern changes, the update formula for the model parameter in the adaptive method being as follows:
wherein the formula involves the updated value of the parameter, the adaptive learning rate, the adaptive gradient adjustment factor, the first moment of the gradient (its mean), the second moment of the gradient (its variance), a smoothing term, and the current gradient;
the learning rate being adjusted according to the adaptive gradient adjustment factor and the model parameters being updated with the adjusted learning rate; when the adaptive gradient adjustment factor increases, the learning rate decreases, and when the factor decreases, the learning rate increases;
acquiring a high-dimensional real-time communication data set from the user side, and labeling and classifying it;
the high-dimensional data set including the following 10 dimensions: timestamp, sender and receiver identifiers, communication mode, communication duration, communication quality indicator, bandwidth usage, communication location, data traffic, network topology, and user behavior;
removing users without abnormal communication behavior, and retaining the abnormal data samples;
the abnormal data including abnormal signal quality, abnormal call-drop rate, abnormal call-completion rate, abnormal data transmission rate, base station faults or abnormal states, and abnormal traffic.
2. The method for monitoring and analyzing abnormal data based on data mining and neural network according to claim 1, wherein the step of encoding the dimension-reduction feature using a discrete encoding method to form encoded data specifically comprises:
among the related attributes, grouping one or more discrete attributes into one category, and encoding each category separately with the one_hot encoding method.
3. The method for monitoring and analyzing abnormal data based on data mining and neural network according to claim 1, wherein the training process of the convolutional neural network detection model comprises:
initializing parameters with random values;
inputting the encoded data in the abnormal data set, and obtaining an output value by forward propagation through the convolution, pooling, and fully connected layers;
calculating the training error between the output value and the target value for each of the convolution, pooling, and fully connected layers, the target value referring to the classified abnormal data set;
updating the weights by back-propagation according to the training error;
terminating training when the training error shows no significant change within 100 iterations, and otherwise returning to the data-input step;
and obtaining a trained convolutional neural network model.
4. The abnormal data monitoring and analyzing method based on data mining and neural network according to claim 3, wherein the number of neurons in the input layer of the convolutional neural network is set according to the number of input abnormal-communication-data features and the number of coding bits of each feature, calculated as follows:
$$N_{\text{in}} = \sum_{i=1}^{M} b_i$$
where M is the number of input abnormal-communication-data features, $b_i$ is the number of coding bits of the i-th feature, and $N_{\text{in}}$ is the number of neurons in the input layer; the types of abnormal-communication-data features include frequency anomalies, delay anomalies, abnormal packet frequency, abnormal packet size, signal-strength anomalies, abnormal protocol behavior, and data-integrity anomalies.
5. The method for monitoring and analyzing abnormal data based on data mining and neural network according to claim 4, wherein the encoded data in the abnormal data set are input into a convolutional neural network detection model M1 for training;
convolutional neural network model M1 comprises 12 convolution layers and 8 pooling layers; the convolution layers use three small 3×3 convolution kernels, and the 8 pooling layers use max pooling;
residual connection modules are added at the 3rd, 4th, and 5th convolution layers of model M1, where one residual connection module consists of two convolution layers with batch normalization and an activation function between them; the output and the input of the 3rd convolution layer of M1 are added and the sum is passed to the 4th convolution layer as its input; the residual connection module is applied again, and the output and the input of the 4th convolution layer are added and the sum is passed to the 5th convolution layer; the residual connection module is applied once more, giving convolutional neural network model M2;
inputting the encoded data in the abnormal data set into convolutional neural network model M2, and performing convolution, residual connection, and pooling operations;
there is one intermediate layer, and its number of neurons is …
6. The abnormal data monitoring and analyzing method based on data mining and neural network according to claim 3, wherein the number of neurons in the output layer equals the number of required classes; the required states are encoded with the one_hot encoding method, and the log function is chosen as the activation function of the output neurons.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311718358.3A CN117421684B (en) | 2023-12-14 | 2023-12-14 | Abnormal data monitoring and analyzing method based on data mining and neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117421684A CN117421684A (en) | 2024-01-19 |
CN117421684B true CN117421684B (en) | 2024-03-12 |
Family
ID=89526888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311718358.3A Active CN117421684B (en) | 2023-12-14 | 2023-12-14 | Abnormal data monitoring and analyzing method based on data mining and neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117421684B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117849700B (en) * | 2024-03-07 | 2024-05-24 | 南京国网电瑞电力科技有限责任公司 | Modular electric energy metering system capable of controlling measurement |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961017A (en) * | 2019-02-26 | 2019-07-02 | 杭州电子科技大学 | A kind of cardiechema signals classification method based on convolution loop neural network |
CN111143838A (en) * | 2019-12-27 | 2020-05-12 | 北京科东电力控制系统有限责任公司 | Database user abnormal behavior detection method |
CN112001788A (en) * | 2020-08-21 | 2020-11-27 | 东北大学 | Credit card default fraud identification method based on RF-DBSCAN algorithm |
CN113256066A (en) * | 2021-04-23 | 2021-08-13 | 新疆大学 | PCA-XGboost-IRF-based job shop real-time scheduling method |
CN113378990A (en) * | 2021-07-07 | 2021-09-10 | 西安电子科技大学 | Traffic data anomaly detection method based on deep learning |
CN114124482A (en) * | 2021-11-09 | 2022-03-01 | 中国电子科技集团公司第三十研究所 | Access flow abnormity detection method and device based on LOF and isolated forest |
CN114859351A (en) * | 2022-06-10 | 2022-08-05 | 重庆地质矿产研究院 | Method for detecting surface deformation field abnormity based on neural network |
CN115577275A (en) * | 2022-11-11 | 2023-01-06 | 山东产业技术研究院智能计算研究院 | Time sequence data anomaly monitoring system and method based on LOF and isolated forest |
CN115964258A (en) * | 2022-12-30 | 2023-04-14 | 天翼物联科技有限公司 | Internet of things network card abnormal behavior grading monitoring method and system based on multi-time sequence analysis |
CN116595465A (en) * | 2023-04-10 | 2023-08-15 | 哈尔滨工程大学 | High-dimensional sparse data outlier detection method and system based on self-encoder and data enhancement |
CN116955926A (en) * | 2023-07-03 | 2023-10-27 | 保定耘云信息技术咨询有限公司 | Bank data analysis method based on deep learning |
CN117216660A (en) * | 2023-09-12 | 2023-12-12 | 杭州安恒信息技术股份有限公司 | Method and device for detecting abnormal points and abnormal clusters based on time sequence network traffic integration |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112543465B (en) * | 2019-09-23 | 2022-04-29 | 中兴通讯股份有限公司 | Abnormity detection method, abnormity detection device, terminal and storage medium |
DE102019135608A1 (en) * | 2019-12-20 | 2021-06-24 | Bayerische Motoren Werke Aktiengesellschaft | Method, device and system for the detection of abnormal operating conditions of a device |
EP3862927A1 (en) * | 2020-02-05 | 2021-08-11 | Another Brain | Anomaly detector, method of anomaly detection and method of training an anomaly detector |
Non-Patent Citations (4)
Title |
---|
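Online Anomaly Detection Leveraging Stream-Based Clustering and Real-Time Telemetry; Andrian Putina et al.; IEEE Transactions on Network and Service Management; 2021-03-31; Vol. 18, No. 1; pp. 839-854 * |
Research on fault detection and diagnosis of refrigeration systems based on LOF-RF; Xiong Kun (熊坤); China Master's Theses Full-text Database, Engineering Science and Technology II; 2022-03-15 (No. 3); p. C028-662 * |
Research on electricity theft detection methods based on an improved LOF algorithm; Yin Feng (殷锋) et al.; Journal of South-Central Minzu University (Natural Science Edition); 2022-09-30; Vol. 41, No. 5; pp. 579-585 * |
Research and application of anomaly detection for gas consumption behavior based on transfer learning; Liu Keli (刘可立); China Master's Theses Full-text Database, Engineering Science and Technology I; 2023-02-15 (No. 2); p. B017-327 * |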
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||