WO2019233189A1 - Method for detecting abnormal data in a sensor network - Google Patents

Method for detecting abnormal data in a sensor network

Publication number
WO2019233189A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
tree
sample
isolated
abnormal
Prior art date
Application number
PCT/CN2019/082673
Other languages
English (en)
French (fr)
Inventor
李光辉
许欧阳
Original Assignee
江南大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 江南大学
Publication of WO2019233189A1
Priority to US16/993,454 (US20200374720A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/06 Testing, supervising or monitoring using simulated traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/38 Services specially adapted for particular environments, situations or purposes for collecting sensor information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 Network topologies
    • H04W84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks

Definitions

  • The invention relates to a method for detecting abnormal data in a sensor network, and belongs to the field of data reliability detection for wireless sensor networks.
  • A Wireless Sensor Network (WSN) is a wireless network formed by a large number of stationary or mobile sensors in a self-organizing, multi-hop manner to cooperatively sense, collect, process, and transmit information about the sensed objects in the geographical area covered by the network, and finally to send this information to the owner of the network. Data, as the carrier of information about the sensed objects in the wireless sensor network, contains a great deal of useful information.
  • While collecting data, the sensors are vulnerable to various kinds of noise or events in the environment, including node failures, environmental noise, and external attacks. All of these affect the data collected by the nodes, which in turn makes the monitored environmental state incorrect.
  • Various anomaly detection techniques are therefore usually used to find the abnormal data.
  • Existing abnormal data detection schemes for wireless sensor networks are mainly divided into centralized detection schemes and distributed detection schemes.
  • The centralized detection scheme requires each node to transmit its own data to the sink node, so the robustness of the network is very poor. The distributed detection scheme, in order to improve the robustness and lifetime of the network, lets each node detect abnormal data automatically; but each node detects abnormal data based only on its own model, so the false alarm rate is high and the detection rate is low.
  • The isolation forest algorithm proposed by F. T. Liu et al. has been widely used in data anomaly detection.
  • The algorithm builds an ensemble model of isolation trees from a historical data set and computes the anomaly score s(Y) of a test sample from its average search depth. The anomaly scores of the current set of samples under detection are sorted in descending order, and a certain number of top-ranked samples are taken as the detected outliers, which decides whether each sample is abnormal.
  • The advantages of this method are that the principle is simple, the algorithmic complexity is low, and the detection accuracy is good. However, its applicability to anomaly detection on some concave data sets is low: when normal and abnormal data points partially overlap, the principle that a shorter detection path means a larger anomaly score leads to poor detection results. The method also ignores that each tree in the forest should contribute differently to the calculation of the final anomaly score.
  • This method has not yet been seen in wireless sensor network abnormal data detection applications.
  • The present invention provides a method for detecting abnormal data in a wireless sensor network, the method including:
  • On the basis of the isolation forest algorithm, the historical data set collected by the sensor nodes is used to construct the isolation tree set iforest; distance information between the sample under test and the centers of its various sample classes is introduced at each leaf node of each isolation tree in iforest; the weight coefficient of each isolation tree is set in combination with a diversity measure, and a weighted hybrid isolation forest (Whiforest) model is constructed.
  • The Whiforest model is used to determine whether the wireless sensor network data in the sample under test is abnormal.
  • Optionally, the method further includes:
  • The historical data set collected by the sensor nodes is divided into a training set and a test set.
  • Optionally, constructing the isolation tree set iforest from the historical data set collected by the sensor nodes, introducing at each leaf node of each isolation tree in iforest the distance information between the sample under test and the centers of its various sample classes, setting the weight coefficient of each isolation tree in combination with the diversity measure, and constructing the weighted hybrid isolation forest Whiforest model, includes:
  • Step 1: Use the training-set data in the historical data set to construct each isolation tree in the isolation tree set iforest, including setting the parameters: bootstrap sampling number ψ, forest size T, weight coefficient threshold μ, size of the validation sample set Val_W, and known abnormal sample addition rate ratio;
  • Step 2: According to the known abnormal sample addition rate ratio, randomly select known abnormal samples and add them to each isolation tree in iforest;
  • Step 3: Compute the training sample center Cen-s in the leaf nodes of each tree, and the distance δ(x) between each sample under test x and Cen-s in its leaf node; the mean of δ(x) over the trees in the forest is denoted s_c(x);
  • Step 4: Compute the abnormal sample center Cen-a in the leaf nodes, and denote the distance between each sample under test x and Cen-a in its leaf node as δ_a(x); the ratio of the means of δ(x) and δ_a(x) over all isolation trees is denoted s_a(x);
  • Step 5: Select the validation sample set Val-W from the historically collected data set and run the constructed isolation tree set iforest on it. Following the idea of base-classifier diversity in ensemble learning, compute the diversity between the isolation trees in the forest using the disagreement measure, obtaining a T*T symmetric matrix diversity whose diagonal is 0, where T is the number of isolation trees in the isolation tree set iforest;
  • Step 6: Sum the diversity matrix and divide by the forest size T to obtain B_index; then compare the B_index value with the threshold μ and set the weights as follows;
  • Step 8: Normalize the original Score(x) of the samples in the current data window and the two distance-based scores now introduced, namely {Score, s_a(x), s_c(x)},
  • the normalization formula used is shown below,
  • where s(x) stands for the three scores above, Score, s_a(x), and s_c(x). After normalization, the three scores are fused to obtain the final anomaly score s_final of the window samples;
  • Step 9: Sort s_final in descending order; according to domain knowledge or the known proportion of anomalies in the original data set (ratio), take the data samples with the highest anomaly scores, compare them with the labels of the samples under test, and compute the detection rate, the false alarm rate, and related evaluation indicators;
  • Step 10: If the node detects abnormal samples in the data window, it passes their sequence numbers to the cluster head node for further verification or processing.
  • In step 4, if a leaf node has no abnormal samples, its abnormal sample center Cen-a is recorded as 0.
  • In step 6, summing the diversity matrix means summing the diversity matrix by columns.
  • In step 1, the termination condition for isolation tree construction is: the samples cannot be divided further (the node contains only one data value, or the data samples are identical), or the depth of the isolation tree reaches the maximum log(ψ), where ψ is the bootstrap sampling number parameter.
  • In step 8, the original Score(x) of a sample in the current data window is calculated according to the following formula:
  • where h(x) is the path length of the data sample x on a given tree,
  • and C(ψ) is the average search path length of an Itree built with sampling number ψ.
  • The path length of the data sample x on a given tree is h(x) = e + C(T.size), where C(T.size) is the average path length of a binary tree built from T.size data items.
  • Another object of the present invention is to provide a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the program, the steps of the foregoing method are implemented.
  • A third object of the present invention is to provide a processor for running a program, wherein the method is executed when the program runs.
  • FIG. 1 is a schematic flowchart of a method for detecting abnormal data in a wireless sensor network provided by the present application.
  • FIG. 2 is the first schematic diagram of the AGD data set in the wireless sensor network abnormal data detection method based on a weighted hybrid isolation forest.
  • FIG. 3 is the second schematic diagram of the AGD data set in the wireless sensor network abnormal data detection method based on a weighted hybrid isolation forest.
  • FIG. 4 is the anomaly score plot of the traditional iforest model in the wireless sensor network abnormal data detection method based on a weighted hybrid isolation forest.
  • FIG. 5 is the anomaly score plot of the Whiforest model in the wireless sensor network abnormal data detection method based on a weighted hybrid isolation forest.
  • This application proposes a method for detecting abnormal data in wireless sensor networks by improving the isolation forest algorithm.
  • The method detects abnormal data in wireless sensor networks based on a weighted hybrid isolation forest (Whiforest): first, on the basis of the isolation forest algorithm, an isolation tree set iforest of a certain size is constructed; distance information between the sample under test and the centers of its various sample classes is introduced at each leaf node; weight coefficients are set for the isolation trees in combination with a diversity measure; and finally the improved isolation forest algorithm is used to determine whether wireless sensor network data is abnormal.
  • Detection rate: the ratio of the number of abnormal data samples detected by the algorithm to the total number of abnormal data samples actually contained in the data set.
  • False alarm rate: the ratio of the number of normal data samples misjudged as abnormal to the total number of normal data samples.
  • Data window: when anomaly detection is performed, the data in the most recent time period is usually selected, and a fixed-length sliding window over the sensor data is taken as a data block for detection.
  • Termination condition of isolation tree construction: the samples cannot be divided further (the node contains only one data value, or the data samples are identical), or the depth of the isolation tree reaches the maximum log(ψ), where ψ is the number of data samples at the root node of the isolation tree.
  • Search path depth h(x): the path length of the data sample x on the isolation tree, h(x) = e + C(T.size), where T.size is the number of training samples that fall in the same leaf node as x, and e is the number of edges that x passes through from the root node to the leaf node.
  • Average path length C(n) of a binary tree: the average path length of a binary tree built from a given number of data items.
  • H(n-1) can be estimated by ln(n-1) + 0.5772156649, where the latter term is Euler's constant (the Euler-Mascheroni constant γ).
  • First, the model training stage: bootstrap sampling is used to build a certain number of isolation trees (Isolation Tree, Itree).
  • ψ data samples are drawn from the full training sample, and an attribute (such as temperature or humidity) is randomly selected for the root node.
  • A random value is drawn between the two extremes (maximum and minimum) of that attribute; samples in the root node smaller than the value are placed in its left child node, and those greater than or equal to the value are placed in the right child node.
  • The procedure is then applied recursively with the left and right child nodes as root nodes.
  • Each tree is built in turn by the above operations, which completes the training of the model.
  • Second, the test sample detection stage:
  • The anomaly score of sample x is determined by its search path depth h(x) in each Itree.
  • Specifically, x is searched downward from the root node of an Itree according to the split attributes and split values until it reaches a leaf node.
  • Consider the set of one-dimensional data shown in Figure 2-6.
  • The purpose is to separate point A and point B.
  • The method is to first randomly select a value s between the maximum and the minimum (the data here has only one dimension, so attribute selection is not needed), and then divide the data into a left group and a right group according to whether each value is less than s or greater than or equal to s. These steps are performed recursively and stop when the data samples can no longer be divided. The figure shows that point B lies toward the edge relative to the other data, so it can be isolated in a small number of splits, while point A lies where most of the blue points overlap, so it takes more splits to isolate it.
  • In these examples, B and D are far from the other data and are considered abnormal data, while A and C are considered normal data.
  • Abnormal data is visibly more remote than other data points, so it may take only a few partitions of the data space to separate it, while normal data is the opposite. This is the core working principle of Isolation Forest.
  • This embodiment provides a method for detecting abnormal data in a wireless sensor network.
  • The method includes:
  • S1: Divide the historical data set collected by the sensor nodes into a training set and a test set.
  • S3: Manually add a small number of known abnormal samples to the model obtained in S2, and build the Whiforest model from the two kinds of distance information at the isolation tree leaf nodes, fused with the weight coefficients calculated from the diversity in the forest.
  • Specifically, two definitions are given first for the distance information between the data sample under test and the centers of the normal and abnormal data samples in the isolation tree leaf nodes (i.e., s_c(x) and δ_a(x)).
  • Definition 1: In the training stage, compute the training sample center Cen-s in the leaf nodes of each tree and the distance between each sample under test x and Cen-s in its leaf node; the mean of this distance over the trees in the forest is denoted s_c(x).
  • The proposed Whiforest algorithm further incorporates the idea of base-classifier diversity in ensemble learning.
  • When the isolation forest performs anomaly detection on the data, each tree gives an anomaly score to each sample under test.
  • The algorithm sets weights by combining the diversity of each tree with its detection accuracy, so that trees with large diversity have more influence on the final anomaly score.
  • Once the anomaly score s_final of the samples under test is obtained, it is first sorted in descending order. According to domain knowledge or the known proportion of anomalies in the original data set (ratio), a certain number of data samples with the highest anomaly scores are taken and compared with the labels of the samples under test, and the detection rate, the false alarm rate, and related evaluation indicators are computed.
  • The WhisolationForest algorithm pseudocode is shown below.
  • The algorithm has two advantages: 1) if the data set has the distribution shown in Figure 3, then because the distance information of the two centers of the leaf nodes is added to the anomaly score calculation, the probability of missing anomalous points located at the center of the normal samples is greatly reduced, which effectively improves the detection rate for such outliers; 2) without weight coefficients, the detection of some data samples is affected by the decisions of weakly related isolation trees in the forest, which has a negative effect on the detection results. By adding the disagreement measure and the weight coefficients, the Whiforest algorithm further improves detection accuracy and reduces the false alarm rate.
  • This embodiment provides a practical application of the wireless sensor network abnormal data detection method shown in the first embodiment.
  • Using the data stream samples collected by the wireless sensor network nodes, and on the basis of the isolation forest algorithm, an isolation tree set iforest of a certain size is first constructed.
  • Distance information between the sample under test and the centers of its various sample classes is introduced at each leaf node, and the weight coefficients of the isolation trees are set in combination with the diversity measure.
  • The improved isolation forest algorithm is then applied to each unit-size set of WSN data samples.
  • The anomaly scores are sorted in descending order, and abnormality is determined in conjunction with the parameter ratio.
  • The data samples come from the data collected by the WSN nodes deployed in the Intel Berkeley Research Lab (IBRL).
  • The system contains 54 MICA2 sensor nodes.
  • The data sampling period of each node is 30 s.
  • The collected data has 4 attributes: temperature, humidity, light intensity, and node voltage.
  • 7500 sets of temperature, humidity, and light intensity readings measured at node 25 in March 2004 are selected as sample data.
  • Here t is the temperature data matrix,
  • h is the humidity data matrix,
  • and l is the light intensity data matrix.
  • h = [37.573, 37.847, 22.465, 38.394, 22.538, 38.803, 22.685, 22.721, 22.685 ... 23.051, 39.552, 39.552, 39.687, 39.687, 39.755, 39.755, 39.823, 40.026 ...
  • The above t, h, and l are combined into a matrix D with s rows and 3 columns.
  • D is split 3:1 into a training data sample Train and a test data sample Test, and the isolation forest is trained with the Train data set as input.
  • Together with the weight coefficient threshold μ, a weight coefficient is set for each isolation tree in the forest.
  • AGD: Artificial Global Dataset.
  • The data set has 3 attributes, and the selected test data sets have sizes 15000 and 21000, respectively.
  • The data distribution is roughly a concentric sphere with abnormal clusters at the center and at the edge, as shown in Figure 3.
  • The basic parameters for generating this data set are the distribution mean and covariance of the central anomaly cluster and edge anomaly cluster samples, denoted mea-center, mea-edge, cov-center, and cov-edge.
  • The specific parameter settings are shown in the table.
  • The detection results for part of the selected test data are shown in FIG. 4 and FIG. 5. The detection rate of the algorithm of the present invention for central and edge outliers is significantly higher than that of the traditional isolation forest algorithm.
  • Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software programs may be stored in a readable storage medium, such as an optical disc or a hard disk.

Abstract

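The invention discloses a method for detecting abnormal data in a sensor network, belonging to the field of data reliability detection for wireless sensor networks. Using the historical data set collected by the sensor nodes, an isolation tree set iforest of a certain size is constructed on the basis of the isolation forest algorithm; distance information between the sample under test and the centers of its various sample classes is introduced at each leaf node; weight coefficients are set for the isolation trees in combination with a diversity measure; and a weighted hybrid isolation forest (Whiforest) model is constructed. The resulting Whiforest model is finally used to determine whether wireless sensor network data is abnormal. Experiments on the data sets of the individual sensor nodes show that, because the method sets the weight coefficient of each tree according to its different contribution to the calculation of the final anomaly score, anomaly detection accuracy is improved compared with the traditional iforest model.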

Description

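[Rectified under Rule 91, 11.10.2019] Method for detecting abnormal data in a sensor network
Technical Field
The invention relates to a method for detecting abnormal data in a sensor network, and belongs to the field of data reliability detection for wireless sensor networks.
Background
A Wireless Sensor Network (WSN) is a wireless network formed by a large number of stationary or mobile sensors in a self-organizing, multi-hop manner to cooperatively sense, collect, process, and transmit information about the sensed objects in the geographical area covered by the network, and finally to send this information to the owner of the network. Data, as the carrier of information about the sensed objects in the wireless sensor network, contains a great deal of useful information. While collecting data, the sensors are vulnerable to various kinds of noise or events in the environment, including node failures, environmental noise, and external attacks. All of these affect the data collected by the nodes, which in turn makes the monitored environmental state incorrect. To ensure that the wireless sensor network accurately reflects the monitored environmental state, various anomaly detection techniques are usually needed to find the abnormal data.
Existing abnormal data detection schemes for wireless sensor networks are mainly divided into centralized detection schemes and distributed detection schemes. The centralized detection scheme requires each node to transmit its own data to the sink node, so the robustness of the network is very poor. The distributed detection scheme, in order to improve the robustness and lifetime of the network, lets each node detect abnormal data automatically; but each node detects abnormal data based only on the model it has built itself, so the false alarm rate is high and the detection rate is low.
The isolation forest algorithm proposed by F. T. Liu et al. is widely used in data anomaly detection. The algorithm builds an ensemble model of isolation trees from a historical data set and computes the anomaly score s(Y) of a test sample from its average search depth; the anomaly scores of the current set of samples under detection are sorted in descending order and a certain number of top-ranked samples are taken as the detected outliers, which decides whether each sample is abnormal. The advantages of this method are that the principle is simple, the algorithmic complexity is low, and the detection accuracy is good. However, its applicability to anomaly detection on some concave data sets is low: when normal and abnormal data points partially overlap, the principle that a shorter detection path means a larger anomaly score leads to poor detection results. The method also ignores that each tree in the forest should contribute differently to the calculation of the final anomaly score. This method has not yet been seen in wireless sensor network abnormal data detection applications.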
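Summary of the Invention
To solve the problems that the existing isolation forest algorithm has low applicability to anomaly detection on concave data sets and does not distinguish the contributions of the individual trees in the forest to the calculation of the final anomaly score, the present invention provides a method for detecting abnormal data in a wireless sensor network, the method including:
On the basis of the isolation forest algorithm, constructing the isolation tree set iforest from the historical data set collected by the sensor nodes; introducing, at each leaf node of each isolation tree in iforest, distance information between the sample under test and the centers of its various sample classes; setting the weight coefficient of each isolation tree in combination with a diversity measure, constructing a weighted hybrid isolation forest Whiforest model, and using the Whiforest model to determine whether the wireless sensor network data in the sample under test is abnormal.
Optionally, before constructing the isolation tree set iforest from the historical data set collected by the sensor nodes on the basis of the isolation forest algorithm, the method further includes:
Dividing the historical data set collected by the sensor nodes into a training set and a test set.
Optionally, constructing the isolation tree set iforest from the historical data set collected by the sensor nodes on the basis of the isolation forest algorithm, introducing at each leaf node of each isolation tree in iforest the distance information between the sample under test and the centers of its various sample classes, setting the weight coefficient of each isolation tree in combination with the diversity measure, and constructing the weighted hybrid isolation forest Whiforest model, includes: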
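Step 1: Construct each isolation tree in the isolation tree set iforest from the training-set data in the historical data set, including setting the parameters: bootstrap sampling number ψ, forest size T, weight coefficient threshold μ, size of the validation sample set Val_W, and known abnormal sample addition rate ratio;
Step 2: According to the known abnormal sample addition rate ratio, randomly select known abnormal samples and add them to each isolation tree in iforest;
Step 3: Compute the training sample center Cen-s in the leaf nodes of each tree, and the distance δ(x) between each sample under test x and Cen-s in its leaf node; the mean of δ(x) over the trees in the forest is denoted s_c(x);
$s_c(x) = E(\delta(x))$
Step 4: Compute the abnormal sample center Cen-a in the leaf nodes, and denote the distance between each sample under test x and Cen-a in its leaf node as δ_a(x); the ratio of the means of δ(x) and δ_a(x) over all isolation trees is denoted s_a(x);
$s_a(x) = \dfrac{E(\delta(x))}{E(\delta_a(x))}$
Step 5: Select the validation sample set Val-W from the historically collected data set and run the constructed isolation tree set iforest on it. Following the idea of base-classifier diversity in ensemble learning, compute the diversity between the isolation trees in the forest using the disagreement measure, obtaining a T*T symmetric matrix diversity whose diagonal is 0, where T is the number of isolation trees in the isolation tree set iforest;
Step 6: Sum the diversity matrix and divide by the forest size T to obtain B_index; then compare the B_index value with the threshold μ and set the weights as follows;
$W = \begin{cases} w_1, & B_{index} \geq \mu \\ w_2, & B_{index} < \mu \end{cases}$
Step 7: Set the weight of trees whose B_index value is greater than or equal to μ to w1 = B_index + 1, and the weight of trees whose value is less than μ to w2 = 1 - B_index; the variables s_c(x) and s_a(x) are multiplied by w1 and w2, and s_c(x) and s_a(x) are computed by the following formulas:
$s_c(x) = W \cdot \delta(x)$
$\delta_a(x) = W \cdot \delta_a(x)$
Step 8: Normalize the original Score(x) of the samples in the current data window and the two distance-based scores now introduced, namely {Score, s_a(x), s_c(x)}, using the normalization formula shown below,
[Formula image PCTCN2019082673-appb-000003]
where s(x) stands for the three scores above, Score, s_a(x), and s_c(x),
[Formula image PCTCN2019082673-appb-000004]
is the normalized value, and the three scores are finally fused by the following formula to obtain the final anomaly score s_final of the window samples
[Formula image PCTCN2019082673-appb-000005]
Step 9: Sort s_final in descending order; according to domain knowledge or the known proportion of anomalies in the original data set (ratio), take the data samples with the highest anomaly scores, compare them with the labels of the samples under test, and compute the detection rate, the false alarm rate, and related evaluation indicators;
Step 10: If the node detects abnormal samples in the data window, it passes their sequence numbers to the cluster head node for further verification or processing.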
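Optionally, in step 4, if a leaf node has no abnormal samples, its abnormal sample center Cen-a is recorded as 0.
Optionally, in step 6, summing the diversity matrix means summing the diversity matrix by columns.
Optionally, in step 1, the termination condition for isolation tree construction is: the samples cannot be divided further (the node contains only one data value, or the data samples are identical), or the depth of the isolation tree reaches the maximum log(ψ), where ψ is the bootstrap sampling number parameter.
Optionally, in step 8, the original Score(x) of a sample in the current data window is calculated according to the following formula:
$Score(x) = 2^{-\frac{E(h(x))}{C(\psi)}}$
where h(x) is the path length of the data sample x on a given tree, and C(ψ) is the average search path length of an Itree built with sampling number ψ.
Optionally, the path length of the data sample x on a given tree is h(x) = e + C(T.size), where C(T.size) is the average path length of a binary tree built from T.size data items.
Another object of the present invention is to provide a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the steps of the above method are implemented when the processor executes the program.
A third object of the present invention is to provide a processor for running a program, wherein the above method is executed when the program runs.
The beneficial effects of the present invention are:
Using the historical data set collected by the sensor nodes, an isolation tree set iforest of a certain size is constructed on the basis of the isolation forest algorithm; distance information between the sample under test and the centers of its various sample classes is introduced at each leaf node; weight coefficients are set for the isolation trees in combination with a diversity measure; and the improved isolation forest algorithm is finally used to determine whether wireless sensor network data is abnormal. Experiments on the data sets of the individual sensor nodes show that, because the method sets the weight coefficient of each tree according to its different contribution to the calculation of the final anomaly score, the accuracy of anomaly detection is improved, and the method has broad application prospects.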
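Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for detecting abnormal data in a wireless sensor network provided by the present application.
FIG. 2 is the first schematic diagram of the AGD data set in the wireless sensor network abnormal data detection method based on a weighted hybrid isolation forest.
FIG. 3 is the second schematic diagram of the AGD data set in the wireless sensor network abnormal data detection method based on a weighted hybrid isolation forest.
FIG. 4 is the anomaly score plot of the traditional iforest model in the wireless sensor network abnormal data detection method based on a weighted hybrid isolation forest.
FIG. 5 is the anomaly score plot of the Whiforest model in the wireless sensor network abnormal data detection method based on a weighted hybrid isolation forest.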
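Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
By improving the isolation forest algorithm, this application proposes a method for detecting abnormal data in wireless sensor networks. The method detects abnormal data based on a weighted hybrid isolation forest (Weighted Hybrid Isolation Forest, Whiforest): first, on the basis of the isolation forest algorithm, an isolation tree set iforest of a certain size is constructed; distance information between the sample under test and the centers of its various sample classes is introduced at each leaf node; weight coefficients are set for the isolation trees in combination with a diversity measure; and finally the improved isolation forest algorithm is used to determine whether wireless sensor network data is abnormal. To further clarify the principle and novelty of the method, some basic concepts are introduced first:
1. Detection rate: the ratio of the number of abnormal data samples detected by the algorithm to the total number of abnormal data samples actually contained in the data set.
2. False alarm rate: the ratio of the number of normal data samples misjudged by the algorithm as abnormal to the total number of normal data samples.
3. Data window: when anomaly detection is performed, the data in the most recent time period is usually selected, and a fixed-length sliding window over the sensor data is taken as a data block for detection.
4. Termination condition of isolation tree construction: the samples cannot be divided further (the node contains only one data value, or the data samples are identical), or the depth of the isolation tree reaches the maximum log(ψ), where ψ is the number of data samples at the root node of the isolation tree.
5. Search path depth h(x): the path length of the data sample x on the isolation tree, where T.size is the number of training samples that fall in the same leaf node as x, and e is the number of edges that x passes through from the root node to the leaf node.
$h(x) = e + C(T.size)$
6. Average path length C(n) of a binary tree: the average path length of a binary tree built from a given number of data items. Here H(n-1) can be estimated by ln(n-1) + 0.5772156649, where the latter term is Euler's constant (the Euler-Mascheroni constant γ).
$C(n) = 2H(n-1) - \dfrac{2(n-1)}{n}$
7. Detection anomaly score Score(x): the final anomaly score Score(x) of a data sample under test is obtained by normalizing the mean path length E(h(x)) of the data x with the average search path length C(ψ) of a tree built with sampling number ψ.
$Score(x) = 2^{-\frac{E(h(x))}{C(\psi)}}$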
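As a rough illustration of concepts 5 to 7, the following Python sketch computes C(n), h(x), and Score(x). It assumes the standard isolation forest score of Liu et al., 2^(-E(h(x))/C(ψ)); the function names and the edge cases C(1) = 0 and C(2) = 1 are assumptions for illustration and are not stated in the text.

```python
import math

EULER_GAMMA = 0.5772156649  # constant quoted in the text for estimating H(n-1)


def harmonic_estimate(i: int) -> float:
    """Estimate the harmonic number H(i) as ln(i) + gamma."""
    return math.log(i) + EULER_GAMMA


def average_path_length(n: int) -> float:
    """C(n): average path length of a binary tree built from n samples."""
    if n <= 1:
        return 0.0  # assumed convention: nothing left to split
    if n == 2:
        return 1.0  # assumed convention: a single split separates two samples
    return 2.0 * harmonic_estimate(n - 1) - 2.0 * (n - 1) / n


def path_length(edges: int, leaf_size: int) -> float:
    """h(x) = e + C(T.size)."""
    return edges + average_path_length(leaf_size)


def anomaly_score(path_lengths, psi: int) -> float:
    """Score(x) from the per-tree path lengths of one sample."""
    mean_h = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_h / average_path_length(psi))
```

Under these assumptions, a sample whose mean path length equals C(ψ) scores 0.5, and shorter paths push the score toward 1.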
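I. Model training stage:
Bootstrap sampling is used to build a certain number of isolation trees (Isolation Tree, Itree). First, ψ data samples are drawn from the full training sample and an attribute (such as temperature or humidity) is randomly selected for the root node; a random value is then drawn between the two extremes (maximum and minimum) of that attribute, so that samples in the root node smaller than the value are placed in its left child node and those greater than or equal to the value are placed in the right child node. The procedure is then applied recursively with the left and right child nodes as root nodes. Each tree is built in turn by the above operations, which completes the training of the model.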
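The training procedure above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the dictionary-based node layout, the function names, the base-2 depth limit, and the choice to pick the split attribute only among attributes that still vary are all assumptions.

```python
import math
import random


def build_itree(samples, depth=0, max_depth=None, rng=random):
    """Recursively build one isolation tree from a list of numeric tuples."""
    if max_depth is None:
        # Depth limit log(psi); using base 2 is an assumption.
        max_depth = math.ceil(math.log2(max(len(samples), 2)))
    # Attributes on which the samples still differ (none left means inseparable).
    dims = range(len(samples[0])) if samples else []
    splittable = [a for a in dims
                  if min(s[a] for s in samples) < max(s[a] for s in samples)]
    if len(samples) <= 1 or depth >= max_depth or not splittable:
        return {"size": len(samples)}  # leaf node storing T.size
    attr = rng.choice(splittable)
    lo = min(s[attr] for s in samples)
    hi = max(s[attr] for s in samples)
    split = rng.uniform(lo, hi)  # random value between the two extremes
    left = [s for s in samples if s[attr] < split]
    right = [s for s in samples if s[attr] >= split]
    return {"attr": attr, "split": split,
            "left": build_itree(left, depth + 1, max_depth, rng),
            "right": build_itree(right, depth + 1, max_depth, rng)}


def build_forest(data, n_trees, psi, seed=0):
    """Build n_trees trees, each from psi samples drawn with replacement."""
    rng = random.Random(seed)
    return [build_itree(rng.choices(data, k=psi), rng=rng) for _ in range(n_trees)]


def find_leaf(tree, x):
    """Return (edges traversed e, leaf size T.size) for sample x."""
    edges = 0
    while "attr" in tree:
        tree = tree["left"] if x[tree["attr"]] < tree["split"] else tree["right"]
        edges += 1
    return edges, tree["size"]


def leaf_sizes(tree):
    """List the sample counts of all leaves (useful for sanity checks)."""
    if "attr" not in tree:
        return [tree["size"]]
    return leaf_sizes(tree["left"]) + leaf_sizes(tree["right"])
```

The pair returned by `find_leaf` supplies the e and T.size terms of h(x) = e + C(T.size) used in the detection stage.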
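II. Test sample detection stage:
The detection results of all isolation trees in the forest are combined to obtain the anomaly score of each data point. The anomaly score of sample x is determined by its search path depth h(x) in each Itree. Specifically, x is searched downward from the root node of an Itree according to the split attributes and split values until it reaches a leaf node.
Two examples are used below to illustrate how the isolation forest works.
Consider the set of one-dimensional data shown in Figure 2-6 below; the purpose is to separate point A and point B. The method is to first randomly select a value s between the maximum and the minimum (the data here has only one dimension, so attribute selection is not needed), and then divide the data into a left group and a right group according to whether each value is less than s or greater than or equal to s. These steps are performed recursively and stop when the data samples can no longer be divided. The figure shows that point B lies toward the edge relative to the other data, so it can be isolated in very few splits, while point A lies where most of the blue points overlap, so more splits are needed to isolate it.
Now consider a two-dimensional data set with features x and y; random splits are made along the two attribute axes in order to separate points C and D in Figure 2-7 below. One of x and y is chosen at random and, as for the one-dimensional data above, the data is divided into a left block and a right block by comparison with the split value. Splitting continues in the same way until no further subdivision is possible, which here means that a block contains only one data point or that the remaining data are identical. Intuitively, point D is remote from the other data points and can be separated in only a few splits, while point C lies in the dense center of the data block and needs more splits.
In these two examples, B and D are far from the other data and are considered abnormal data, while A and C are considered normal data. Abnormal data is visibly more remote than other data points, so it may take only a few partitions of the data space to separate it, while normal data is the opposite. This is the core working principle of Isolation Forest.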
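The one-dimensional example can be reproduced with a small simulation. The sketch below counts how many random splits are needed before a chosen point is alone in its partition; the data values, function names, and trial count are illustrative assumptions and are not taken from the figures.

```python
import random


def splits_to_isolate(values, target, rng):
    """Count random splits until `target` can no longer be separated further."""
    group = list(values)
    count = 0
    while len(group) > 1 and min(group) < max(group):
        s = rng.uniform(min(group), max(group))
        # Keep only the side (less than s, or at least s) that contains the target.
        group = [v for v in group if (v < s) == (target < s)]
        count += 1
    return count


def mean_splits(values, target, trials=200, seed=0):
    """Average number of splits over repeated random partitionings."""
    rng = random.Random(seed)
    return sum(splits_to_isolate(values, target, rng) for _ in range(trials)) / trials


cluster = [i * 0.02 for i in range(50)]  # dense points, standing in for the blue cluster
point_a = cluster[25]                    # inside the dense region, like point A
point_b = 5.0                            # far from the cluster, like point B
data = cluster + [point_b]
```

Comparing `mean_splits(data, point_b)` with `mean_splits(data, point_a)` shows the edge point being isolated in fewer splits on average, which is the behaviour the examples describe.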
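Embodiment 1:
This embodiment provides a method for detecting abnormal data in a wireless sensor network. Referring to FIG. 1, the method includes:
S1: Divide the historical data set collected by the sensor nodes into a training set and a test set.
S2: Construct the isolation tree set iforest from the training set.
S3: Manually add a small number of known abnormal samples to the model obtained in S2, and build the Whiforest model from the two kinds of distance information at the isolation tree leaf nodes, fused with the weight coefficients calculated from the diversity in the forest.
S4: For each distributed node, when a certain number of new samples have entered the data window, use the trained Whiforest model to detect the new data, obtain the anomaly scores, and judge whether the data is abnormal.
S5: If abnormal samples are found in S4, pass the node's detection result to the cluster head node for further follow-up operations.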
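Steps S4 and S5 can be pictured as a fixed-length window that triggers detection whenever enough new samples have arrived. The sketch below is only an assumed arrangement: the class name, the `batch` parameter, and the `detect` callback (a stand-in for the trained Whiforest model) are hypothetical.

```python
from collections import deque


class DataWindow:
    """Fixed-length sliding window that releases a block every `batch` new samples."""

    def __init__(self, length, batch):
        self.buffer = deque(maxlen=length)  # oldest samples drop out automatically
        self.batch = batch
        self.pending = 0

    def push(self, sample):
        """Add one sample; return the current block when detection is due."""
        self.buffer.append(sample)
        self.pending += 1
        if self.pending >= self.batch and len(self.buffer) == self.buffer.maxlen:
            self.pending = 0
            return list(self.buffer)
        return None


def process_stream(samples, window, detect):
    """Run `detect` on each released block and collect non-empty reports.

    `detect` is a placeholder for the trained model: it takes a block and
    returns the positions of samples it considers abnormal. Each report stands
    for what a node would pass on to its cluster head.
    """
    reports = []
    for sample in samples:
        block = window.push(sample)
        if block is not None:
            flagged = detect(block)
            if flagged:
                reports.append(flagged)
    return reports
```

How often detection is triggered and what exactly is sent to the cluster head are design choices that the text leaves open.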
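Specifically, two definitions are given first for the distance information between the data sample under test and the centers of the normal and abnormal data samples in the isolation tree leaf nodes (i.e., s_c(x) and δ_a(x)).
Definition 1: In the training stage, compute the training sample center Cen-s in the leaf nodes of each tree and the distance between each sample under test x and Cen-s in its leaf node; the mean of this distance over the trees in the forest is denoted s_c(x).
Definition 2: Randomly select a small number of known abnormal samples and add them to the trained Itrees; compute the abnormal sample center Cen-a in their leaf nodes (recorded as 0 if a leaf node has no abnormal samples), and denote the distance between each sample under test x and Cen-a in its leaf node as δ_a(x).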
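A possible reading of the two definitions is sketched below with NumPy. The text does not name the distance metric, so Euclidean distance is an assumption, as is treating a Cen-a of 0 as the zero vector; the function names are hypothetical.

```python
import numpy as np


def leaf_distances(x, leaf_train, leaf_known_anomalies):
    """Return (delta, delta_a) for sample x in one leaf node.

    leaf_train: training samples that fell into this leaf.
    leaf_known_anomalies: known abnormal samples that fell into this leaf.
    """
    x = np.asarray(x, dtype=float)
    cen_s = np.mean(np.asarray(leaf_train, dtype=float), axis=0)  # Cen-s
    if len(leaf_known_anomalies) == 0:
        cen_a = np.zeros_like(x)  # Cen-a recorded as 0 (assumed: zero vector)
    else:
        cen_a = np.mean(np.asarray(leaf_known_anomalies, dtype=float), axis=0)
    delta = float(np.linalg.norm(x - cen_s))    # Euclidean distance (assumption)
    delta_a = float(np.linalg.norm(x - cen_a))
    return delta, delta_a


def distance_scores(per_tree):
    """Combine per-tree (delta, delta_a) pairs into s_c(x) and s_a(x).

    s_c(x) is the mean of delta over the trees; s_a(x) is the ratio of the
    mean of delta to the mean of delta_a, as described in step 4.
    """
    deltas = np.array([d for d, _ in per_tree])
    deltas_a = np.array([da for _, da in per_tree])
    s_c = float(deltas.mean())
    s_a = float(deltas.mean() / deltas_a.mean())
    return s_c, s_a
```

For example, a sample at (1, 3) in a leaf whose training samples are (0, 0) and (2, 0) has δ(x) = 3, since Cen-s is (1, 0).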
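The proposed Whiforest algorithm further incorporates the idea of base-classifier diversity in ensemble learning. When the isolation forest performs anomaly detection on the data, each tree gives an anomaly score to each sample under test; the algorithm sets weights by combining the diversity of each tree with its detection accuracy, so that trees with large diversity have more influence on the final anomaly score.
First, a certain number of samples Val-W are selected and detected with the previously trained isolation forest. The diversity between every pair of trees in the forest is computed with the diversity measure, giving a T*T symmetric matrix diversity whose diagonal is 0. The diversity matrix is summed by columns and divided by the forest size T to obtain B_index. The B_index value is then compared with the threshold μ and the weights are set as shown in formula (2): trees whose value in B is greater than or equal to μ get the weight w1 = B_index + 1, and trees whose value is less than μ get the weight w2 = 1 - B_index. The variables used later are multiplied by w1 and w2.
$W = \begin{cases} w_1, & B_{index} \geq \mu \\ w_2, & B_{index} < \mu \end{cases}$            (2)
$s_c(x) = W \cdot \delta(x)$            (3)
$\delta_a(x) = W \cdot \delta_a(x)$           (4)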
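The weighting scheme can be sketched as follows. The disagreement measure is taken here in its usual ensemble-learning sense (the fraction of validation samples on which two members decide differently); how each tree's 0/1 decisions on Val-W are obtained is not specified in the text, so the `votes` input is an assumption, as are the function names.

```python
import numpy as np


def disagreement_matrix(votes):
    """Pairwise disagreement between trees.

    votes: array of shape (T, n) holding each tree's 0/1 decision for each of
    the n validation samples. Returns a T x T symmetric matrix with zero diagonal.
    """
    votes = np.asarray(votes)
    return (votes[:, None, :] != votes[None, :, :]).mean(axis=2)


def tree_weights(diversity, mu):
    """Column-sum the diversity matrix, divide by T, and apply the threshold mu."""
    n_trees = diversity.shape[0]
    b_index = diversity.sum(axis=0) / n_trees
    # w1 = B_index + 1 when B_index >= mu, otherwise w2 = 1 - B_index.
    return np.where(b_index >= mu, b_index + 1.0, 1.0 - b_index)
```

With three trees where the first two always agree and the third always disagrees with both, B_index is (1/3, 1/3, 2/3); with μ = 0.5 the weights become (2/3, 2/3, 5/3), so the most diverse tree is weighted up.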
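After δ(x) and δ_a(x) have been weighted by W, s_c(x) and s_a(x) are computed with formulas (3) and (4) above. The original Score and the two distance-based scores now introduced, namely {Score, s_a(x), s_c(x)}, are then normalized (the normalization formula is shown in (5) below, where s(x) stands for the three scores above,
[Formula image PCTCN2019082673-appb-000010]
is the normalized value), and the three scores are finally fused with formula (6) to obtain the final anomaly score s_final
[Formula image PCTCN2019082673-appb-000011]
[Formula image PCTCN2019082673-appb-000012]
Once the anomaly score s_final of the samples under test is obtained, it is first sorted in descending order. According to domain knowledge or the known proportion of anomalies in the original data set (ratio), a certain number of data samples with the highest anomaly scores are taken and compared with the labels of the samples under test, and the detection rate, the false alarm rate, and related evaluation indicators are computed. The pseudocode of the WhisolationForest algorithm is shown below.
Algorithm design:
[Pseudocode image PCTCN2019082673-appb-000013]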
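The final ranking and evaluation step can be sketched as below, starting from already fused scores (the normalization and fusion formulas themselves are only available as images and are not reproduced here). The function names and the rounding of the cut-off are assumptions.

```python
def flag_top_ratio(scores, ratio):
    """Return the indices of the highest-scoring samples, ratio * n of them."""
    k = int(round(len(scores) * ratio))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def detection_and_false_alarm(flagged, labels):
    """Compute (detection rate, false alarm rate); labels use 1 for abnormal."""
    anomalies = {i for i, y in enumerate(labels) if y == 1}
    normals = set(range(len(labels))) - anomalies
    # Guard against empty classes so the ratios are always defined.
    dr = len(flagged & anomalies) / len(anomalies) if anomalies else 0.0
    far = len(flagged & normals) / len(normals) if normals else 0.0
    return dr, far
```

For ten samples with two labelled anomalies and ratio = 0.3, three samples are flagged; if both anomalies are among them, the detection rate is 1.0 and the false alarm rate is 1/8.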
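The algorithm has two advantages: 1) if the data set has the distribution shown in Figure 3, then because the distance information of the two centers of the leaf nodes is added to the anomaly score calculation, the probability of missing anomalous points located at the center of the normal samples is greatly reduced, which effectively improves the detection rate for such outliers; 2) without weight coefficients, the detection of some data samples is affected by the decisions of weakly related isolation trees in the forest, which has a negative effect on the detection results, whereas the Whiforest algorithm, by adding the disagreement measure and the weight coefficients, further improves detection accuracy and reduces the false alarm rate.
Embodiment 2
This embodiment provides a practical application of the wireless sensor network abnormal data detection method shown in Embodiment 1. Using the data stream samples collected by the wireless sensor network nodes, and on the basis of the isolation forest algorithm, an isolation tree set iforest of a certain size is first constructed; distance information between the sample under test and the centers of its various sample classes is introduced at each leaf node; weight coefficients are set for the isolation trees in combination with the diversity measure; finally, the improved isolation forest algorithm is used to sort the anomaly scores of each unit-size set of WSN data samples in descending order, and abnormality is determined in conjunction with the parameter ratio. An implementation case of the method on specific data sets is given below.
The data samples come from the data collected by the WSN nodes deployed in the Intel Berkeley Research Lab (IBRL). The system contains 54 MICA2 sensor nodes, each node has a data sampling period of 30 s, and the collected data has 4 attributes: temperature, humidity, light intensity, and node voltage. Here, 7500 sets of temperature, humidity, and light intensity readings measured at node 25 in March 2004 are selected as sample data, where t is the temperature data matrix, h is the humidity data matrix, and l is the light intensity data matrix: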
t=[19.616,19.449,-19.760,19.145,-16.898,18.933,-14.468,-13.527,-13.390…29.406,18.606,18.587,18.557,18.538,18.498,18.479,18.479,18.469…18.302,18.322,18.322,18.322,18.322,18.312,18.302,18.302,18.302….18.293,18.263,18.244,18.263,18.244,18.234,18.234,18.224,18.214...17.920,17.930,17.930,17.921,17.901,17.901,17.891,17.891,17.871...17.861,17.861,17.852,17.842,17.852,17.832,17.832,17.823,17.822…...];
h=[37.573,37.847,22.465,38.394,22.538,38.803,22.685,22.721,22.685…23.051,39.552,39.552,39.687,39.687,39.755,39.755,39.823,40.026…40.060,39.959,39.959,39.925,39.959,39.925,39.925,39.959,39.891….39.959,40.026,40.026,40.026,40.026,39.959,40.026,40.026,40.060...40.162,40.094,40.094,40.162,40.094,40.094,40.263,40.162,40.196...40.229,40.229,40.229,40.230,40.2976,40.196,40.229,40.229,40.264…...];
l=[97.52,97.52,0.46,97.52,0.46,97.52,0.46,0.46,0.46…0.46,97.52,101.2,97.52,97.52,97.52,97.52,101.2,97.52…97.52,97.52,97.52,97.52,97.52,101.2,97.52,97.52,97.52….101.2,101.2,101.2,101.2,101.2,101.2,101.2,101.2,101.2...97.52,97.52,97.52,97.52,101.2,101.2,101.2,97.52,101.2...101.2,97.52,97.52,97.52,97.52,97.52,97.52,101.2,101.2…...];
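The above t, h, and l are combined into a matrix D with s rows and 3 columns. D is split 3:1 into a training data sample Train and a test data sample Test, and the isolation forest is trained with the Train data set as input. During training, a small number of known abnormal samples are added according to domain knowledge to compute the two kinds of distance. A validation sample set of size val-w is then selected, the forest is used to compute the disagreement measure of each tree, and, combined with its detection accuracy and the weight coefficient threshold μ, a weight coefficient is set for each isolation tree in the forest.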
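Assembling D and splitting it 3:1 might look like the NumPy sketch below. Only the first eight readings of each series quoted above are used, and keeping the chronological order when splitting is an assumption (the text does not say whether the split is sequential or random); the function name is hypothetical.

```python
import numpy as np


def build_and_split(t, h, l, train_parts=3, test_parts=1):
    """Stack t, h, l as the columns of D and split the rows train:test."""
    D = np.column_stack([t, h, l])  # shape (s, 3)
    n_train = len(D) * train_parts // (train_parts + test_parts)
    return D[:n_train], D[n_train:]


# First eight readings of each series as quoted in the text.
t = [19.616, 19.449, -19.760, 19.145, -16.898, 18.933, -14.468, -13.527]
h = [37.573, 37.847, 22.465, 38.394, 22.538, 38.803, 22.685, 22.721]
l = [97.52, 97.52, 0.46, 97.52, 0.46, 97.52, 0.46, 0.46]

train, test = build_and_split(t, h, l)
```

With eight rows this gives a Train block of six rows and a Test block of two rows, each with three columns.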
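The forest model with the distance information is used to detect the Test data set. The anomaly scores of the current unit of size-t samples are sorted in descending order and, using ratio, the first size-t*ratio data items are taken as the abnormal data in the current unit-size sample set; the remaining data points, which have lower anomaly scores, are treated as normal values.
To show the advantage of the method of Embodiment 1 on concave data sets, experiments were also carried out on the artificially generated AGD (Artificial Global Dataset) data set. The data set has 3 attributes, and the selected test data sets have sizes 15000 and 21000, respectively. The data distribution is roughly a concentric sphere with abnormal clusters at the center and at the edge, as shown in Figure 3. In this experiment, the basic parameters for generating the data set are the distribution mean and covariance of the central anomaly cluster and edge anomaly cluster samples, denoted mea-center, mea-edge, cov-center, and cov-edge; the specific parameter settings are shown in the table below.
Table 1: Specific parameters of the AGD data set
[Table image PCTCN2019082673-appb-000014]
In the detection process, the detection results for part of the selected test data are shown in FIG. 4 and FIG. 5. The detection rate of the algorithm of the present invention for central and edge outliers is significantly higher than that of the traditional isolation forest algorithm.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software programs may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.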

Claims (10)

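  1. A method for detecting abnormal data in a wireless sensor network, characterized in that the method includes:
    on the basis of the isolation forest algorithm, constructing the isolation tree set iforest from the historical data set collected by the sensor nodes; introducing, at each leaf node of each isolation tree in iforest, distance information between the sample under test and the centers of its various sample classes; setting the weight coefficient of each isolation tree in combination with a diversity measure, constructing a weighted hybrid isolation forest Whiforest model, and using the Whiforest model to determine whether the wireless sensor network data in the sample under test is abnormal.
  2. The method according to claim 1, characterized in that, before constructing the isolation tree set iforest from the historical data set collected by the sensor nodes on the basis of the isolation forest algorithm, the method further includes:
    dividing the historical data set collected by the sensor nodes into a training set and a test set.
  3. The method according to claim 2, characterized in that constructing the isolation tree set iforest from the historical data set collected by the sensor nodes on the basis of the isolation forest algorithm, introducing at each leaf node of each isolation tree in iforest the distance information between the sample under test and the centers of its various sample classes, setting the weight coefficient of each isolation tree in combination with the diversity measure, and constructing the weighted hybrid isolation forest Whiforest model, includes: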
    步骤1:以历史数据集中的训练集的数据构建孤立树集合iforest中的各孤立树,包括设定参数bootstrap采样数ψ、森林规模大小T、权值系数阈值μ、验证样本集Val_W的大小和已知异常样本添加率ratio;
    步骤2:根据已知异常样本添加率ratio随机选取已知异常样本加入到iforest中的各孤立树中;
    步骤3:计算每棵树的叶子结点中的训练样本中心Cen-s,以及每个待测样本x在叶节点中与Cen-s间的距离δ(x),将其在森林中的每棵树的均值记作s c(x);
    s c(x)=E(δ(x))
    步骤4:在其叶子结点中计算异常样本中心Cen-a,并计算每个待测样本x在叶节点中与上述的Cen-a间的距离记作δ a(x),并将δ(x)和δ a(x)在所有孤立树中均值的比值记作s a(x);
    Figure PCTCN2019082673-appb-100001
    步骤5:根据历史采集的数据集选取验证样本集Val-W,并使用上述建立好的孤立树集合iforest对其检测,结合集成学习中基分类器多样性的思想,通过不合度量对森林中孤立树间的多样性进行计算,得到一个对角为0的T*T对称矩阵diversity;其中,T为孤立树集合iforest中孤立树的棵数;
    步骤6:对所述diversity矩阵求和,并按森林规模大小T作商得到B index,此刻将 B index值与阈值μ比较,权值设置如下所示;
    Figure PCTCN2019082673-appb-100002
    步骤7:设定B index值大于等于μ的树的权值w1=B index+1;小于μ的树的权值w2=1-B index,对s c(x)和s a(x)变量都乘以w1和w2,以下式计算s c(x)和s a(x):
    s c(x)=W*δ(x)
    δ a(x)=W*δ a(x)
    步骤8:将当前数据窗口内样本的原始Score(x)分值以及目前引入的基于距离的2个分值即{Score,s a(x),s c(x)}进行归一化处理,使用的归一化公式如下所示,
    Figure PCTCN2019082673-appb-100003
    其中s(x)代指上述Score、s a(x)、s c(x)3个分值,
    Figure PCTCN2019082673-appb-100004
    为归一化后的值,最终以下式融合上述3个分值得到最终的窗口样本异常分值s final
    Figure PCTCN2019082673-appb-100005
    步骤9:降序排列s final,根据领域知识或参考原先数据集已知的异常数目比例ratio,得到异常分值最高的数据样本,再和待测数据样本标记对比,计算检测率以及误报率相关评价指标;
    步骤10:若节点检测到数据窗口内有异常样本,则将其所属顺序编号传递到簇头节点,进行下一步的验证或处理。
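  4. The method according to claim 3, characterized in that, in step 4, if a leaf node contains no abnormal samples, its abnormal sample center Cen-a is recorded as 0.
  5. The method according to claim 3, characterized in that, in step 6, summing the diversity matrix means summing the diversity matrix by column.
  6. The method according to claim 3, characterized in that, in step 1, the termination condition for building an isolation tree is: the samples cannot be split further, i.e. the node contains only one data value or the data samples are all identical, or the depth of the isolation tree reaches the maximum value log(ψ), where ψ is the bootstrap sample size.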
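  7. The method according to claim 3, characterized in that, in step 8, the original Score(x) of the samples in the current data window is computed according to the following formula:
    Figure PCTCN2019082673-appb-100006
    where h(x) denotes the path length of data sample x on a given tree, and C(ψ) is the average search path length of an iTree built with sample size ψ.
  8. The method according to claim 7, characterized in that the path length of data sample x on a given tree is h(x) = e + C(T.size), where C(T.size) is the average path length of a binary tree built from T.size data records.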
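  9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1-8.
  10. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, performs the method according to any one of claims 1-8.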
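
By way of illustration only, and forming no part of the claims, the path length and original score referred to in claims 7 and 8 can be sketched with the formulas commonly used in the isolation forest literature. It is an assumption of this sketch that the formula shown as a figure in claim 7 is the standard score 2^(-E(h(x))/C(ψ)), that e is the number of edges traversed from the root to the leaf, and that C(n) uses the usual harmonic-number approximation; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def c(n):
    """Average path length of an unsuccessful search in a binary tree of n records."""
    if n > 2:
        harmonic = math.log(n - 1) + EULER_GAMMA   # approximation of H(n - 1)
        return 2.0 * harmonic - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0

def path_length(e, leaf_size):
    """h(x) = e + C(T.size): edges traversed plus an adjustment for the
    records left unsplit in the leaf (T.size = leaf_size)."""
    return e + c(leaf_size)

def anomaly_score(path_lengths, psi):
    """Score(x) = 2 ** (-E(h(x)) / C(psi)), with E(h(x)) the mean over trees."""
    mean_h = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_h / c(psi))
```

With this definition a sample whose mean path length equals C(ψ) scores 0.5, and shorter paths give scores closer to 1.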
PCT/CN2019/082673 2018-06-04 2019-04-15 Method for detecting abnormal data in sensor network WO2019233189A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/993,454 US20200374720A1 (en) 2018-06-04 2020-08-14 Method for Detecting Abnormal Data in Sensor Network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810563300.9A CN108777873B (zh) 2018-06-04 2018-06-04 Method for detecting abnormal data in wireless sensor networks based on weighted hybrid isolation forest
CN201810563300.9 2018-06-04

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/993,454 Continuation US20200374720A1 (en) 2018-06-04 2020-08-14 Method for Detecting Abnormal Data in Sensor Network

Publications (1)

Publication Number Publication Date
WO2019233189A1 true WO2019233189A1 (zh) 2019-12-12

Family

ID=64025705

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/082673 WO2019233189A1 (zh) 2018-06-04 2019-04-15 Method for detecting abnormal data in sensor network

Country Status (3)

Country Link
US (1) US20200374720A1 (zh)
CN (1) CN108777873B (zh)
WO (1) WO2019233189A1 (zh)

Also Published As

Publication number Publication date
CN108777873B (zh) 2021-03-02
US20200374720A1 (en) 2020-11-26
CN108777873A (zh) 2018-11-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19815168

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19815168

Country of ref document: EP

Kind code of ref document: A1