CN110086860B - Data anomaly detection method and device under Internet of things big data environment - Google Patents
- Publication number: CN110086860B (application number CN201910318526.7A)
- Authority: CN (China)
- Prior art keywords: context, neighborhood, probability matrix, equipment, data
- Legal status: Active (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- H04L41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
- H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
Description
Technical Field
The invention relates to the technical field of information security, and in particular to a data anomaly detection method and device for an Internet of Things (IoT) big data environment.
Background
The Internet of Things is a system that interconnects people with things and things with things. With the rapid development of network technology, IoT systems have been deployed across many kinds of infrastructure and provide a wide range of services to society. In recent years, many new IoT big data analysis platforms have emerged that take the IoT as the data source, big data as the object of analysis, and artificial intelligence as the technical means. Such platforms require the source data generated by the IoT to be highly credible; otherwise the accuracy of subsequent big data analysis suffers, with potentially serious consequences. Anomaly detection on IoT big data is therefore needed to improve data quality.
In the prior art, IoT big data anomaly detection techniques fall mainly into three categories: Markov anomaly detectors, sliding-window-based anomaly detection, and context-aware anomaly detection.
In the course of implementing the present invention, the inventors found that the prior-art methods have at least the following technical problems:
A Markov anomaly detector detects device anomalies using a Markov state transition matrix; it assumes that the current state depends only on the previous state and is independent of all earlier states. Training a Markov anomaly detector is simple, but detection is computationally expensive. Moreover, by construction it can only capture an abrupt anomalous change at a single moment, whereas problems in IoT data are often anomalous events that persist for a period of time, so its detection accuracy is low.
After the sliding window concept was introduced, many researchers proposed sliding-window-based anomaly detection methods for different devices. A sliding window captures the behavior of IoT data over a period of time, so it can handle anomalies that persist for some time and improves detection accuracy to a degree. However, such methods only suit relatively simple embedded devices. IoT devices are becoming increasingly complex and exhibit multiple behavior patterns; when traditional sliding-window methods are applied to such devices, their accuracy drops sharply, so they cannot keep up with increasingly capable IoT scenarios.
To detect IoT devices with multiple behavior patterns, some methods extend sliding-window anomaly detection with a context detection module: it first detects the context attribute of an IoT device, i.e., the physical environment the device is in, and then loads a different anomaly detection model for each context attribute. Although context-aware anomaly detection can, to a degree, detect anomalies in IoT data that has context attributes, current methods all target a single device and judge the context attribute from the IoT data itself. If that data is already untrustworthy, the entire detection process becomes untrustworthy, and the detection results are inaccurate.
It can thus be seen that the prior-art methods suffer from the technical problem of low accuracy.
Summary of the Invention
In view of this, the present invention provides a data anomaly detection method and device for an IoT big data environment, to solve, or at least partially solve, the technical problem of low accuracy in prior-art methods.
A first aspect of the present invention provides a data anomaly detection method for an IoT big data environment, comprising:
Step S1: classify the IoT devices to be monitored and define context attributes for each device type, where each context attribute corresponds to one behavior pattern of that device type;
Step S2: collect the IoT data generated while each device type operates under each context attribute, and use a preset probability matrix algorithm to compute the probability matrix of each context attribute of each device type, thereby extracting the features of each device type;
Step S3: from all context attributes of all device types, derive the contexts shared by all devices within a neighborhood, referred to as neighborhood shared contexts;
Step S4: from the contexts shared by all devices within the neighborhood, build a neighborhood-device context correspondence table whose entries are probability matrices;
Step S5: compute the likelihood that the neighborhood containing the device under test is in each neighborhood shared context, and, based on these likelihoods, determine the target neighborhood shared context for that neighborhood;
Step S6: according to the target neighborhood shared context, load the probability matrix for the device under test from the neighborhood-device context correspondence table;
Step S7: using the loaded probability matrix, apply the preset probability matrix algorithm to perform anomaly detection on the data of each device under test.
In one embodiment, step S2 specifically comprises:
training each context of each device type separately and independently;
collecting normal-operation data for all cases with a data collector and dividing its value range into at most 10 segments, where each data point is represented by the symbol of the segment it falls into;
arranging the collected data into a time-ordered sequence, converting it into a symbol sequence, and defining a sliding window W of fixed size n that moves in the direction of time;
at each moment, the sliding window contains a symbol sequence of length n; counting the symbol pairs at distances 1 to n-1 within it, and creating a feature matrix whose rows are the pair types and whose columns are the distances 1 to n-1, in which the counts accumulated as the window moves are recorded;
normalizing the resulting matrix column by column to obtain the probability of each symbol pair occurring at that distance, which is used as the corresponding probability matrix.
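A minimal Python sketch of the training procedure above follows. It assumes equal-width segments over a known value range [lo, hi] and a matrix stored as a list of rows, one row per ordered symbol pair and one column per distance 1 to n-1; the function names `discretize` and `probability_matrix` are hypothetical, not taken from the patent.

```python
from itertools import product


def discretize(values, lo, hi, segments=10):
    """Map raw readings to segment symbols 0..segments-1.

    Equal-width bins over [lo, hi] are an assumption; the method only says the
    value range is split into at most 10 segments.
    """
    if not 1 <= segments <= 10:
        raise ValueError("segments must be between 1 and 10")
    width = (hi - lo) / segments
    symbols = []
    for v in values:
        idx = int((v - lo) // width) if width > 0 else 0
        symbols.append(min(max(idx, 0), segments - 1))  # clamp out-of-range readings
    return symbols


def probability_matrix(symbols, n, alphabet):
    """Build the pair-by-distance probability matrix for one device context.

    Rows are ordered symbol pairs (x, y); column d-1 holds distance d (1..n-1).
    Every symbol must belong to `alphabet`.
    """
    pairs = list(product(alphabet, repeat=2))
    row = {pair: r for r, pair in enumerate(pairs)}
    counts = [[0] * (n - 1) for _ in pairs]
    # Slide a window of size n one step at a time and count pairs at each distance.
    for start in range(len(symbols) - n + 1):
        w = symbols[start:start + n]
        for d in range(1, n):
            for i in range(n - d):
                counts[row[(w[i], w[i + d])]][d - 1] += 1
    # Column-wise normalization: each column becomes a distribution over pairs.
    totals = [sum(counts[r][c] for r in range(len(pairs))) for c in range(n - 1)]
    matrix = [
        [counts[r][c] / totals[c] if totals[c] else 0.0 for c in range(n - 1)]
        for r in range(len(pairs))
    ]
    return pairs, matrix
```

For example, the alternating sequence 0, 1, 0, 1 with n = 3 gives probability 0.5 to the pairs (0, 1) and (1, 0) at distance 1, and 0.5 to (0, 0) and (1, 1) at distance 2.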
In one embodiment, step S3 specifically comprises:
analyzing all context attributes together: unrelated contexts are combined directly, while related contexts are first reduced to remove their overlapping parts and then combined, forming neighborhood shared contexts that cover all contexts of all devices.
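One possible reading of this step, sketched in Python: contexts of unrelated device types combine as a Cartesian product, and a caller-supplied rule removes combinations that related contexts make impossible. Both the `compatible` predicate and the representation of a shared context as a {device type: device context} mapping are illustrative assumptions; the patent does not specify how relatedness is encoded.

```python
from itertools import product


def shared_contexts(device_contexts, compatible=None):
    """Enumerate candidate neighborhood shared contexts.

    device_contexts: {device_type: [context, ...]}
    compatible:      optional (hypothetical) predicate on a {device_type: context}
                     mapping; returns False for combinations ruled out by
                     related contexts.
    """
    types = sorted(device_contexts)
    result = []
    # Unrelated contexts combine freely: take every combination across device types.
    for combo in product(*(device_contexts[t] for t in types)):
        mapping = dict(zip(types, combo))
        if compatible is None or compatible(mapping):
            result.append(mapping)
    return result
```

With two device types that each have two contexts, this yields four shared contexts; a rule forbidding one combination leaves three.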
In one embodiment, step S5 specifically comprises:
according to the neighborhood-device context correspondence table, using the preset probability matrix algorithm to compute the likelihood that the neighborhood containing the device under test is in each neighborhood shared context;
and taking the most likely context as the target neighborhood shared context.
In one embodiment, the likelihood that the neighborhood containing the device under test is in each neighborhood shared context is computed as follows:
$$\min_i \{P(i)\}$$
$$P(i) = a\,D(M_A, S_{Ai}) + b\,D(M_B, S_{Bi}) + c\,D(M_C, S_{Ci}) + \cdots$$
where $i$ is the index of the neighborhood shared context; $a$, $b$, and $c$ are the numbers of devices of type A, B, and C in the neighborhood; $D$ is a function that computes the Euclidean distance between two matrices; $S_{Ai}$, $S_{Bi}$, and $S_{Ci}$ are the probability matrices of device types A, B, and C under neighborhood shared context $i$; and $M_A$, $M_B$, and $M_C$ are the averages of the probability matrices of all type-A, type-B, and type-C devices, respectively. Because $P(i)$ is a weighted sum of distances, a smaller value means a closer match, so the context $i$ that minimizes $P(i)$ is the most likely one.
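The formula can be sketched in Python as below. It assumes each device's current probability matrix has already been computed from its recent data in the same way as during training, uses the number of devices of each type as the weights a, b, c, and returns the context with the smallest P(i). Names such as `select_context` are hypothetical.

```python
import math


def euclidean(m1, m2):
    """Euclidean (Frobenius) distance between two matrices of the same shape."""
    return math.sqrt(sum((a - b) ** 2
                         for r1, r2 in zip(m1, m2)
                         for a, b in zip(r1, r2)))


def mean_matrix(matrices):
    """Element-wise average of equally shaped matrices (M_A, M_B, ...)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / len(matrices) for c in range(cols)]
            for r in range(rows)]


def context_scores(observed, table):
    """P(i) for every neighborhood shared context i.

    observed: {device_type: [current matrix of each device of that type]}
    table:    {context_i: {device_type: reference matrix S for that type under i}}
    The device count per type plays the role of the weights a, b, c.
    """
    means = {t: mean_matrix(ms) for t, ms in observed.items()}
    return {ctx: sum(len(observed[t]) * euclidean(means[t], refs[t]) for t in observed)
            for ctx, refs in table.items()}


def select_context(observed, table):
    # Smallest weighted distance = best-matching (most likely) shared context.
    scores = context_scores(observed, table)
    return min(scores, key=scores.get)
```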
In one embodiment, step S7 specifically comprises:
converting the data of the device under test into a symbol sequence, defining a sliding window W of the same size n, and computing the probability of the symbol sequence in the window;
comparing the computed probability with a preset threshold p: if it is below the threshold, the moment is flagged as abnormal and the window keeps sliding; if k consecutive abnormal moments occur, the data at that moment is reported as an anomaly.
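A hedged Python sketch of this detection loop follows. The patent does not say how the window probability is derived from the matrix; here it is assumed to be the product of the matrix entries for every symbol pair (x, y) at distance d inside the window, with unseen pairs contributing 0. The matrix layout (rows = ordered symbol pairs, columns = distances 1 to n-1) follows the description of step S2; `window_probability` and `detect` are hypothetical names.

```python
def window_probability(window, pairs, matrix):
    """Assumed scoring rule: product of pair probabilities over all distances."""
    row = {p: r for r, p in enumerate(pairs)}
    n = len(window)
    prob = 1.0
    for d in range(1, n):
        for i in range(n - d):
            r = row.get((window[i], window[i + d]))
            prob *= matrix[r][d - 1] if r is not None else 0.0  # unseen pair -> 0
    return prob


def detect(symbols, n, pairs, matrix, p, k):
    """Return the window start positions reported as anomalous.

    A window scoring below threshold p is flagged as abnormal; once k
    consecutive windows are abnormal, the current position is reported.
    """
    alarms, run = [], 0
    for start in range(len(symbols) - n + 1):
        if window_probability(symbols[start:start + n], pairs, matrix) < p:
            run += 1  # another consecutive abnormal moment
        else:
            run = 0   # a normal window breaks the run
        if run >= k:
            alarms.append(start)
    return alarms
```

With a matrix trained on the alternating sequence 0, 1, 0, 1, each window of that sequence scores 0.125, while a constant run of 0s scores 0 and, with k = 2, is reported from the second window onward.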
In one embodiment, during anomaly detection, the step of determining the target neighborhood shared context is repeated at every preset period.
Based on the same inventive concept, a second aspect of the present invention provides a data anomaly detection device for an IoT big data environment, comprising:
a context attribute definition module, configured to classify the IoT devices to be monitored and define context attributes for each device type, where each context attribute corresponds to one behavior pattern of that device type;
a device feature extraction module, configured to collect the IoT data generated while each device type operates under each context attribute, and to use the preset probability matrix algorithm to compute the probability matrix of each context attribute of each device type, thereby extracting the features of each device type;
a neighborhood shared context obtaining module, configured to derive, from all context attributes of all device types, the contexts shared by all devices within a neighborhood, as neighborhood shared contexts;
a correspondence table forming module, configured to build a neighborhood-device context correspondence table from the contexts shared by all devices within the neighborhood, where the entries of the table are probability matrices;
a target neighborhood shared context determination module, configured to compute the likelihood that the neighborhood containing the device under test is in each neighborhood shared context, and to determine the target neighborhood shared context for that neighborhood based on these likelihoods;
a probability matrix loading module, configured to load the probability matrix for the device under test from the neighborhood-device context correspondence table according to the target neighborhood shared context;
an anomaly detection module, configured to perform anomaly detection on the data of each device under test with the preset probability matrix algorithm, using the loaded probability matrix.
Based on the same inventive concept, a third aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed, implements the method of the first aspect.
Based on the same inventive concept, a fourth aspect of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method of the first aspect when executing the program.
The one or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
The data anomaly detection method for an IoT big data environment provided by the present invention first classifies the IoT devices to be monitored and defines context attributes for each device type; it collects the IoT data generated while each device type operates under each context attribute and uses the preset probability matrix algorithm to compute the probability matrix of each context attribute of each device type. It then derives, from all context attributes of all device types, the contexts shared by all devices within a neighborhood, and builds a neighborhood-device context correspondence table from them. Next, it computes the likelihood that the neighborhood containing the device under test is in each neighborhood shared context, determines the target neighborhood shared context from these likelihoods, and loads the corresponding probability matrix for the device under test from the table. Finally, it applies the preset probability matrix algorithm with the loaded probability matrix to detect anomalies in the data of each device under test.
Compared with existing methods, when detecting anomalous behavior in IoT data, the present invention first determines in which context the device's current data was generated, and then selects from the pre-built neighborhood-device context correspondence table the probability matrix corresponding to that device and that context. In other words, the present invention is a credibility determination method based on neighborhood shared context and data behavior: it can determine the context attributes within a neighborhood containing multiple types of devices and can detect anomalies in data generated by IoT devices with multiple behavior patterns, with a high anomaly recognition rate, high detection accuracy, and low computational complexity. This solves the technical problem of low accuracy in prior-art methods.
Description of Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data anomaly detection method in an IoT big data environment according to the present invention;
Fig. 2 is a flowchart of constructing the context-aware anomaly detection model according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of IoT big data anomaly detection based on neighborhood shared context attributes in a specific example;
Fig. 4 is a structural block diagram of a data anomaly detection device in an IoT big data environment according to an embodiment of the present invention;
Fig. 5 is a framework diagram of an IoT big data anomaly detection system according to an embodiment of the present invention;
Fig. 6 is a structural diagram of a computer-readable storage medium according to an embodiment of the present invention;
Fig. 7 is a structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The purpose of the present invention is to provide a data anomaly detection method and device for an IoT big data environment. It addresses the following problem: in an IoT big data architecture, the IoT perception layer is vulnerable to attacks and prone to failures, and if the big data analysis center analyzes erroneous data, the resulting wrong conclusions can have serious consequences.
To achieve this, the present invention provides an anomaly detection method for IoT device data based on neighborhood shared context attributes. It can determine the context attributes within a neighborhood containing multiple types of devices and can detect anomalies in data generated by IoT devices with multiple behavior patterns, with a high anomaly recognition rate, high detection accuracy, and low computational complexity.
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment 1
This embodiment provides a data anomaly detection method for an IoT big data environment. Referring to Fig. 1, the method comprises:
Step S1: classify the IoT devices to be monitored and define context attributes for each device type, where each context attribute corresponds to one behavior pattern of that device type.
Specifically, since each type of IoT device must be handled separately, the IoT devices to be monitored are first classified, and context attributes are defined for each device type. For example, if a device type exhibits several behaviors, it has several context attributes.
Step S2: collect the IoT data generated while each device type operates under each context attribute, and use the preset probability matrix algorithm to compute the probability matrix of each context attribute of each device type, thereby extracting the features of each device type.
Specifically, while the devices operate normally, a data collector can gather the IoT data generated under each context attribute. The preset probability matrix algorithm is a sliding-window-based probability detector algorithm. The probability matrix for each context attribute of each device type is the matrix of that device type in each situation (each situation corresponds to one behavior, i.e., one context attribute). The probability matrix serves as the feature matrix that captures the characteristics of each device type.
Step S3: from all context attributes of all device types, derive the contexts shared by all devices within a neighborhood, and use them as neighborhood shared contexts.
Specifically, all context attributes of all device types can be merged and reduced to obtain the context attributes shared by all devices within the neighborhood, called neighborhood shared contexts.
Step S4: from the contexts shared by all devices within the neighborhood, build a neighborhood-device context correspondence table whose entries are probability matrices.
Specifically, in the neighborhood-device context correspondence table, each row represents a neighborhood shared context, each column represents a device type, and each entry is a probability matrix. That is, each neighborhood shared context corresponds to one particular context of each device type, and the table stores the probability matrix of that device type in that context.
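The table can be represented as a nested mapping, as in the Python sketch below: rows are neighborhood shared contexts, columns are device types, and each cell holds the probability matrix of that device type's corresponding own context. The dictionary layout and the helper names `build_table` and `load_matrix` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def build_table(
    shared_contexts: Dict[str, Dict[str, str]],
    trained: Dict[Tuple[str, str], Matrix],
) -> Dict[str, Dict[str, Matrix]]:
    """Build the neighborhood-device context correspondence table.

    shared_contexts: {shared_context: {device_type: device_context}}
    trained:         {(device_type, device_context): probability matrix from step S2}
    """
    return {
        shared: {dev: trained[(dev, dev_ctx)] for dev, dev_ctx in mapping.items()}
        for shared, mapping in shared_contexts.items()
    }


def load_matrix(table: Dict[str, Dict[str, Matrix]], shared: str, device_type: str) -> Matrix:
    # Step S6: fetch the matrix for one device type under the chosen shared context.
    return table[shared][device_type]
```

Several shared contexts may point to the same device context, in which case their cells reuse the same trained matrix.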
Steps S1 to S4 constitute the construction of the context-aware anomaly detection model. Fig. 2 shows the implementation flow: classifying device types, defining context attributes, computing probability matrices for each device context, forming the neighborhood shared context attributes, and forming the neighborhood shared context-device behavior probability matrix correspondence table (the neighborhood-device context correspondence table).
Step S5: compute the likelihood that the neighborhood containing the device under test is in each neighborhood shared context, and, based on these likelihoods, determine the target neighborhood shared context for that neighborhood.
Specifically, during actual detection, all devices continuously send data packets to the anomaly detection center, and devices in different neighborhoods are checked separately.
In a specific implementation, each data packet sent by an IoT device to the anomaly detection center should contain three items: the device type, the neighborhood it belongs to, and the IoT data, i.e., DataPackage = (DeviceType, Area, Data). DeviceType is the device type; the anomaly detection center must load a different behavioral feature model (probability matrix) for each device type. Area is the neighborhood; devices in the same neighborhood share the same context, so the center analyzes the data of one neighborhood jointly, and different neighborhoods are handled independently. Data is the data to be checked and is the focus of the present invention; data generated within a short time is sent to the center as a sequence, to suit the sliding-window detection method.
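A minimal Python sketch of this packet format and of the per-neighborhood grouping it enables is shown below. The snake_case field names map to DeviceType, Area, and Data; the readings are assumed to be numeric, and `group_by_area` is a hypothetical helper, not part of the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataPackage:
    device_type: str   # DeviceType: selects which probability matrix to load
    area: str          # Area: neighborhood; devices in one area share a context
    data: List[float]  # Data: short run of readings for the sliding window


def group_by_area(packages: List[DataPackage]) -> Dict[str, Dict[str, List[List[float]]]]:
    """Group packets so each neighborhood is analyzed on its own,
    with data sequences collected per device type inside it."""
    groups: Dict[str, Dict[str, List[List[float]]]] = defaultdict(lambda: defaultdict(list))
    for pkg in packages:
        groups[pkg.area][pkg.device_type].append(pkg.data)
    return groups
```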
Specifically, the aforementioned preset probability matrix algorithm can be used to calculate the likelihood that the current situation belongs to each neighborhood shared context, and thereby to determine the target neighborhood shared context.
Step S6: According to the determined target neighborhood shared context, load the probability matrix corresponding to the device to be detected from the neighborhood-device context correspondence table.
Specifically, the probability matrix loaded in this step is the probability matrix that corresponds to the context attribute of each type of device in the current environment, as determined in step S5.
Step S7: Based on the loaded probability matrix, use the preset probability matrix algorithm to perform anomaly detection on the data of each device to be detected.
Specifically, anomaly detection is performed using the probability matrix that corresponds to the environment each device is in. The main contribution of the present invention is that it exploits the fact that neighboring devices in an IoT system share the same context attributes. By jointly analyzing all devices in the same neighborhood to determine the shared context attribute, it makes context determination more reliable and therefore improves detection accuracy.
In one embodiment, step S2 specifically includes:
Train the different contexts of different types of devices separately and independently;
Collect normal operation data for all situations via a data collector, and divide the data by value range into no more than 10 segments, so that each data value is represented by the symbol of the segment it belongs to;
Arrange the collected data into a time-ordered sequence and convert it into a symbol sequence; define a sliding window W of fixed size n that moves in the direction of time;
At each moment, the sliding window contains a character sequence of length n. Count the character pairs at distances 1 to n-1 within it, and create a feature matrix whose rows are the kinds of character pairs and whose columns are the distances 1 to n-1. Record in the feature matrix the counts accumulated as the sliding window moves;
Normalize the matrix obtained above column by column to obtain the probability of each character pair at each distance, and use the result as the corresponding probability matrix.
Specifically, at each moment the sliding window contains a character sequence of length n, and the character pairs at distances 1 to n-1 are counted:
- distance 1: W[1]W[2], W[2]W[3], …, W[n-1]W[n];
- distance 2: W[1]W[3], W[2]W[4], …, W[n-2]W[n];
- and so on, up to distance n-1: W[1]W[n].

Normalization divides each entry by the sum of its column, giving the probability of each character pair at that distance. The result is the probability matrix S (behavioral feature) of this type of device in this context. S is used for the subsequent context determination and anomaly detection.
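The training steps above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the patent's reference implementation:
- Equal-width segments over [lo, hi] are assumed. The source only says the value range is split into no more than 10 segments.
- The letters A to J stand in for the segment symbols.
- The function names are hypothetical.

Rows of the matrix are ordered character pairs and columns are distances 1 to n-1, as described.

```python
import numpy as np
from typing import Dict, Sequence, Tuple

SYMBOLS = "ABCDEFGHIJ"  # at most 10 segments, one letter per segment (assumed labels)


def symbolize(values: Sequence[float], lo: float, hi: float, n_segments: int = 10) -> str:
    """Replace each reading by the letter of the equal-width segment of [lo, hi] it falls in."""
    if not 1 <= n_segments <= len(SYMBOLS):
        raise ValueError("the method uses no more than 10 segments")
    width = (hi - lo) / n_segments
    letters = []
    for v in values:
        idx = int((v - lo) // width) if width > 0 else 0
        letters.append(SYMBOLS[min(max(idx, 0), n_segments - 1)])  # clamp out-of-range values
    return "".join(letters)


def pair_index(alphabet: str) -> Dict[Tuple[str, str], int]:
    """Row index of each ordered character pair (x, y)."""
    return {(x, y): r for r, (x, y) in enumerate((x, y) for x in alphabet for y in alphabet)}


def probability_matrix(seq: str, n: int, alphabet: str) -> np.ndarray:
    """Slide a window of size n over seq, count pairs W[j]W[j+d] for d = 1..n-1,
    then normalize each column (one distance) so that it sums to 1."""
    rows = pair_index(alphabet)
    counts = np.zeros((len(rows), n - 1))
    for start in range(len(seq) - n + 1):
        w = seq[start:start + n]
        for d in range(1, n):
            for j in range(n - d):
                counts[rows[(w[j], w[j + d])], d - 1] += 1
    col_sums = counts.sum(axis=0, keepdims=True)
    # columns that never received a count stay all-zero instead of dividing by 0
    return np.divide(counts, col_sums, out=np.zeros_like(counts), where=col_sums > 0)


seq = symbolize([0.2, 5.1, 0.3, 5.4, 0.1], lo=0.0, hi=10.0, n_segments=2)  # "ABABA"
S = probability_matrix(seq, n=3, alphabet="AB")
# row order AA, AB, BA, BB; column 0 is distance 1, column 1 is distance 2
```

In this example the distance-1 column is [0, 0.5, 0.5, 0] and the distance-2 column is [2/3, 0, 0, 1/3]. Every column sums to 1, which is what the column-wise normalization guarantees.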
In one embodiment, step S3 specifically includes:
Comprehensively analyze all context attributes. Combine unrelated contexts directly, and combine related contexts after removing their overlapping parts. The result is a set of neighborhood shared contexts that cover all contexts of all devices.
Specifically, the context attribute must be determined before the anomaly detection algorithm can be applied. Given how IoT systems operate and are deployed, all devices within the same neighborhood share the same context attributes. The present invention uses this property to determine the neighborhood shared context attribute, as follows:
Predefine the context attributes of all types of devices and comprehensively analyze them. Combine unrelated contexts directly, and combine related contexts after removing their overlapping parts, to form neighborhood shared contexts covering all contexts of all devices. The analysis results form the neighborhood-device context correspondence table, which can be expressed, for example, as the matrix
$$\begin{bmatrix} S_{A1} & S_{B1} & S_{C1} & \cdots \\ S_{A2} & S_{B2} & S_{C2} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
Each row corresponds to one kind of neighborhood shared context attribute. The entries in a row are the features of the device contexts that this shared context corresponds to for each type of device. For example, $S_{A1}$ denotes the probability matrix of device A in context 1.
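The sketch below shows one way to build the correspondence table as a data structure. It rests on assumptions that go beyond the source:
- "Combining unrelated contexts directly" is read as taking every combination (Cartesian product) of the device types' independent contexts.
- Merging related contexts by removing their overlap is assumed to have happened before this step, and is not shown.
- The helper names and context labels are hypothetical.
- The matrices are placeholders standing in for the trained S matrices.

```python
from itertools import product
from typing import Dict, List, Mapping, Tuple

import numpy as np

Key = Tuple[str, str]  # (device type, that device type's own context label)


def combine_unrelated(contexts: Mapping[str, List[str]]) -> List[Dict[str, str]]:
    """Assumed reading of 'combine unrelated contexts directly': every combination
    of the device types' independent contexts becomes one shared context."""
    types = list(contexts)
    return [dict(zip(types, combo)) for combo in product(*(contexts[t] for t in types))]


def build_table(shared: List[Mapping[str, str]],
                trained: Mapping[Key, np.ndarray]) -> List[Dict[str, np.ndarray]]:
    """Row i maps each device type X to S_Xi, the matrix trained for X's own
    context under neighborhood shared context i."""
    return [{t: trained[(t, ctx)] for t, ctx in row.items()} for row in shared]


device_contexts = {"A": ["day", "night"], "B": ["heating_on", "heating_off"]}
shared = combine_unrelated(device_contexts)
# placeholder matrices standing in for trained S matrices
trained = {(t, c): np.full((4, 1), 0.25) for t, cs in device_contexts.items() for c in cs}
table = build_table(shared, trained)
print(len(table), shared[0])  # 4 {'A': 'day', 'B': 'heating_on'}
```

Storing references to the trained matrices, rather than copies, keeps the table cheap even when many shared contexts reuse the same device context.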
In one embodiment, step S5 specifically includes:
Using the preset probability matrix algorithm and the neighborhood-device context correspondence table, calculating the likelihood that the neighborhood in which the device to be detected is located is in each neighborhood shared context;
And taking the most likely context as the target neighborhood shared context.
The likelihood that the neighborhood in which the device to be detected is located is in each neighborhood shared context is calculated as follows:
$$\min_{i}\,\{P(i)\}$$
$$P(i)=a\,D(M_A,S_{Ai})+b\,D(M_B,S_{Bi})+c\,D(M_C,S_{Ci})+\cdots$$
where:
- i is the index of the neighborhood shared context;
- a, b, and c are the numbers of devices of type A, B, and C in the neighborhood;
- D is a function that computes the Euclidean distance between two matrices;
- $S_{Ai}$, $S_{Bi}$, and $S_{Ci}$ are the probability matrices of devices A, B, and C corresponding to neighborhood shared context attribute i;
- $M_A$, $M_B$, and $M_C$ are the averages of the probability matrices of all type-A, type-B, and type-C devices, respectively.
Specifically, when the neighborhood context detection period is reached, the probability matrices of the test data are extracted from all data collected during a preceding time interval. The extraction uses the same procedure the probability matrix algorithm uses to extract feature matrices. The probability matrices extracted from devices of the same type in the neighborhood are then averaged, giving, for example, the average matrix $M_A$ for device A and $M_B$ for device B.

Using the neighborhood-device context correspondence table obtained earlier, the likelihood of belonging to each neighborhood shared context is then calculated. For example, if neighborhood shared context 1 corresponds to $S_{A1}$, $S_{B1}$, and $S_{C1}$, its score is $P(1)=a\,D(M_A,S_{A1})+b\,D(M_B,S_{B1})+c\,D(M_C,S_{C1})$. Here a, b, and c are the numbers of devices A, B, and C in the neighborhood, and D computes the Euclidean distance between two matrices.

A P value is computed for each row of the correspondence table using that row's device context probability matrices, and the values are compared. Because P is a weighted sum of distances, the context with the smallest P value (that is, the highest likelihood) is taken as the neighborhood's context attribute for the next period.
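A minimal sketch of this context-selection step, assuming NumPy, follows the formula above:
- The matrices for each device type are averaged.
- Each row of the correspondence table is scored with the device-count-weighted Euclidean distance.
- The row with the smallest P(i) is selected.

Taking D as the Frobenius norm of the difference is an assumption about how the source's "Euclidean distance between two matrices" is computed. The Frobenius norm is the default norm `np.linalg.norm` applies to a 2-D array, and equals the Euclidean distance between the flattened matrices. The function name and input layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np


def select_shared_context(
    observed: Dict[str, Sequence[np.ndarray]],
    table: List[Dict[str, np.ndarray]],
) -> Tuple[int, List[float]]:
    """Return (index of the chosen shared context, all P(i) scores).

    observed maps a device type to the probability matrices extracted from each
    device of that type over the recent interval. P(i) = sum over types X of
    count_X * ||M_X - S_Xi||, where M_X is the mean matrix of type X.
    """
    means = {t: np.mean(np.stack(mats), axis=0) for t, mats in observed.items()}
    scores = [
        float(sum(len(observed[t]) * np.linalg.norm(means[t] - row[t]) for t in means))
        for row in table
    ]
    return int(np.argmin(scores)), scores  # smallest weighted distance = most likely context


table = [{"A": np.zeros((2, 2))}, {"A": np.ones((2, 2))}]
observed = {"A": [np.full((2, 2), 0.5), np.full((2, 2), 1.5)]}
best, scores = select_shared_context(observed, table)
print(best, scores)  # 1 [4.0, 0.0]
```

Weighting by device count means a device type with many members in the neighborhood influences the decision more. This matches the coefficients a, b, and c in the formula.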
In one embodiment, step S7 specifically includes:
After converting the data of the device to be detected into a character sequence, defining a sliding window W of the same size n and calculating the probability of the character sequence in the sliding window;
Comparing the calculated probability with the set threshold p. If it is smaller than the threshold, the moment is marked as non-normal and the window keeps sliding. If k consecutive non-normal moments occur, the data at that moment is detected as anomalous.
Specifically, in the detection method of the present invention, detecting abnormal IoT data behavior first requires determining the context in which the device's current data were generated. Once the context is determined, the probability matrix S corresponding to that device and that context is selected from all the extracted probability matrices.

The data to be detected are converted into a character sequence, and a sliding window W of the same size n is defined. The occurrence probability of the sequence in the window is obtained by reading the corresponding probabilities from the probability matrix and multiplying them. For example, let S(AB,d) denote the probability that the character pair AB occurs at distance d. For a sliding window of size 4 containing the sequence ABAC, the probability is S(AB,1)*S(BA,1)*S(AC,1)*S(AA,2)*S(BC,2)*S(AC,3).

The resulting probability is compared with the set threshold p. If it is smaller, the moment is marked as non-normal and the window continues to slide. If k consecutive non-normal moments occur, the data at that moment is detected as anomalous. The values of p and k must be set in advance from training data, and they depend closely on the window size n and on the characteristics of the device itself.
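The window probability and the k-consecutive rule can be sketched as follows. The sketch assumes the same layout as in training: rows are ordered character pairs and columns are distances 1 to n-1. The function names are hypothetical. Once the run of non-normal windows reaches k, every further non-normal window is flagged; this is one possible reading of the rule. As the source requires, p and k are inputs to be tuned on training data.

```python
from typing import Dict, List, Tuple

import numpy as np


def pair_index(alphabet: str) -> Dict[Tuple[str, str], int]:
    """Row index of each ordered character pair; must match the training layout."""
    return {(x, y): r for r, (x, y) in enumerate((x, y) for x in alphabet for y in alphabet)}


def window_probability(window: str, S: np.ndarray, rows: Dict[Tuple[str, str], int]) -> float:
    """Product of S(pair, d) over every pair W[j]W[j+d] in the window."""
    prob = 1.0
    for d in range(1, len(window)):
        for j in range(len(window) - d):
            prob *= S[rows[(window[j], window[j + d])], d - 1]
    return float(prob)


def detect(seq: str, S: np.ndarray, rows: Dict[Tuple[str, str], int], p: float, k: int) -> List[int]:
    """Return positions (index of the newest symbol) judged anomalous: a window whose
    probability is below p is non-normal, and k consecutive non-normal windows raise a flag."""
    n = S.shape[1] + 1  # window size implied by the number of distance columns
    flagged, run = [], 0
    for start in range(len(seq) - n + 1):
        if window_probability(seq[start:start + n], S, rows) < p:
            run += 1
            if run >= k:
                flagged.append(start + n - 1)
        else:
            run = 0
    return flagged


rows = pair_index("ABC")
S = np.full((9, 3), 0.5)  # placeholder matrix: every pair probability is 0.5
print(window_probability("ABAC", S, rows))  # 0.015625, i.e. 0.5 ** 6 for the six pairs of ABAC
```

A window of size n contains n(n-1)/2 pairs, so for larger n the product can become very small. Comparing a sum of logarithms against log p is an equivalent alternative, though the source does not specify it.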
The method of the present invention uses an improved probability matrix algorithm. The resulting probability matrices serve two purposes: they give the probability that the data behavior is normal, and they act as features of the data themselves. These features are used both to determine context attributes and for anomaly detection.
In one embodiment, during anomaly detection, the step of determining the target neighborhood shared context is performed once every preset period.
Because the context changes far more slowly than the rate at which anomaly detection runs, the neighborhood shared context need not be re-determined at every detection. Determining it once per suitable period reduces the computational cost of detection.
To illustrate the implementation of the method more clearly, a specific example is described in detail below with reference to FIG. 3.
First, all devices are divided into neighborhoods, and devices in different neighborhoods are detected separately. For a given neighborhood, the procedure is:
1. Determine the neighborhood's shared context attribute.
2. Load the corresponding probability matrices according to the correspondence table and the determined shared context attribute.
3. Detect anomalies with the probability matrix algorithm (i.e., the preset probability matrix algorithm).
4. If an anomaly is detected, raise an alarm.
5. If not, check whether data detection has finished; if it has, the detection process ends.
6. Otherwise, check whether the context detection period has been reached. If it has, return to step 1; if not, continue data detection.
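The control flow of FIG. 3 for one neighborhood can be sketched as a loop that re-determines the shared context only when the context detection period is reached. The callables are hypothetical placeholders for the context-determination, matrix-loading, and detection steps of the method. Counting the period in batches is an assumption. Continuing to monitor after an alarm is also an assumption, since the flow description does not say what follows an alarm.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_neighborhood(
    batches: Iterable[Any],
    determine_context: Callable[[Any], int],
    load_matrices: Callable[[int], Any],
    detect: Callable[[Any, Any], bool],
    period: int,
) -> List[Tuple[int, int]]:
    """Detection loop for one neighborhood; returns (batch index, context) for each alarm."""
    alarms: List[Tuple[int, int]] = []
    context, matrices = None, None
    for t, batch in enumerate(batches):     # loop ends when data detection is over
        if t % period == 0:                 # context detection period reached
            context = determine_context(batch)
            matrices = load_matrices(context)
        if detect(batch, matrices):
            alarms.append((t, context))     # raise an alarm, then keep monitoring
    return alarms


calls = []
alarms = run_neighborhood(
    range(6),
    determine_context=lambda b: calls.append(b) or b,  # records each call, uses the batch as context id
    load_matrices=lambda ctx: f"matrices for context {ctx}",
    detect=lambda b, m: b == 4,
    period=3,
)
print(calls, alarms)  # [0, 3] [(4, 3)]
```

With a period of 3, the context is determined only twice over six batches. This illustrates the computational saving the periodic check is meant to provide.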
Based on the same inventive concept, the present application also provides a device corresponding to the data anomaly detection method in the IoT big data environment of Embodiment 1; see Embodiment 2 for details.
Embodiment 2
This embodiment provides a data anomaly detection device for the IoT big data environment. Referring to FIG. 4, the device includes:
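a context attribute definition module 201, configured to classify the IoT devices to be detected and to define context attributes for each type of device, where each context attribute corresponds to one behavior pattern of that type of device;
a device feature extraction module 202, configured to collect the IoT data generated by each type of device while operating in each context attribute. It uses the preset probability matrix algorithm to calculate the probability matrix of each context attribute of each type of device, and so extracts the features of each type of device;
a neighborhood shared context obtaining module 203, configured to obtain, from all context attributes of all types of devices, the contexts shared by all devices within a neighborhood, and to use them as the neighborhood shared contexts;
a correspondence table forming module 204, configured to form the neighborhood-device context correspondence table from the contexts shared by all devices within a neighborhood, the correspondence table including the probability matrices;
a target neighborhood shared context determination module 205, configured to calculate the likelihood that the neighborhood where the device to be detected is located is in each neighborhood shared context. Based on the calculated likelihoods, it determines the target neighborhood shared context for that neighborhood;
a probability matrix loading module 206, configured to load the probability matrix corresponding to the device to be detected from the neighborhood-device context correspondence table, according to the determined target neighborhood shared context;
an anomaly detection module 207, configured to perform anomaly detection on the data of each device to be detected, using the preset probability matrix algorithm with the loaded probability matrix.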
In one embodiment, the device feature extraction module 202 is specifically configured to:
Train the different contexts of different types of devices separately and independently;
Collect normal operation data for all situations via a data collector, and divide the data by value range into no more than 10 segments, so that each data value is represented by the symbol of the segment it belongs to;
Arrange the collected data into a time-ordered sequence and convert it into a symbol sequence; define a sliding window W of fixed size n that moves in the direction of time;
At each moment, the sliding window contains a character sequence of length n. Count the character pairs at distances 1 to n-1 within it, and create a feature matrix whose rows are the kinds of character pairs and whose columns are the distances 1 to n-1. Record in the feature matrix the counts accumulated as the sliding window moves;
Normalize the matrix obtained above column by column to obtain the probability of each character pair at each distance, and use the result as the corresponding probability matrix.
In one embodiment, the neighborhood shared context obtaining module 203 is specifically configured to:
Comprehensively analyze all context attributes. Combine unrelated contexts directly, and combine related contexts after removing their overlapping parts. The result is a set of neighborhood shared contexts that cover all contexts of all devices.
In one embodiment, the target neighborhood shared context determination module 205 is specifically configured to:
Use the preset probability matrix algorithm and the neighborhood-device context correspondence table to calculate the likelihood that the neighborhood in which the device to be detected is located is in each neighborhood shared context;
And take the most likely context as the target neighborhood shared context.
In one embodiment, the likelihood calculation in the target neighborhood shared context determination module 205 specifically includes:
$$\min_{i}\,\{P(i)\}$$
$$P(i)=a\,D(M_A,S_{Ai})+b\,D(M_B,S_{Bi})+c\,D(M_C,S_{Ci})+\cdots$$
where:
- i is the index of the neighborhood shared context;
- a, b, and c are the numbers of devices of type A, B, and C in the neighborhood;
- D is a function that computes the Euclidean distance between two matrices;
- $S_{Ai}$, $S_{Bi}$, and $S_{Ci}$ are the probability matrices of devices A, B, and C corresponding to neighborhood shared context attribute i;
- $M_A$, $M_B$, and $M_C$ are the averages of the probability matrices of all type-A, type-B, and type-C devices, respectively.

In one embodiment, the anomaly detection module 207 is specifically configured to:
After converting the data of the device to be detected into a character sequence, define a sliding window W of the same size n and calculate the probability of the character sequence in the sliding window;
Compare the calculated probability with the set threshold p. If it is smaller than the threshold, mark the moment as non-normal and keep sliding the window. If k consecutive non-normal moments occur, the data at that moment is detected as anomalous.
In one embodiment, the device provided in this embodiment further includes a period detection module, configured to perform the step of determining the target neighborhood shared context once every preset period during anomaly detection.
To illustrate the architecture of the device more clearly, a specific example is described in detail below with reference to FIG. 5.
In FIG. 5, training data are collected from the device group and used to train the model. During actual detection, the devices in each neighborhood send real-time IoT data to the anomaly detection center for subsequent anomaly detection.
The components in FIG. 5 map onto this embodiment as follows:
- The anomaly detection center corresponds to the detection device of this embodiment.
- The data behavior training module corresponds to the correspondence table forming module 204 and is used to build the detection model.
- The neighborhood context determination module corresponds to the target neighborhood shared context determination module 205.
- The data anomaly detection module corresponds to the anomaly detection module 207.

The device introduced in Embodiment 2 is the device used to implement the data anomaly detection method in the IoT big data environment of Embodiment 1. Those skilled in the art can therefore understand its specific structure and variations from the method introduced in Embodiment 1, and these are not repeated here. All devices used by the method of Embodiment 1 fall within the scope of protection of the present invention.
Embodiment 3
Based on the same inventive concept, the present application also provides a computer-readable storage medium 300 (see FIG. 6) storing a computer program 311 that, when executed, implements the method of Embodiment 1.
The computer-readable storage medium introduced in Embodiment 3 is the medium used to implement the data anomaly detection method in the IoT big data environment of Embodiment 1. Those skilled in the art can therefore understand its specific structure and variations from the method introduced in Embodiment 1, and these are not repeated here. Any computer-readable storage medium used by the method of Embodiment 1 falls within the scope of protection of the present invention.
Embodiment 4
Based on the same inventive concept, the present application also provides a computer device (see FIG. 7) comprising a memory 401, a processor 402, and a computer program 403 stored in the memory and executable on the processor. When executing the program, the processor 402 implements the method of Embodiment 1.
The computer device introduced in Embodiment 4 is the device used to implement the data anomaly detection method in the IoT big data environment of Embodiment 1. Those skilled in the art can therefore understand its specific structure and variations from the method introduced in Embodiment 1, and these are not repeated here. All computer devices used by the method of Embodiment 1 fall within the scope of protection of the present invention.
Although the preferred embodiments of the present invention have been described, those skilled in the art who learn the basic inventive concept may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Clearly, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from their spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include them.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910318526.7A CN110086860B (en) | 2019-04-19 | 2019-04-19 | Data anomaly detection method and device under Internet of things big data environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110086860A CN110086860A (en) | 2019-08-02 |
CN110086860B true CN110086860B (en) | 2020-09-08 |
Family
ID=67415653
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910318526.7A (CN110086860B, Active) | Data anomaly detection method and device under Internet of things big data environment | 2019-04-19 | 2019-04-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110086860B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942256B (en) * | 2019-12-02 | 2020-12-04 | 清华四川能源互联网研究院 | Method and system for abnormal detection of real-time interactive process of new energy power plant station |
CN112597255B (en) * | 2020-12-16 | 2024-12-31 | 北京三快在线科技有限公司 | A method and device for determining abnormal data |
CN114996318B (en) * | 2022-07-12 | 2022-11-04 | 成都唐源电气股份有限公司 | Automatic judgment method and system for processing mode of abnormal value of detection data |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831442A (en) * | 2011-06-13 | 2012-12-19 | 索尼公司 | Abnormal behavior detection method and equipment and method and equipment for generating abnormal behavior detection equipment |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10298612B2 (en) * | 2015-06-29 | 2019-05-21 | Argus Cyber Security Ltd. | System and method for time based anomaly detection in an in-vehicle communication network |
US10530795B2 (en) * | 2017-03-17 | 2020-01-07 | Target Brands, Inc. | Word embeddings for anomaly classification from event logs |
CN108108253A (en) * | 2017-12-26 | 2018-06-01 | 北京航空航天大学 | A kind of abnormal state detection method towards multiple data stream |
CN108668303B (en) * | 2018-05-15 | 2021-08-10 | 上海兆祥邮轮科技集团股份有限公司 | Incremental outlier detection method for wireless sensor network data stream |
- 2019-04-19: application CN201910318526.7A filed in CN (CN110086860B, Active)
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |