CN110086860B - Data anomaly detection method and device under Internet of things big data environment - Google Patents

Data anomaly detection method and device under Internet of things big data environment

Info

Publication number
CN110086860B
Authority
CN
China
Prior art keywords
context
neighborhood
equipment
probability matrix
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910318526.7A
Other languages
Chinese (zh)
Other versions
CN110086860A (en)
Inventor
赵波
李想
黎佳玥
朱晓南
刘一凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201910318526.7A priority Critical patent/CN110086860B/en
Publication of CN110086860A publication Critical patent/CN110086860A/en
Application granted granted Critical
Publication of CN110086860B publication Critical patent/CN110086860B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Theoretical Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • General Engineering & Computer Science (AREA)
  • Environmental & Geological Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a data anomaly detection method and device in an Internet of Things big data environment. In the method, all context attributes are reduced and merged to form a correspondence table between neighborhood shared contexts and device contexts, and the table is filled with the corresponding probability matrices. During detection, all devices in the same neighborhood are analyzed together: the likelihood of each neighborhood shared context is calculated from the correspondence table to reach a final judgment, the probability matrix of each device is then loaded according to the judged context, and anomaly detection is carried out with a probability detector algorithm. The method can detect abnormal events that last for a period of time and can adapt to Internet of Things devices with multiple behavior modes. By introducing neighborhood attributes, it addresses the problem that the existing context judgment process is not trustworthy, and it improves detection accuracy.

Description

Data anomaly detection method and device under Internet of things big data environment
Technical Field
The invention relates to the technical field of information security, in particular to a data anomaly detection method and device in an Internet of things big data environment.
Background
The Internet of Things is a system that interconnects people and objects. With the rapid development of network technology, Internet of Things systems have been applied to many kinds of infrastructure and provide a wide range of services to society. In recent years, a large number of new Internet of Things big data analysis platforms have emerged that use the Internet of Things as the data source, big data as the object of analysis, and artificial intelligence as the technical means. The source data generated by the Internet of Things must therefore be highly credible; otherwise the accuracy of the subsequent big data analysis suffers, with potentially serious consequences. Anomaly detection on Internet of Things big data is thus needed to improve data quality.
In the prior art, Internet of Things big data anomaly detection mainly takes three forms: the Markov anomaly detector, sliding-window-based anomaly detection, and context-aware anomaly detection.
In the course of implementing the present invention, the inventors of the present application found that the prior-art methods have at least the following technical problems:
The Markov anomaly detector detects device anomalies based on a Markov state transition matrix, which assumes that the current state depends only on the previous state and on no earlier state. Training the model is simple, but detection is computationally expensive, and the detector can only capture abrupt anomalies at a single moment. Problems in Internet of Things data, however, are often caused by abnormal events that last for a period of time, so detection accuracy is low.
After the concept of the sliding window was introduced, many researchers proposed sliding-window-based anomaly detection methods for different devices. A sliding window captures the behavior of Internet of Things data over a period of time, copes effectively with anomalies that persist for some time, and improves detection accuracy to a certain extent. However, these methods are only suitable for relatively simple embedded devices. Internet of Things devices are becoming functionally more complex and have multiple behavior modes; the accuracy of traditional sliding-window-based anomaly detection drops sharply on such devices, so it cannot cope with increasingly capable Internet of Things scenarios.
To detect Internet of Things devices with multiple behavior modes, some methods extend sliding-window anomaly detection with a context detection module: the context attribute of the device, that is, the physical environment it is in, is detected first, and a different anomaly detection model is then loaded for each context attribute. Although context-aware anomaly detection can, to a certain extent, detect anomalies in Internet of Things data with context attributes, current methods target a single device and judge the context attribute from that device's own Internet of Things data. If the data is not trustworthy, the whole detection process is not trustworthy, and the accuracy of the detection result is low.
The prior-art methods therefore suffer from the technical problem of low accuracy.
Disclosure of Invention
In view of the above, the invention provides a data anomaly detection method and device in an internet of things big data environment, which are used for solving or at least partially solving the technical problem of low accuracy of the method in the prior art.
A first aspect of the invention provides a data anomaly detection method in an Internet of Things big data environment, comprising the following steps:
Step S1: classifying the Internet of Things devices to be detected and defining context attributes for each class of device, wherein each context attribute corresponds to one behavior mode of that class of device;
Step S2: collecting the Internet of Things data generated while each class of device operates under each context attribute, and using a preset probability matrix algorithm to calculate the probability matrix of each class of device for each context attribute, so as to extract the features of each class of device;
Step S3: obtaining, from all context attributes of all classes of device, the contexts shared by all devices within a neighborhood, and taking them as the neighborhood shared contexts;
Step S4: forming a neighborhood-device context correspondence table from the contexts shared by all devices within the neighborhood, wherein the correspondence table contains the probability matrices;
Step S5: calculating the likelihood that the neighborhood containing the device to be detected is in each neighborhood shared context, and determining the target neighborhood shared context of the neighborhood from the calculated likelihoods;
Step S6: loading, according to the determined target neighborhood shared context, the probability matrix corresponding to the device to be detected from the neighborhood-device context correspondence table;
Step S7: performing anomaly detection on the data of each device to be detected with the preset probability matrix algorithm, based on the loaded probability matrix.
In one embodiment, step S2 specifically includes:
training each context of each class of device separately and independently;
collecting normal operation data for every case through a data collector, and dividing the value range of the normal operation data into no more than 10 segments, wherein each data point is represented by the symbol of the segment it falls in;
arranging the collected data into a time-ordered sequence, converting the sequence into a symbol sequence, and defining a sliding window W of fixed size n that moves in the direction of time;
at each moment, the sliding window contains a character sequence of length n; counting the character pairs at distances 1 to n-1 within it, creating a feature matrix whose columns are the character-pair types and whose rows are the distances 1 to n-1, and accumulating in the feature matrix the counts obtained as the sliding window moves;
and normalizing the resulting matrix by columns to obtain the probability of each character pair at each distance, and taking the result as the corresponding probability matrix.
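The symbolization step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text only requires at most 10 segments over the value range, so the equal-width segments, the letter alphabet, the clamping of out-of-range readings, and the function name `symbolize` are all assumptions made for the example.

```python
import string


def symbolize(values, lo, hi, segments=10):
    """Map each reading to the symbol of the segment it falls in.

    Assumption: segments are equal-width over [lo, hi]; the source only
    says the value range is divided into no more than 10 segments.
    """
    if not 1 <= segments <= 10:
        raise ValueError("the text allows at most 10 segments")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    alphabet = string.ascii_uppercase[:segments]
    width = (hi - lo) / segments
    symbols = []
    for v in values:
        idx = int((v - lo) / width)
        # Clamp readings outside [lo, hi) into the first or last segment.
        idx = min(max(idx, 0), segments - 1)
        symbols.append(alphabet[idx])
    return "".join(symbols)


# Hypothetical readings on a 0..8 scale split into 4 segments.
print(symbolize([1, 3, 5, 7], lo=0, hi=8, segments=4))
```

The resulting string is the symbol sequence over which the sliding window W is moved.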
In one embodiment, step S3 specifically includes:
analyzing all context attributes together, merging unrelated contexts directly, and merging related contexts after reducing their repeated parts, so as to form neighborhood shared contexts that cover all contexts of all devices.
In one embodiment, step S5 specifically includes:
calculating, with the preset probability matrix algorithm and according to the neighborhood-device context correspondence table, the likelihood that the neighborhood containing the device to be detected is in each neighborhood shared context;
and taking the context with the highest likelihood as the target neighborhood shared context.
In one embodiment, the likelihood that the neighborhood containing the device to be detected is in each neighborhood shared context is calculated as follows:
$$\min_i \{P(i)\}$$
$$P(i) = a \cdot D(M_A, S_{Ai}) + b \cdot D(M_B, S_{Bi}) + c \cdot D(M_C, S_{Ci}) + \dots$$
where $i$ is the index of the neighborhood shared context; $a$, $b$ and $c$ are the numbers of class A, class B and class C devices in the neighborhood; $D$ is a function that calculates the Euclidean distance between two matrices; $S_{Ai}$, $S_{Bi}$ and $S_{Ci}$ are the probability matrices of class A, class B and class C devices corresponding to neighborhood shared context attribute $i$; and $M_A$, $M_B$ and $M_C$ are the means of the probability matrices of all class A, class B and class C devices, respectively.
In one embodiment, step S7 specifically includes:
after converting the data of the device to be detected into a character sequence, defining a sliding window W of the same size n and calculating the probability of the character sequence within the sliding window;
and comparing the calculated probability with a set threshold p; if the probability is below the threshold, marking that moment as abnormal and continuing to slide the window; and if k abnormal moments occur consecutively, determining that the data at that time is abnormal.
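The threshold test with k consecutive abnormal moments can be sketched as below. The text does not say how the probability of a window is aggregated from the probability matrix, so this sketch assumes the mean of the probabilities of all character pairs in the window. The dictionary `prob`, keyed by (pair, distance), and the function names are hypothetical stand-ins for the loaded probability matrix.

```python
def window_probability(window, prob):
    """Mean probability of all character pairs in the window.

    Assumption: averaging is one plausible aggregation; the source does
    not specify it. Pairs missing from `prob` count as probability 0.
    """
    n = len(window)
    if n < 2:
        raise ValueError("window must hold at least two characters")
    scores = [
        prob.get((window[i] + window[i + d], d), 0.0)
        for d in range(1, n)
        for i in range(n - d)
    ]
    return sum(scores) / len(scores)


def detect(symbols, prob, n, p, k):
    """Return window start indices at which k consecutive abnormal moments are reached."""
    alarms, run = [], 0
    for start in range(len(symbols) - n + 1):
        if window_probability(symbols[start:start + n], prob) < p:
            run += 1  # this moment is marked abnormal
            if run >= k:
                alarms.append(start)
        else:
            run = 0  # a normal window resets the streak
    return alarms


# Hypothetical model of a device that normally alternates A and B.
prob = {("AB", 1): 1.0, ("BA", 1): 1.0, ("AA", 2): 1.0, ("BB", 2): 1.0}
print(detect("ABABAAAA", prob, n=3, p=0.5, k=2))
```

Requiring k consecutive low-probability windows is what lets the detector ignore a single noisy reading while still catching anomalies that persist.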
In one embodiment, while anomaly detection is running, the step of determining the target neighborhood shared context is repeated at a preset interval.
Based on the same inventive concept, a second aspect of the present invention provides a data anomaly detection apparatus in an internet of things big data environment, including:
the context attribute definition module is used for classifying the Internet of things equipment to be detected and defining context attributes for each type of equipment, wherein each context attribute corresponds to one behavior mode of the type of equipment;
the device feature extraction module is used for acquiring Internet of things data generated when each type of device operates in each context attribute, calculating a probability matrix of the context attribute corresponding to each type of device by using a preset probability matrix algorithm, and extracting features of each type of device;
the neighborhood sharing context obtaining module is used for obtaining the context shared by all the devices in the neighborhood range according to all the context attributes of all the types of devices, and taking the context as the neighborhood sharing context;
a correspondence table forming module, configured to form a neighborhood-device context correspondence table according to a context shared by all devices in a neighborhood range, where the correspondence table includes a probability matrix;
the target neighborhood sharing context determining module is used for calculating the possibility that the neighborhood where the equipment to be detected is located in each neighborhood sharing context, and determining the target neighborhood sharing context corresponding to the neighborhood based on the calculated possibility;
the probability matrix loading module is used for loading a probability matrix corresponding to the equipment to be detected from the neighborhood-equipment context corresponding table according to the determined target neighborhood sharing context;
and the anomaly detection module is used for carrying out anomaly detection on the data of each device to be detected by adopting a preset probability matrix algorithm based on the loaded probability matrix.
Based on the same inventive concept, a third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed, performs the method of the first aspect.
Based on the same inventive concept, a fourth aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the invention provides a data anomaly detection method under an Internet of things big data environment, which comprises the steps of classifying Internet of things equipment to be detected, and defining context attributes for each type of equipment; collecting Internet of things data generated when each type of equipment operates in each context attribute, and calculating a probability matrix of the context attribute corresponding to each type of equipment by using a preset probability matrix algorithm; then obtaining the context shared by all the devices in the neighborhood range according to all the context attributes of all the types of devices; then, according to the context shared by all the devices in the neighborhood range, a neighborhood-device context corresponding table is formed; next, calculating the possibility that the neighborhood where the equipment to be detected is located in each neighborhood sharing context, determining a target neighborhood sharing context corresponding to the neighborhood based on the calculated possibility, and loading a probability matrix corresponding to the equipment to be detected from a neighborhood-equipment context corresponding table according to the determined target neighborhood sharing context; and finally, based on the loaded probability matrix, performing anomaly detection on the data of each device to be detected by adopting a preset probability matrix algorithm.
Compared with existing methods, the invention judges the credibility of Internet of Things data on the basis of multiple behavior modes. During anomaly detection, it first determines which context the current data of a device belongs to and then selects, from the pre-built neighborhood-device context correspondence table, the probability matrix matching that device and context. In other words, credibility is judged from the neighborhood shared context together with the data behavior. The method can judge the context attribute within a neighborhood containing multiple classes of device and can detect data generated by Internet of Things devices with multiple behavior modes; it is characterized by a high anomaly recognition rate, high detection accuracy, and low computational complexity during detection. It thereby addresses the technical problem of low accuracy in the prior art.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a data anomaly detection method in an Internet of things big data environment according to the invention;
FIG. 2 is a flow chart of the construction of a context-aware anomaly detection model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a flow chart of Internet of things big data anomaly detection based on neighborhood sharing context attributes in a specific example;
FIG. 4 is a block diagram of a data anomaly detection device in an Internet of Things big data environment according to an embodiment of the present invention;
FIG. 5 is a frame diagram of a big data anomaly detection system of the Internet of things according to an embodiment of the invention;
FIG. 6 is a block diagram of a computer-readable storage medium in an embodiment of the invention;
FIG. 7 is a block diagram of a computer device in an embodiment of the present invention.
Detailed Description
The invention aims to provide a data anomaly detection method and device in an Internet of Things big data environment. Under an Internet of Things big data architecture, the sensing layer is easily attacked and prone to failure, so the big data analysis center may analyze erroneous data and produce erroneous results, which can have serious consequences; the invention aims to solve this problem.
To this end, the invention provides a method, based on neighborhood shared context attributes, for detecting anomalies in the data of Internet of Things devices. The method can judge the context attribute within a neighborhood containing multiple classes of device, can detect data generated by Internet of Things devices with multiple behavior modes, and is characterized by a high anomaly recognition rate, high detection accuracy, and low computational complexity during detection.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
This embodiment provides a data anomaly detection method in an Internet of Things big data environment. Referring to FIG. 1, the method includes:
Step S1: classifying the Internet of Things devices to be detected and defining context attributes for each class of device, wherein each context attribute corresponds to one behavior mode of that class of device.
Specifically, because each class of Internet of Things device is handled independently, the devices to be detected are first classified. Context attributes are then defined for each class; for example, a class of device that behaves in several ways has several context attributes.
Step S2: collecting the Internet of Things data generated while each class of device operates under each context attribute, and using a preset probability matrix algorithm to calculate the probability matrix of each class of device for each context attribute, so as to extract the features of each class of device.
Specifically, while the devices operate normally, a data collector gathers the Internet of Things data generated under each context attribute. The preset probability matrix algorithm is a sliding-window-based probability detector algorithm. It yields the probability matrix of each class of device for each context attribute, that is, for each case (each case corresponds to one behavior, and hence to one context attribute). The probability matrix serves as a feature matrix that captures the features of each class of device.
Step S3: obtaining, from all context attributes of all classes of device, the contexts shared by all devices within a neighborhood, and taking them as the neighborhood shared contexts.
Specifically, the context attributes of all devices can be reduced and merged to obtain the context attributes shared by all devices within the neighborhood, which are called neighborhood shared contexts.
Step S4: forming a neighborhood-device context correspondence table from the contexts shared by all devices within the neighborhood, wherein the correspondence table contains the probability matrices.
Specifically, the rows of the neighborhood-device context correspondence table represent the neighborhood shared contexts, the columns represent the classes of device, and the entries are probability matrices. That is, each neighborhood shared context corresponds to one context of each class of device, and the table stores the probability matrix of that device class in that context.
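One way to picture the correspondence table is as a nested mapping from neighborhood shared context to device class to probability matrix. The sketch below is only an illustration under that assumption; the context names, device classes, placeholder matrices, and the functions `build_table` and `load_matrix` are hypothetical and do not appear in the source.

```python
def build_table(assignments, matrices):
    """Build the neighborhood-device context correspondence table.

    assignments[ctx][device] names the device context that is active when
    the neighborhood is in shared context `ctx`; matrices[(device, device_ctx)]
    is the probability matrix trained for that device context.
    """
    return {
        ctx: {dev: matrices[(dev, dev_ctx)] for dev, dev_ctx in per_device.items()}
        for ctx, per_device in assignments.items()
    }


def load_matrix(table, neighborhood_context, device_type):
    """Look up the matrix for one device class in one shared context (step S6)."""
    return table[neighborhood_context][device_type]


# Hypothetical example: two shared contexts, two device classes.
matrices = {
    ("lamp", "off"): [[1.0, 0.0]],
    ("lamp", "on"): [[0.0, 1.0]],
    ("camera", "recording"): [[0.5, 0.5]],
}
assignments = {
    "day": {"lamp": "off", "camera": "recording"},
    "night": {"lamp": "on", "camera": "recording"},
}
table = build_table(assignments, matrices)
print(load_matrix(table, "night", "lamp"))
```

Note that the camera reuses the same matrix in both rows, which mirrors the idea that one device context may correspond to several neighborhood shared contexts.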
Steps S1 to S4 constitute the process of building the context-aware anomaly detection model. FIG. 2 shows a specific implementation flow: classifying the device types, defining the context attributes, calculating a probability matrix for each device context, forming the neighborhood shared context attributes, and forming the correspondence table between neighborhood shared contexts and device behavior probability matrices (the neighborhood-device context correspondence table).
Step S5: calculating the likelihood that the neighborhood containing the device to be detected is in each neighborhood shared context, and determining the target neighborhood shared context of the neighborhood from the calculated likelihoods.
Specifically, during actual detection, all devices continuously send data packets to the anomaly detection center, and devices in different neighborhoods are detected separately.
In a specific implementation, a data packet sent by an Internet of Things device to the anomaly detection center should contain three items: the device type, the neighborhood of the device, and the Internet of Things data, that is, DataPackage (DeviceType, Area, Data). DeviceType is the type of the device; the anomaly detection center loads a different behavior feature model (probability matrix) for each device type. Area is the neighborhood; devices in the same neighborhood share the same context, so the anomaly detection center analyzes the data of one neighborhood together, while different neighborhoods are independent of each other. Data is the data to be detected, which is the focus of the invention; data generated over a short time is sent to the anomaly detection center as a sequence to suit the sliding-window detection method.
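The packet layout and the per-neighborhood grouping can be sketched as follows. The three fields follow DataPackage (DeviceType, Area, Data) from the text; the Python field names, their types, and the helper `group_by_area` are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPackage:
    device_type: str  # DeviceType: selects the behavior feature model to load
    area: str         # Area: neighborhood whose devices share one context
    data: tuple       # Data: short time-ordered sequence of readings


def group_by_area(packages):
    """Group packet data by neighborhood, then by device type.

    Different neighborhoods are kept apart because they are analyzed
    independently; data inside one neighborhood is analyzed together.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for pkg in packages:
        grouped[pkg.area][pkg.device_type].append(pkg.data)
    return grouped


# Hypothetical packets from two neighborhoods.
packets = [
    DataPackage("lamp", "room-1", (0.1, 0.2, 0.1)),
    DataPackage("camera", "room-1", (5.0, 5.1, 4.9)),
    DataPackage("lamp", "room-2", (0.9, 0.8, 0.9)),
]
grouped = group_by_area(packets)
print(sorted(grouped["room-1"]))
```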
Specifically, the preset probability matrix algorithm may be used to calculate the likelihood that the current situation belongs to each neighborhood shared context, so as to determine the target neighborhood shared context.
Step S6: loading, according to the determined target neighborhood shared context, the probability matrix corresponding to the device to be detected from the neighborhood-device context correspondence table.
Specifically, the probability matrix loaded in this step is the one that corresponds, for each class of device, to the context attribute of the current environment determined in step S5.
Step S7: performing anomaly detection on the data of each device to be detected with the preset probability matrix algorithm, based on the loaded probability matrix.
Specifically, anomaly detection is performed on each device with the probability matrix that matches the environment the device is in. The main contribution of the method is that it exploits the fact that neighboring devices in an Internet of Things system share the same context attribute: all devices in the same neighborhood are analyzed together to judge the shared context attribute, which makes the context judgment more trustworthy and improves accuracy.
In one embodiment, step S2 specifically includes:
training each context of each class of device separately and independently;
collecting normal operation data for every case through a data collector, and dividing the value range of the normal operation data into no more than 10 segments, wherein each data point is represented by the symbol of the segment it falls in;
arranging the collected data into a time-ordered sequence, converting the sequence into a symbol sequence, and defining a sliding window W of fixed size n that moves in the direction of time;
at each moment, the sliding window contains a character sequence of length n; counting the character pairs at distances 1 to n-1 within it, creating a feature matrix whose columns are the character-pair types and whose rows are the distances 1 to n-1, and accumulating in the feature matrix the counts obtained as the sliding window moves;
and normalizing the resulting matrix by columns to obtain the probability of each character pair at each distance, and taking the result as the corresponding probability matrix.
Specifically, at each moment the sliding window contains a character sequence W_1 W_2 … W_n of length n, and the character pairs at distances 1 to n-1 are counted: (W_1, W_2), (W_2, W_3), …, (W_{n-1}, W_n) at distance 1; (W_1, W_3), (W_2, W_4), …, (W_{n-2}, W_n) at distance 2; and so on. Normalization divides each entry by the sum of its column, giving the probability of each character pair at each distance. The result is the probability matrix S (the behavior feature) of that class of device in that context; S is used for the subsequent context determination and anomaly detection.
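The counting and normalization can be sketched as below. This follows a literal reading of the text: columns are character-pair types, rows are distances 1 to n-1, and each entry is divided by the sum of its column. The translated wording is ambiguous about the orientation, so a transposed layout (normalizing over pair types for each distance) would be an equally plausible reading. The function name and the return shape are assumptions.

```python
from itertools import product


def train_probability_matrix(symbols, alphabet, n):
    """Return (pairs, matrix), where matrix[d - 1][j] belongs to distance d and pair j."""
    if n < 2 or len(symbols) < n:
        raise ValueError("need n >= 2 and at least n symbols")
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    col = {pair: j for j, pair in enumerate(pairs)}
    counts = [[0] * len(pairs) for _ in range(n - 1)]
    # Slide the window one symbol at a time and count pairs at each distance.
    for start in range(len(symbols) - n + 1):
        window = symbols[start:start + n]
        for d in range(1, n):
            for i in range(n - d):
                counts[d - 1][col[window[i] + window[i + d]]] += 1
    # Column-wise normalization: divide each entry by the sum of its column.
    col_sums = [sum(row[j] for row in counts) for j in range(len(pairs))]
    matrix = [
        [row[j] / col_sums[j] if col_sums[j] else 0.0 for j in range(len(pairs))]
        for row in counts
    ]
    return pairs, matrix


# Hypothetical training sequence over a two-symbol alphabet.
pairs, S = train_probability_matrix("ABAB", alphabet="AB", n=3)
print(pairs)
print(S)
```

With an alphabet of at most 10 symbols the matrix has at most 100 columns, which keeps both training and lookup cheap.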
In one embodiment, step S3 specifically includes:
comprehensively analyzing all context attributes, combining unrelated contexts directly, and combining related contexts after removing their repeated parts, to form a neighborhood sharing context covering all the contexts of all the devices.
Specifically, before the anomaly detection algorithm can be applied, the context attribute must be determined. Because of how Internet of things systems operate and are deployed, all devices within the same neighborhood range share the same context attribute; the neighborhood sharing context attribute is determined on this basis, as follows:
the context attributes of all types of devices are predefined; all context attributes are analyzed together, unrelated contexts are combined directly, and related contexts are combined after their repeated parts are removed, forming a neighborhood sharing context that covers all the contexts of all the devices. The analysis results form a neighborhood-device context correspondence table, which may be expressed, for example, as a matrix
Figure BDA0002033909200000091
Wherein each row represents one category of neighborhood sharing context attribute, and the entries in each row are the characteristics of the various device contexts corresponding to that neighborhood sharing context; for example, S_{A1} represents the probability matrix of device A in context 1.
In one embodiment, step S5 specifically includes:
calculating the possibility that the neighborhood where the equipment to be detected is located is in each neighborhood sharing context by adopting a preset probability matrix algorithm according to the neighborhood-equipment context corresponding table;
and the context with the highest probability is taken as the target neighborhood sharing context.
The possibility that the neighborhood where the device to be detected is located is in each neighborhood sharing context is calculated as follows:
min{P(i)}
P(i) = a*D(M_A, S_{Ai}) + b*D(M_B, S_{Bi}) + c*D(M_C, S_{Ci}) + …
wherein i represents the serial number of the neighborhood sharing context; a, b and c represent the numbers of device A, device B and device C in the neighborhood; D represents a function for calculating the Euclidean distance between two matrices; S_{Ai}, S_{Bi} and S_{Ci} represent the probability matrices of device A, device B and device C corresponding to neighborhood sharing context attribute i; and M_A, M_B and M_C represent the mean values of the probability matrices of all class A, class B and class C devices, respectively. Because P(i) is a weighted sum of distances, the neighborhood sharing context with the smallest P(i) is the one with the highest possibility.
Specifically, when a neighborhood context detection period is reached, a probability matrix is extracted from all the test data in the period preceding that moment, in the same way that the probability matrix algorithm extracts feature matrices. The probability matrices extracted from devices of the same type in the neighborhood are then averaged, giving, for example, the average matrix M_A of device A and the average matrix M_B of device B. The possibility of belonging to each neighborhood sharing context is then calculated according to the neighborhood-device context correspondence table obtained from the earlier analysis. For example, if neighborhood sharing context 1 corresponds to S_{A1}, S_{B1} and S_{C1}, the calculation formula is P(1) = a*D(M_A, S_{A1}) + b*D(M_B, S_{B1}) + c*D(M_C, S_{C1}), where a, b and c represent the numbers of device A, device B and device C in the neighborhood, and D is a function that calculates the Euclidean distance between two matrices. For each row in the neighborhood-device context correspondence table, the corresponding device context probability matrices are extracted to calculate a P value; the P values are compared, and the context whose P value is the minimum, in line with the min{P(i)} criterion, is judged to be the context attribute of the neighborhood in the next period.
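The context selection can be illustrated with a short sketch. It assumes that the "Euclidean distance between two matrices" is the square root of the summed squared entry-wise differences (the Frobenius norm of the difference), which the text does not spell out. The names `matrix_distance` and `select_context`, the dictionary layout of the correspondence table, and the sample numbers are all hypothetical.

```python
import math


def matrix_distance(m1, m2):
    """Entry-wise Euclidean distance between two {distance: {pair: prob}} matrices.

    Entries missing from one matrix are treated as probability 0 (assumption).
    """
    total = 0.0
    for d in set(m1) | set(m2):
        col1, col2 = m1.get(d, {}), m2.get(d, {})
        for pair in set(col1) | set(col2):
            total += (col1.get(pair, 0.0) - col2.get(pair, 0.0)) ** 2
    return math.sqrt(total)


def select_context(mean_matrices, device_counts, table):
    """Compute P(i) for every row of the table and return (argmin, all scores).

    table: {context_id: {device_type: S matrix}}
    mean_matrices: {device_type: averaged matrix M}
    device_counts: {device_type: number of such devices in the neighborhood}
    """
    scores = {}
    for context_id, row in table.items():
        scores[context_id] = sum(
            device_counts[dev] * matrix_distance(mean_matrices[dev], s)
            for dev, s in row.items()
        )
    return min(scores, key=scores.get), scores


# Hypothetical two-context table for device types A and B.
table = {
    1: {"A": {1: {"AB": 1.0}}, "B": {1: {"CD": 1.0}}},
    2: {"A": {1: {"BA": 1.0}}, "B": {1: {"DC": 1.0}}},
}
means = {"A": {1: {"AB": 0.9, "BA": 0.1}}, "B": {1: {"CD": 0.8, "DC": 0.2}}}
best, scores = select_context(means, {"A": 2, "B": 1}, table)
```

In this made-up data the observed averages lie close to the matrices stored for context 1, so context 1 receives the smaller P value and is selected.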
In one embodiment, step S7 specifically includes:
after converting the data of the device to be detected into a character sequence, defining a sliding window W of the same size n, and calculating the probability of the character sequence in the sliding window;
and comparing the calculated probability value with a set threshold p, marking the moment as abnormal if the probability value is less than the threshold, continuing to slide the window, and detecting that the data at that moment is abnormal if k abnormal moments occur consecutively.
Specifically, according to the detection method of the present invention, the Internet of things data behavior anomaly detection process first needs to determine the context in which the current data of the device is generated. After the context is determined, the probability matrix S corresponding to that context is selected for the device from all the extracted probability matrices. After the data to be detected is converted into a character sequence, a sliding window W of the same size n is defined, and the occurrence probability of the sequence in the sliding window is calculated by reading the corresponding probabilities from the probability matrix and multiplying them. For example, let S(AB, n-1) denote the probability that the character pair AB occurs at a distance of n-1; then for a sliding window of size 4 containing the sequence ABAC, the probability is S(AB,1) × S(BA,1) × S(AC,1) × S(AA,2) × S(BC,2) × S(AC,3). The obtained probability value is compared with a set threshold p; if it is smaller than the threshold, the moment is marked as abnormal and the window continues to slide, and if k abnormal moments occur consecutively, the data at that moment is detected as abnormal. The values of p and k need to be set in advance according to training data, and they are closely related to the size n of the sliding window and to the characteristics of the device.
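The window probability and the k-consecutive rule can be sketched as below. The function names are hypothetical, the matrix is again assumed to be a `{distance: {pair: probability}}` dictionary, and a pair that never appeared in training is assumed to have probability 0, a choice the text leaves open. The sample matrix and thresholds are invented for illustration.

```python
def window_probability(window, matrix):
    """Multiply the stored probabilities of every character pair in the window."""
    n = len(window)
    prob = 1.0
    for d in range(1, n):
        for i in range(n - d):
            # Unseen pairs are assigned probability 0.0 (assumption).
            prob *= matrix.get(d, {}).get(window[i] + window[i + d], 0.0)
    return prob


def detect_anomalies(symbols, matrix, n, p, k):
    """Return window start indices at which k consecutive abnormal moments are reached."""
    alarms = []
    run = 0  # length of the current streak of abnormal moments
    for start in range(len(symbols) - n + 1):
        if window_probability(symbols[start:start + n], matrix) < p:
            run += 1
            if run >= k:
                alarms.append(start)
        else:
            run = 0
    return alarms


# Hypothetical matrix trained on an alternating A/B signal, window size 3.
matrix = {1: {"AB": 0.5, "BA": 0.5}, 2: {"AA": 0.5, "BB": 0.5}}
alarms = detect_anomalies("ABABCCCAB", matrix, n=3, p=0.05, k=3)
```

For long windows the product of many small probabilities can underflow; summing logarithms and comparing against log p is a common alternative, though the text itself describes a plain product.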
It can be seen that the method provided by the present invention adopts an improved probability matrix algorithm, in which the obtained probability matrix is used not only to calculate the probability that the data behavior is normal, but also as a characteristic of the data itself, both to determine the context attribute and to detect abnormal data.
In one embodiment, the step of determining the target neighborhood sharing context is performed every preset period when the anomaly detection is performed.
Because the context environment changes far more slowly than anomaly detection is performed, the neighborhood sharing context does not need to be judged at every detection; judging it once per suitably chosen period reduces the amount of detection computation.
In order to more clearly illustrate the implementation of the method provided by the present invention, a detailed description is provided below by way of a specific example, please refer to fig. 3.
All the devices are divided into neighborhoods, and the devices in different neighborhoods are detected separately. For a given neighborhood, the shared context attribute is judged first, and the corresponding probability matrix is loaded according to the correspondence table and the judged shared context attribute. Anomaly detection is then performed using the probability matrix algorithm (that is, the preset probability matrix algorithm). If an anomaly is detected, an alarm is raised. If not, it is judged whether data detection is finished: if so, the detection process ends; if not, it is further judged whether the context detection period has been reached. If the period has been reached, the process returns to the step of judging the shared context attribute; otherwise, detection of the data continues.
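The control flow of this example can be outlined in a few lines. The sketch below is only a skeleton under stated assumptions: `judge_context`, `load_matrices` and `detect` are hypothetical caller-supplied callables standing in for the context judgment, table lookup and probability matrix detection steps, detection is assumed to continue after an alarm, and the stub data at the end is invented purely to exercise the loop.

```python
def run_detection(batches, context_period, judge_context, load_matrices, detect):
    """Detect anomalies for one neighborhood, re-judging the context periodically.

    batches: iterable of data items, one per detection step.
    context_period: number of detection steps between context judgments.
    Returns a list of (step, context) tuples for which an alarm was raised.
    """
    alarms = []
    context = None
    matrices = None
    for step, batch in enumerate(batches):
        # Re-judge the shared context only when the context period is reached.
        if step % context_period == 0:
            context = judge_context(batch)
            matrices = load_matrices(context)
        if detect(batch, matrices):
            alarms.append((step, context))
    # The loop ending corresponds to "data detection is finished".
    return alarms


# Stub callables (hypothetical): readings above a per-context limit are abnormal.
limits = {"day": 5, "night": 10}
alarms = run_detection(
    batches=[1, 9, 2, 7, 12],
    context_period=2,
    judge_context=lambda reading: "day" if reading < 5 else "night",
    load_matrices=lambda context: limits[context],
    detect=lambda reading, limit: reading > limit,
)
```

Passing the three steps in as callables keeps the loop independent of how the matrices are stored, which matches the separation between the judgment, loading and detection modules in the description.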
Based on the same inventive concept, the application also provides a device corresponding to the data anomaly detection method in the big data environment of the internet of things in the first embodiment, which is detailed in the second embodiment.
Example two
The embodiment provides a data anomaly detection device under big data environment of the internet of things, please refer to fig. 4, the device includes:
the context attribute definition module 201 is configured to classify the internet of things devices to be detected, and define context attributes for each type of device, where each context attribute corresponds to a behavior pattern of the type of device;
the device feature extraction module 202 is configured to collect internet of things data generated when each type of device operates in each context attribute, calculate a probability matrix of the context attribute corresponding to each type of device by using a preset probability matrix algorithm, and extract features of each type of device;
a neighborhood sharing context obtaining module 203, configured to obtain, according to all context attributes of all types of devices, a context shared by all devices in a neighborhood range, and use the context as a neighborhood sharing context;
a correspondence table forming module 204, configured to form a neighborhood-device context correspondence table according to a context shared by all devices in a neighborhood range, where the correspondence table includes a probability matrix;
a target neighborhood sharing context determining module 205, configured to calculate a possibility that a neighborhood where the device to be detected is located in each neighborhood sharing context, and determine a target neighborhood sharing context corresponding to the neighborhood based on a calculated possibility;
a probability matrix loading module 206, configured to load a probability matrix corresponding to the device to be detected from the neighborhood-device context correspondence table according to the determined target neighborhood shared context;
and the anomaly detection module 207 is used for performing anomaly detection on the data of each device to be detected by adopting a preset probability matrix algorithm based on the loaded probability matrix.
In an embodiment, the device feature extraction module 202 is specifically configured to:
training different contexts of different kinds of equipment separately and independently;
collecting normal operation data for all conditions through a data collector, and dividing the normal operation data into no more than 10 segments according to its value range, each data value being represented by a symbol denoting the segment to which it belongs;
forming a sequence taking time as a dimension on the acquired data, converting the sequence into a symbol sequence, and defining a sliding window W with a fixed size of n to move along the time flow direction;
at each moment, a character sequence of length n exists in the sliding window; the number of double-character pairs at distances 1 to n-1 is counted, a feature matrix is created with the double-character pair types as rows and the adjacent distances 1 to n-1 as columns, and the counts obtained as the sliding window moves are recorded in the feature matrix;
and normalizing the resulting matrix by columns to obtain the probability of each character pair at each distance, and taking the result as the corresponding probability matrix.
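The symbol-conversion step in the list above, which maps each reading to one of at most 10 segments, might look like the following sketch. Equal-width segments and the letters A to J are assumptions made for illustration; the text only requires that the value range be divided into no more than 10 segments. The function name `symbolize` is hypothetical.

```python
import string


def symbolize(values, segments=10, low=None, high=None):
    """Map numeric readings to letters, one letter per equal-width segment."""
    if not 1 <= segments <= 10:
        raise ValueError("the value range is divided into at most 10 segments")
    low = min(values) if low is None else low
    high = max(values) if high is None else high
    # Fall back to a width of 1.0 when all readings are identical.
    width = (high - low) / segments or 1.0
    symbols = []
    for value in values:
        index = int((value - low) / width)
        # Clamp so that the maximum (or an out-of-range reading) stays in range.
        index = min(max(index, 0), segments - 1)
        symbols.append(string.ascii_uppercase[index])
    return "".join(symbols)


sequence = symbolize([0.0, 2.5, 9.9, 10.0], segments=4)
```

Fixing `low` and `high` from the training data and reusing them at detection time keeps the same reading mapped to the same symbol in both phases.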
In an embodiment, the neighborhood sharing context obtaining module 203 is specifically configured to:
comprehensively analyzing all context attributes, combining unrelated contexts directly, and combining related contexts after removing their repeated parts, to form a neighborhood sharing context covering all the contexts of all the devices.
In one embodiment, the target neighborhood sharing context determining module 205 is specifically configured to:
calculating the possibility that the neighborhood where the equipment to be detected is located is in each neighborhood sharing context by adopting a preset probability matrix algorithm according to the neighborhood-equipment context corresponding table;
and the context with the highest probability is taken as the target neighborhood sharing context.
In one embodiment, the probability calculation in the target neighborhood sharing context determining module 205 specifically includes:
min{P(i)}
P(i) = a*D(M_A, S_{Ai}) + b*D(M_B, S_{Bi}) + c*D(M_C, S_{Ci}) + …
wherein i represents the serial number of the neighborhood sharing context; a, b and c represent the numbers of device A, device B and device C in the neighborhood; D represents a function for calculating the Euclidean distance between two matrices; S_{Ai}, S_{Bi} and S_{Ci} represent the probability matrices of device A, device B and device C corresponding to neighborhood sharing context attribute i; and M_A, M_B and M_C represent the mean values of the probability matrices of all class A, class B and class C devices, respectively.
In one embodiment, the anomaly detection module 207 is specifically configured to:
after converting the data of the device to be detected into a character sequence, defining a sliding window W of the same size n, and calculating the probability of the character sequence in the sliding window;
and comparing the calculated probability value with a set threshold p, marking the moment as abnormal if the probability value is less than the threshold, continuing to slide the window, and detecting that the data at that moment is abnormal if k abnormal moments occur consecutively.
In an implementation manner, the apparatus provided in this embodiment further includes a period detection module, configured to perform, every preset period, a step of determining the target neighborhood sharing context when performing anomaly detection.
To more clearly illustrate the architecture of the apparatus provided by the present invention, a detailed description is provided below with reference to fig. 5.
In fig. 5, training data is collected from the device group through data collection for model training, and during actual detection, devices in each neighborhood send real-time data of the internet of things to the anomaly detection center for subsequent anomaly detection.
The anomaly detection center is equivalent to the detection device in this embodiment, the data behavior training module is equivalent to the correspondence table forming module 204 and is used for constructing a detection model, the neighborhood context determination module is equivalent to the target neighborhood shared context determination module 205, and the data anomaly detection module is equivalent to the anomaly detection module 207.
Since the device described in the second embodiment of the present invention is the device used to implement the data anomaly detection method under the Internet of things big data environment of the first embodiment, those skilled in the art can understand the specific structure and variations of the device based on the method described in the first embodiment, so details are not repeated here. All devices used to implement the method of the first embodiment of the present invention fall within the protection scope of the present invention.
EXAMPLE III
Based on the same inventive concept, the present application further provides a computer-readable storage medium 300, please refer to fig. 6, on which a computer program 311 is stored, which when executed implements the method in the first embodiment.
Because the computer-readable storage medium introduced in the third embodiment of the present invention is the computer-readable storage medium used to implement the data anomaly detection method under the Internet of things big data environment of the first embodiment, those skilled in the art can understand the specific structure and variations of the computer-readable storage medium based on the method introduced in the first embodiment, so details are not repeated here. Any computer-readable storage medium used to implement the method of the first embodiment of the present invention falls within the protection scope of the present invention.
Example four
Based on the same inventive concept, the present application further provides a computer device, please refer to fig. 7, which includes a memory 401, a processor 402, and a computer program 403 stored in the memory and executable on the processor; when the processor 402 executes the program, the method in the first embodiment is implemented.
Since the computer device introduced in the fourth embodiment of the present invention is the computer device used to implement the data anomaly detection method under the Internet of things big data environment of the first embodiment, those skilled in the art can understand the specific structure and variations of the computer device based on the method introduced in the first embodiment, so details are not repeated here. All computer devices used to implement the method of the first embodiment of the present invention fall within the protection scope of the present invention.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (8)

1. A data anomaly detection method under an Internet of things big data environment is characterized by comprising the following steps:
step S1: classifying the Internet of things equipment to be detected, and defining context attributes for each type of equipment, wherein each context attribute corresponds to one behavior mode of the type of equipment;
step S2: acquiring Internet of things data generated when each type of equipment operates in each context attribute, and calculating a probability matrix of the context attribute corresponding to each type of equipment by using a preset probability matrix algorithm so as to extract the characteristics of each type of equipment;
step S3: obtaining a context shared by all the devices in the neighborhood range according to all the context attributes of all the types of devices, and taking the context as a neighborhood shared context;
step S4: forming a neighborhood-equipment context corresponding table according to the context shared by all equipment in the neighborhood range, wherein the corresponding table comprises a probability matrix;
step S5: calculating the possibility that the neighborhood where the equipment to be detected is located in each neighborhood sharing context, and determining a target neighborhood sharing context corresponding to the neighborhood based on the calculated possibility;
step S6: according to the determined target neighborhood sharing context, loading a probability matrix corresponding to the equipment to be detected from a neighborhood-equipment context corresponding table;
step S7: based on the loaded probability matrix, performing anomaly detection on the data of each device to be detected by adopting a preset probability matrix algorithm;
wherein, step S5 specifically includes:
calculating the possibility that the neighborhood where the equipment to be detected is located is in each neighborhood sharing context by adopting a preset probability matrix algorithm according to the neighborhood-equipment context corresponding table;
taking the context with the highest possibility as a target neighborhood sharing context;
the calculation method of the possibility that the neighborhood where the equipment to be detected is located shares the context in each neighborhood comprises the following steps:
min{P(i)}
P(i) = a*D(M_A, S_{Ai}) + b*D(M_B, S_{Bi}) + c*D(M_C, S_{Ci}) + …
wherein i represents the serial number of the neighborhood sharing context; a, b and c represent the numbers of device A, device B and device C in the neighborhood; D represents a function for calculating the Euclidean distance between two matrices; S_{Ai}, S_{Bi} and S_{Ci} represent the probability matrices of device A, device B and device C corresponding to neighborhood sharing context attribute i; and M_A, M_B and M_C represent the mean values of the probability matrices of all class A, class B and class C devices, respectively.
2. The method according to claim 1, wherein step S2 specifically comprises:
training different contexts of different kinds of equipment separately and independently;
collecting normal operation data for all conditions through a data collector, and dividing the normal operation data into no more than 10 segments according to its value range, each data value being represented by a symbol denoting the segment to which it belongs;
forming a sequence taking time as a dimension on the acquired data, converting the sequence into a symbol sequence, and defining a sliding window W with a fixed size of n to move along the time flow direction;
at each moment, a character sequence with the length of n exists in the sliding window, the number of double character pairs with the distance of 1 to n-1 is counted, a feature matrix with the type of the double character pairs as rows and the adjacent distance of 1 to n-1 as columns is created, and the counted number of the sliding window in the moving process is recorded in the feature matrix;
and normalizing the matrix obtained by the calculation mode according to columns to obtain the probability of each character pair at the distance, and taking the probability as a corresponding probability matrix.
3. The method according to claim 2, wherein step S3 specifically comprises:
comprehensively analyzing all context attributes, directly combining the unrelated contexts, reducing the repeated parts of the related contexts and then combining the related contexts to form a neighborhood sharing context containing all the contexts of all the equipment.
4. The method according to claim 1, wherein step S7 specifically comprises:
after converting the data of the device to be detected into a character sequence, defining a sliding window W of the same size n, and calculating the probability of the character sequence in the sliding window;
and comparing the calculated probability value with a set threshold p, marking the moment as abnormal if the probability value is less than the threshold, continuing to slide the window, and detecting that the data at that moment is abnormal if k abnormal moments occur consecutively.
5. The method of claim 1, wherein the step of determining the target neighborhood sharing context is performed every predetermined period when performing anomaly detection.
6. A data anomaly detection device under an Internet of things big data environment, characterized by comprising:
the context attribute definition module is used for classifying the Internet of things equipment to be detected and defining context attributes for each type of equipment, wherein each context attribute corresponds to one behavior mode of the type of equipment;
the device feature extraction module is used for acquiring Internet of things data generated when each type of device operates in each context attribute, calculating a probability matrix of the context attribute corresponding to each type of device by using a preset probability matrix algorithm, and extracting features of each type of device;
the neighborhood sharing context obtaining module is used for obtaining the context shared by all the devices in the neighborhood range according to all the context attributes of all the types of devices, and taking the context as the neighborhood sharing context;
a correspondence table forming module, configured to form a neighborhood-device context correspondence table according to a context shared by all devices in a neighborhood range, where the correspondence table includes a probability matrix;
the target neighborhood sharing context determining module is used for calculating the possibility that the neighborhood where the equipment to be detected is located in each neighborhood sharing context, and determining the target neighborhood sharing context corresponding to the neighborhood based on the calculated possibility;
the probability matrix loading module is used for loading a probability matrix corresponding to the equipment to be detected from the neighborhood-equipment context corresponding table according to the determined target neighborhood sharing context;
the anomaly detection module is used for carrying out anomaly detection on the data of each device to be detected by adopting a preset probability matrix algorithm based on the loaded probability matrix;
the target neighborhood sharing context determining module is specifically configured to:
calculating the possibility that the neighborhood where the equipment to be detected is located is in each neighborhood sharing context by adopting a preset probability matrix algorithm according to the neighborhood-equipment context corresponding table;
taking the context with the highest possibility as a target neighborhood sharing context;
the calculation method of the possibility that the neighborhood where the equipment to be detected is located shares the context in each neighborhood comprises the following steps:
min{P(i)}
P(i) = a*D(M_A, S_{Ai}) + b*D(M_B, S_{Bi}) + c*D(M_C, S_{Ci}) + …
wherein i represents the serial number of the neighborhood sharing context; a, b and c represent the numbers of device A, device B and device C in the neighborhood; D represents a function for calculating the Euclidean distance between two matrices; S_{Ai}, S_{Bi} and S_{Ci} represent the probability matrices of device A, device B and device C corresponding to neighborhood sharing context attribute i; and M_A, M_B and M_C represent the mean values of the probability matrices of all class A, class B and class C devices, respectively.
7. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed, implements the method of any one of claims 1 to 5.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 5 when executing the program.
CN201910318526.7A 2019-04-19 2019-04-19 Data anomaly detection method and device under Internet of things big data environment Active CN110086860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910318526.7A CN110086860B (en) 2019-04-19 2019-04-19 Data anomaly detection method and device under Internet of things big data environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910318526.7A CN110086860B (en) 2019-04-19 2019-04-19 Data anomaly detection method and device under Internet of things big data environment

Publications (2)

Publication Number Publication Date
CN110086860A CN110086860A (en) 2019-08-02
CN110086860B true CN110086860B (en) 2020-09-08

Family

ID=67415653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910318526.7A Active CN110086860B (en) 2019-04-19 2019-04-19 Data anomaly detection method and device under Internet of things big data environment

Country Status (1)

Country Link
CN (1) CN110086860B (en)

Also Published As

Publication number Publication date
CN110086860A (en) 2019-08-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant