CN115348190A - Internet of things equipment detection method, system and equipment - Google Patents

Publication number
CN115348190A
Authority
CN
China
Prior art keywords
sample set
abnormal
training
abnormal sample
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210989067.7A
Other languages
Chinese (zh)
Inventor
葛航
李守位
卢文科
殷亭
李海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Huatai United Industrial Automation Co ltd
Original Assignee
Zhengzhou Huatai United Industrial Automation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Huatai United Industrial Automation Co ltd
Priority to CN202210989067.7A
Publication of CN115348190A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level

Abstract

The invention discloses an Internet of Things (IoT) device detection method, system and device. The method comprises: preprocessing a high-dimensional data set of each IoT device to obtain a normal sample set and a first abnormal sample set; using a generator in a deep convolutional adversarial network to generate extended abnormal samples from random noise, and performing repeated alternating adversarial training on the generator and a discriminator to obtain a second abnormal sample set; inputting the training sample set into a classification and regression tree for classification training to obtain an anomaly detection model; and performing anomaly detection on the IoT device to be detected based on the anomaly detection model. In this method, the second abnormal sample set is used to augment the abnormal samples, which addresses the imbalance between positive and negative samples caused by the small number of abnormal samples and thereby improves the training accuracy of the model; inputting the training sample set into the classification and regression tree for classification training allows abnormal behaviors of IoT devices to be both detected and classified, which improves the operational safety of the devices.

Description

Internet of things equipment detection method, system and equipment
Technical Field
The invention belongs to the technical field of the Internet of Things (IoT), and particularly relates to an IoT device detection method, system and device.
Background
The Internet of Things refers to connecting objects to a network through information sensing devices according to an agreed protocol, so that information is exchanged and communicated over information transmission media. The aim of the Internet of Things is fast, long-lived connectivity that is not restricted by place or time, as in mobile IoT devices used in application environments such as smart homes, smart cities and smart transportation. The arrival of the 5G era provides strong support for the development of IoT technology: the 5G standard can meet the requirements of the Internet of Things well, including network speed, capacity and security, and it promotes the development of the industrial Internet of Things.
With the continuous development of the Internet of Things, massive device data and user data cause the data volume to grow rapidly, and IoT devices are required to provide end users with high capacity, ultra-low latency, better quality of service and a better user experience. However, conventional anomaly detection algorithms treat most of the raw data as normal data and sidestep the data imbalance problem by ignoring the abnormal data. As a result, conventional methods cannot use abnormal data to train an anomaly detection model, and their detection results are not accurate enough.
Disclosure of Invention
The invention aims to provide an IoT device detection method, system and device, so as to solve the technical problem in the prior art that an anomaly detection model cannot be trained with abnormal data, which makes the detection result insufficiently accurate.
To achieve this purpose, the invention adopts the following technical solutions:
in a first aspect, an IoT device detection method is provided, including:
acquiring a high-dimensional data set of each IoT device, and preprocessing the high-dimensional data set to obtain a normal sample set and a first abnormal sample set;
introducing random noise that is generated during sampling and follows a Gaussian distribution, using a generator in a deep convolutional adversarial network to generate extended abnormal samples from the random noise, inputting the extended abnormal samples and the first abnormal sample set into a discriminator in the deep convolutional adversarial network for discrimination, and performing repeated alternating adversarial training on the generator and the discriminator to obtain a second abnormal sample set;
integrating the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and inputting the training sample set into a classification and regression tree for classification training to obtain an anomaly detection model;
and performing anomaly detection on the IoT device to be detected based on the anomaly detection model.
In one possible design, the high-dimensional data set includes a high-dimensional time series data set containing the value of each data feature attribute; the data feature attributes include at least time, device identifier, device type, device location and device operating data, wherein the device operating data includes device traffic data.
In one possible design, preprocessing the high-dimensional data set includes:
normalizing the high-dimensional time series data set with the min-max method to standardize the data, reducing the dimensionality of the standardized high-dimensional time series data set with principal component analysis, and performing cluster analysis on the dimension-reduced time series data set with the K-means algorithm.
In one possible design, after the normalization, dimensionality reduction and cluster analysis are performed on the high-dimensional time series data set, the method further includes:
applying sliding window processing at least twice to the time series data set obtained from the cluster analysis, so as to increase the correlation between the dimensions of the abnormal sample data and the correlation between the abnormal sample data and time.
In one possible design, inputting the extended abnormal samples and the first abnormal sample set into the discriminator in the deep convolutional adversarial network for discrimination, and performing repeated alternating adversarial training on the generator and the discriminator to obtain the second abnormal sample set, includes:
step (1): inputting the extended abnormal samples and the first abnormal sample set into the discriminator, so that the discriminator distinguishes the extended abnormal samples from the first abnormal samples;
step (2): updating the parameters of the discriminator by stochastic gradient ascent until the capability of the discriminator reaches a preset standard, and then updating the parameters of the generator by stochastic gradient descent, so that the generator generates new extended abnormal samples from the mapped multidimensional data;
step (3): repeating step (1) and step (2) to train the discriminator and the generator alternately until the discriminator can no longer accurately distinguish the extended abnormal samples from the first abnormal samples, at which point the training of the generator is complete;
step (4): taking the extended abnormal samples output by the generator when training is complete as the second abnormal sample set.
In one possible design, when the discriminator can no longer accurately distinguish the extended abnormal samples from the first abnormal samples, the deep convolutional adversarial network has reached data balance, and the objective function of the deep convolutional adversarial network is expressed as:
$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
where $D$ denotes the discriminator, $G$ denotes the generator, $V(D,G)$ denotes the objective function of the deep convolutional adversarial network, $x$ denotes a first abnormal sample, $z$ denotes the random-noise input from which an extended abnormal sample is generated, $D(x)$ denotes the discriminator output, i.e. the discrimination result for a sample, $G(z)$ denotes the generator output, i.e. the extended abnormal sample generated from the random noise, $p_{data}(x)$ denotes the data distribution of the first abnormal sample set, and $p_{z}(z)$ denotes the data distribution from which the extended abnormal samples are generated.
In one possible design, before the second abnormal sample set is obtained, the method further includes:
adding a feature contraction constraint to the generator when training is complete, so as to penalize sample points in the extended abnormal samples whose deviation from the preset sample distribution exceeds a threshold.
In one possible design, integrating the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and inputting the training sample set into a classification and regression tree for classification training, includes:
step A: integrating the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and calculating the Lorenz coefficient of the training sample set;
step B: for each data feature in the training sample set, taking a feature value, dividing the training sample set into two subsets according to that feature value, and calculating the Lorenz coefficient of the data feature after the division into two subsets;
step C: selecting the data feature with the smallest Lorenz coefficient as the optimal feature, generating a binary tree with the corresponding feature value as the split point, and assigning each training sample to one of the two child nodes;
step D: recursively applying steps A to C to the two child nodes until the Lorenz coefficient of the training sample set is smaller than a threshold, thereby generating the corresponding classification and regression tree.
In a second aspect, an IoT device detection system is provided, including:
a data processing module, configured to acquire a high-dimensional data set of each IoT device and preprocess the high-dimensional data set to obtain a normal sample set and a first abnormal sample set;
a sample extension module, configured to introduce random noise that is generated during sampling and follows a Gaussian distribution, use a generator in a deep convolutional adversarial network to generate extended abnormal samples from the random noise, input the extended abnormal samples and the first abnormal sample set into a discriminator in the deep convolutional adversarial network for discrimination, and perform repeated alternating adversarial training on the generator and the discriminator to obtain a second abnormal sample set;
a classification training module, configured to integrate the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and input the training sample set into a classification and regression tree for classification training to obtain an anomaly detection model;
and an anomaly detection module, configured to perform anomaly detection on the IoT device to be detected based on the anomaly detection model.
In a third aspect, the present invention provides a computer device, including a memory, a processor and a transceiver that are communicatively connected in sequence, where the memory is used to store a computer program, the transceiver is used to send and receive messages, and the processor is used to read the computer program and execute the IoT device detection method described in any one of the possible designs of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing instructions which, when run on a computer, execute the IoT device detection method described in any one of the possible designs of the first aspect.
In a fifth aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the IoT device detection method described in any one of the possible designs of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
random noise from the sampling process is introduced, a generator in a deep convolutional adversarial network generates extended abnormal samples from the random noise, the extended abnormal samples and the first abnormal sample set are input into a discriminator in the deep convolutional adversarial network for discrimination, and the generator and the discriminator undergo repeated alternating adversarial training to obtain a second abnormal sample set. The second abnormal sample set is then used to augment the abnormal samples, which addresses the imbalance between positive and negative samples caused by the small number of abnormal samples and improves the training accuracy of the model. Inputting the training sample set into the classification and regression tree for classification training allows abnormal behaviors of IoT devices to be both detected and classified, which improves the operational safety of the devices.
Drawings
Fig. 1 is a flowchart of an IoT device detection method in an embodiment of the present invention;
Fig. 2 is a structural block diagram of an IoT device detection system in an embodiment of the present invention;
Fig. 3 is a structural block diagram of a computer device in an embodiment of the present invention.
Detailed Description
In order to illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the present invention is briefly described below with reference to the accompanying drawings and the embodiments. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort. It should be noted that the description of the embodiments is provided to aid understanding of the present invention and does not limit it.
Examples
In order to solve the technical problem in the prior art that an anomaly detection model cannot be trained with abnormal data, which makes the detection result insufficiently accurate, the embodiment of the present application provides an IoT device detection method. The method uses a second abnormal sample set to augment the abnormal samples, which addresses the imbalance between positive and negative samples caused by the small number of abnormal samples and thereby improves the training accuracy of the model; inputting the training sample set into the classification and regression tree for classification training allows abnormal behaviors of IoT devices to be both detected and classified, which improves the operational safety of the devices.
The IoT device detection method provided by the embodiment of the present application is described in detail below.
It should be noted that the IoT device detection method provided by the embodiment of the present application can be applied to a remote monitoring terminal or a cloud server to detect abnormal behaviors of IoT devices, where the remote monitoring terminal includes, but is not limited to, an industrial control computer, a personal computer, a mobile terminal and the like. The flow and principle of the detection method are described below so that the technical solution is clear.
For convenience of description, unless otherwise specified, the embodiments of the present application are described with a cloud server as the execution subject. It is to be understood that the execution subject does not limit the embodiments of the present application, and in other embodiments other terminal devices may serve as the execution subject.
Fig. 1 is a flowchart of the IoT device detection method provided in the embodiment of the present application; the method includes, but is not limited to, the following steps S1 to S4:
S1, acquiring a high-dimensional data set of each IoT device, and preprocessing the high-dimensional data set to obtain a normal sample set and a first abnormal sample set;
It should be noted that IoT devices generate massive amounts of data during operation and, at the same time, unpredictable abnormal data; because abnormal states are not the norm, the positive and negative samples are imbalanced. For this reason, the embodiment preprocesses the massive IoT data so that it meets the input requirements of the subsequent training model. The high-dimensional data set in this embodiment may be an existing data set, such as the CRAWDAD, KDDCUP99, NSL-KDD, UNSW-NB15, PREDICT, CAIDA, DEFCON, ADFA IDS, KYOTO, ISCX 2012 or ICS data set, which is not limited herein; the high-dimensional data set may also be IoT device data captured in real time from an edge gateway, such as traffic data.
In step S1, the high-dimensional data set includes a high-dimensional time series data set containing the value of each data feature attribute. The data feature attributes include at least time, device identifier (such as a device ID), device type, device location and device operating data. The device operating data includes device traffic data and may also include the longitude and latitude, read/write state, inbound traffic, outbound traffic, used memory, total memory capacity, CPU utilization (as a percentage) and the like of the IoT device, which are not detailed here.
It should be noted that, in this embodiment, ETL (Extract, Transform, Load) may be used to obtain the high-dimensional data set of each IoT device; ETL is a conventional step before data analysis, and its principle is not described here. Preferably, the embodiment of the present application uses Python (or another programming language such as R or Julia) to extract the high-dimensional data of the IoT devices either offline (for example, by reading a CSV file) or online (for example, through database interfaces such as MySQL, Elasticsearch and Hive); if the data is acquired online, the MySQL, Elasticsearch or Hive database needs to be connected in advance. The data is then processed into time series data in which each row is one complete time series record and each column holds the values of one data feature.
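As a minimal sketch of the offline path described above, the following Python snippet parses CSV text into rows (one record per row, one feature per column) ordered by device and time. The column names, the sample records and the `load_time_series` helper are illustrative assumptions and are not taken from the patent.

```python
import csv
import io

# Hypothetical CSV export; the column names are illustrative assumptions.
CSV_TEXT = """time,device_id,device_type,in_traffic,out_traffic,cpu_percent
2022-08-01T00:00:10,dev-002,camera,300.0,150.5,35.0
2022-08-01T00:00:00,dev-001,sensor,120.5,80.2,12.0
2022-08-01T00:00:10,dev-001,sensor,118.0,79.9,11.5
"""

NUMERIC_COLUMNS = ("in_traffic", "out_traffic", "cpu_percent")


def load_time_series(text):
    """Parse CSV text into rows ordered by device and time."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for name in NUMERIC_COLUMNS:
            record[name] = float(record[name])
        rows.append(record)
    # ISO 8601 timestamps sort correctly as plain strings.
    rows.sort(key=lambda r: (r["device_id"], r["time"]))
    return rows


time_series = load_time_series(CSV_TEXT)
```

Reading from a file instead of a string only requires passing an open file object to `csv.DictReader`; the online path through a database interface is not sketched here.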
In step S1, preprocessing the high-dimensional data set includes:
normalizing the high-dimensional time series data set with the min-max method to standardize the data, reducing the dimensionality of the standardized high-dimensional time series data set with principal component analysis, and performing cluster analysis on the dimension-reduced time series data set with the K-means algorithm.
It should be noted that normalizing the high-dimensional time series data set with the min-max method specifically includes: for each data feature attribute, obtaining the maximum attribute value and the minimum attribute value over all IoT devices together with the attribute value of each sample; determining the standardized attribute value of each sample from the maximum attribute value, the minimum attribute value and the attribute value of that sample; and normalizing each sample based on the standardized attribute values. The normalization formula is a conventional one and is not detailed here.
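The conventional min-max formula maps an attribute value $x$ to $x' = (x - x_{min}) / (x_{max} - x_{min})$, where $x_{min}$ and $x_{max}$ are the minimum and maximum of that attribute over all samples. A minimal Python sketch follows; the function name and the handling of constant columns are assumptions of this example.

```python
def min_max_normalize(rows):
    """Scale every column of `rows` (equal-length numeric lists) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column carries no information; map it to 0.0
            # (an assumption of this sketch).
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


samples = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
normalized = min_max_normalize(samples)
```

Each column is scaled independently, so attributes with different units (for example traffic volume and CPU utilization) become comparable.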
It should be noted that IoT devices produce massive high-dimensional data, but not every dimension is useful for device anomaly detection. In order to reduce computational complexity and improve detection accuracy, this embodiment reduces the dimensionality of the high-dimensional time series data so as to retain only feature data whose importance is higher than a threshold. Specifically, principal component analysis is applied to the standardized high-dimensional time series data set: the data is centered and the covariance matrix is computed; the eigenvalues of the covariance matrix and the eigenvector corresponding to each eigenvalue are determined; the dimension of the projection space is chosen; the largest eigenvalues, as many as the dimension of the projection space, are selected; a projection matrix is constructed from the eigenvectors corresponding to those eigenvalues; and the data set is mapped through the projection matrix to obtain the dimension-reduced time series data.
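The principal component analysis steps above (centering, covariance matrix, leading eigenvectors, projection) can be sketched in plain Python as follows. The eigenvectors are obtained here by power iteration with deflation, which is an assumption of this example standing in for a library eigen-solver; the function name and the sample data are likewise illustrative.

```python
import math


def pca_project(rows, n_components, iterations=200):
    """Project rows onto their top principal components.

    Eigenvectors are found by power iteration with deflation, a simple
    stand-in for a library eigen-solver (assumption of this sketch).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    eigenvalues, components = [], []
    for _component in range(n_components):
        # Fixed, non-uniform start vector; it must not be orthogonal to the
        # dominant eigenvector for power iteration to converge.
        vec = [float(j + 1) for j in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        for _step in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm == 0.0:
                break
            vec = [v / norm for v in nxt]
        value = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                    for i in range(d))
        eigenvalues.append(value)
        components.append(vec)
        # Deflate: remove the found component before searching for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]

    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centered]
    return projected, eigenvalues


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
reduced, variances = pca_project(data, n_components=1)
```

In this example the two features are perfectly correlated, so a single component retains all of the variance. The returned eigenvalues can be compared with an importance threshold to choose the dimension of the projection space.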
It should be noted that, in order to classify the dimension-reduced time series data preliminarily, this embodiment applies the K-means algorithm to the data set to obtain a normal sample set and a first abnormal sample set, where the first abnormal sample set consists of the real abnormal data in the existing IoT data. However, the data volumes of the normal sample set and the first abnormal sample set are severely imbalanced. To avoid insufficient training caused by the imbalance of positive and negative samples during model training, the embodiment proposes step S2, which is described in detail below.
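The preliminary split can be sketched with a plain K-means implementation using two clusters. The patent does not state how the abnormal cluster is identified; this example assumes that the smaller cluster is the abnormal one. The deterministic initialization and the function names are also assumptions of the sketch.

```python
def kmeans(points, k=2, max_iterations=100):
    """Plain K-means; returns one cluster label per point."""
    # Deterministic start: the first k points act as initial centroids
    # (an assumption of this sketch; random restarts are more common).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def split_normal_abnormal(points):
    """Treat the smaller of two clusters as abnormal (assumption)."""
    labels = kmeans(points, k=2)
    minority = min((0, 1), key=labels.count)
    normal = [p for p, lab in zip(points, labels) if lab != minority]
    abnormal = [p for p, lab in zip(points, labels) if lab == minority]
    return normal, abnormal


points = [[0.10, 0.10], [0.20, 0.10], [0.10, 0.20], [0.12, 0.18], [0.90, 0.95]]
normal_set, first_abnormal_set = split_normal_abnormal(points)
```

With one outlier among five points, the abnormal set contains a single sample, which illustrates the imbalance that step S2 is meant to address.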
Preferably, after the normalization, dimensionality reduction and cluster analysis are performed on the high-dimensional time series data set, the method further includes:
applying sliding window processing at least twice to the time series data set obtained from the cluster analysis, so as to increase the correlation between the dimensions of the abnormal sample data and the correlation between the abnormal sample data and time, thereby improving the accuracy of the data in subsequent training.
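The patent does not specify the window length, the stride, or what distinguishes the successive passes. As one possible reading, the sketch below concatenates consecutive records into overlapping windows and runs two passes with different window lengths; all parameter values and the function name are assumptions.

```python
def sliding_windows(rows, size, step=1):
    """Return overlapping windows; each window concatenates `size` consecutive rows."""
    windows = []
    for start in range(0, len(rows) - size + 1, step):
        window = []
        for row in rows[start:start + size]:
            window.extend(row)
        windows.append(window)
    return windows


series = [[1, 10], [2, 20], [3, 30], [4, 40]]
first_pass = sliding_windows(series, size=2)   # short-range context
second_pass = sliding_windows(series, size=3)  # longer-range context
```

Each window places the features of neighboring time steps side by side, so a downstream model can see both cross-dimension and temporal structure in a single sample.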
S2, introducing random noise that is generated during sampling and follows a Gaussian distribution, using a generator in a deep convolutional adversarial network to generate extended abnormal samples from the random noise, inputting the extended abnormal samples and the first abnormal sample set into a discriminator in the deep convolutional adversarial network for discrimination, and performing repeated alternating adversarial training on the generator and the discriminator to obtain a second abnormal sample set;
It should be noted that the deep convolutional adversarial network in this embodiment includes a generator and a discriminator, and the adversarial game between them improves the capabilities of both. The generator produces simulated samples that resemble real abnormal samples, while the discriminator distinguishes the first abnormal samples from the simulated extended samples, that is, it judges whether an input is a first abnormal sample or a simulated extended sample from the generator. The parameters of the discriminator and the generator are optimized according to the discrimination result, which yields an optimal second abnormal sample set.
In step S2, inputting the extended abnormal samples and the first abnormal sample set into the discriminator in the deep convolutional adversarial network for discrimination, and performing repeated alternating adversarial training on the generator and the discriminator to obtain the second abnormal sample set, includes:
step (1): inputting the extended abnormal samples and the first abnormal sample set into the discriminator, so that the discriminator distinguishes the extended abnormal samples from the first abnormal samples;
step (2): updating the parameters of the discriminator by stochastic gradient ascent until the capability of the discriminator reaches a preset standard, and then updating the parameters of the generator by stochastic gradient descent, so that the generator generates new extended abnormal samples from the mapped multidimensional data;
step (3): repeating step (1) and step (2) to train the discriminator and the generator alternately until the discriminator can no longer accurately distinguish the extended abnormal samples from the first abnormal samples, at which point the training of the generator is complete;
step (4): taking the extended abnormal samples output by the generator when training is complete as the second abnormal sample set.
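The alternation in steps (1) to (4) can be illustrated with a deliberately tiny example. In this sketch a one-dimensional linear generator and a logistic-regression discriminator stand in for the deep convolutional networks of the embodiment, and a fixed number of rounds stands in for the stopping criterion of step (3). These substitutions, the parameter values and all names are assumptions made only to keep the example self-contained; the gradients are derived by hand for the usual adversarial objective of the form log D(x) + log(1 - D(G(z))).

```python
import math
import random


def sigmoid(t):
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_samples, rounds=200, d_steps=3, lr=0.05, seed=0):
    """Alternate discriminator ascent and generator descent on scalar data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator G(z) = a * z + b
    w, c = 0.0, 0.0  # discriminator D(x) = sigmoid(w * x + c)
    for _round in range(rounds):
        # Steps (1)-(2): several stochastic gradient ascent steps on
        # log D(x) + log(1 - D(G(z))) for the discriminator.
        for _step in range(d_steps):
            x = rng.choice(real_samples)
            g = a * rng.gauss(0.0, 1.0) + b
            d_real = sigmoid(w * x + c)
            d_fake = sigmoid(w * g + c)
            w += lr * ((1.0 - d_real) * x - d_fake * g)
            c += lr * ((1.0 - d_real) - d_fake)
        # Step (2), second half: one stochastic gradient descent step on
        # log(1 - D(G(z))) for the generator; d/dg of that term is -D(g) * w.
        z = rng.gauss(0.0, 1.0)
        g = a * z + b
        d_fake = sigmoid(w * g + c)
        a -= lr * (-d_fake * w * z)
        b -= lr * (-d_fake * w)

    def sample(count):
        """Step (4): draw extended abnormal samples from the trained generator."""
        return [a * rng.gauss(0.0, 1.0) + b for _ in range(count)]

    return sample


first_abnormal = [3.8, 4.1, 4.0, 4.3, 3.9]
sample_extended = train_toy_gan(first_abnormal)
second_abnormal_set = sample_extended(5)
```

Running several discriminator updates per generator update mirrors step (2), in which the discriminator is trained to a preset standard before the generator is updated. A practical implementation would replace both models with convolutional networks from a deep learning framework and monitor the discriminator's accuracy to decide when to stop.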
In one possible design, when the discriminator can no longer accurately distinguish the extended abnormal samples from the first abnormal samples, the deep convolutional adversarial network has reached data balance, and the objective function of the deep convolutional adversarial network is expressed as:
$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
where $D$ denotes the discriminator, $G$ denotes the generator, $V(D,G)$ denotes the objective function of the deep convolutional adversarial network, $x$ denotes a first abnormal sample, $z$ denotes the random-noise input from which an extended abnormal sample is generated, $D(x)$ denotes the discriminator output, i.e. the discrimination result for a sample, $G(z)$ denotes the generator output, i.e. the extended abnormal sample generated from the random noise, $p_{data}(x)$ denotes the data distribution of the first abnormal sample set, and $p_{z}(z)$ denotes the data distribution from which the extended abnormal samples are generated.
In a specific embodiment, before the second abnormal sample set is obtained, the method further includes:
adding a feature contraction constraint to the generator when training is complete, so as to penalize sample points in the extended abnormal samples whose deviation from the preset sample distribution exceeds a threshold.
Before the generator outputs the second abnormal sample set, a contraction constraint is added to the feature distribution to penalize sample points in the extended abnormal samples whose deviation from the preset sample distribution exceeds a threshold. Sample points that do not conform to the sample distribution are thereby removed, the influence of uncharacteristic sample points on the second abnormal sample set is minimized, and the accuracy of the output data is improved.
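The patent does not give the form of the contraction constraint. One simple way to obtain a comparable effect after generation is to discard generated points that lie too far from a reference distribution. The sketch below swaps in a z-score filter against the first abnormal samples for this purpose; the technique, the threshold value and the function name are assumptions of this example, not the patent's constraint.

```python
import statistics


def filter_by_deviation(generated, reference, threshold=3.0):
    """Keep generated scalars within `threshold` standard deviations of the reference mean."""
    mean = statistics.fmean(reference)
    std = statistics.pstdev(reference)
    if std == 0.0:
        # Degenerate reference: only exact matches conform (assumption).
        return [g for g in generated if g == mean]
    return [g for g in generated if abs(g - mean) / std <= threshold]


reference = [4.0, 5.0, 6.0]
generated = [5.5, 9.0, 4.2, -1.0]
kept = filter_by_deviation(generated, reference, threshold=2.0)
```

A constraint applied during training would instead add a penalty term to the generator's loss; the post-hoc filter shown here only illustrates the thresholding idea.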
S3: integrating the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and inputting the training sample set into a classification regression tree for classification training to obtain an anomaly detection model.
It should be noted that the classification regression tree can formulate corresponding classification rules for behavior attacks on Internet of things devices, which improves classification accuracy and allows different behavior attacks to be classified while abnormal behavior is detected. Specifically, the classification regression tree classifies behavior attacks using the Lorenz coefficient; the smaller the Lorenz coefficient, the higher the accuracy of the sample.
In step S3, integrating the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and inputting the training sample set into a classification regression tree for classification training, comprises:
Step A: integrating the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and calculating the Lorenz coefficient of the training sample set;
Step B: for each data feature in the training sample set, taking a feature value, dividing the training sample set into two subsets according to that feature value, and calculating the Lorenz coefficient of each data feature after the division into two subsets;
Step C: selecting the data feature with the smallest Lorenz coefficient as the optimal feature, generating a binary tree with the corresponding feature value as the split point, and distributing the training samples to the two child nodes;
Step D: recursively applying steps A to C to the two child nodes until the Lorenz coefficient of the training sample set is smaller than a threshold, thereby generating the corresponding classification regression tree.
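Steps A to D can be sketched as a small recursive tree builder. The sketch assumes that the Lorenz coefficient plays the role of the Gini impurity used in standard classification and regression trees; that equivalence is an assumption, not something the text states. All function names, the threshold value and the sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Steps B and C: find the (feature, value) split with the lowest weighted impurity."""
    best = None  # (weighted impurity, feature index, split value)
    for f in range(len(rows[0])):
        for v in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= v]
            right = [y for r, y in zip(rows, labels) if r[f] > v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, v)
    return best


def build_tree(rows, labels, threshold=0.01):
    """Steps A to D: recurse until node impurity falls below `threshold`."""
    if gini(labels) < threshold:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    split = best_split(rows, labels)
    if split is None:  # no feature separates the samples any further
        return Counter(labels).most_common(1)[0][0]
    _, f, v = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= v]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > v]
    return {
        "feature": f,
        "value": v,
        "left": build_tree([r for r, _ in left], [y for _, y in left], threshold),
        "right": build_tree([r for r, _ in right], [y for _, y in right], threshold),
    }


def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["value"] else tree["right"]
    return tree


# Hypothetical two-feature samples: label 0 is normal, 1 and 2 are attack types.
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.2], [0.9, 0.1], [0.85, 0.9], [0.9, 0.8]]
y = [0, 0, 1, 1, 2, 2]
tree = build_tree(X, y)
```

Calling `predict(tree, row)` on a new feature vector returns the class of the leaf it reaches, so one tree both flags abnormal behavior and names the attack type.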
S4: performing anomaly detection on the Internet of things device to be detected based on the anomaly detection model.
Based on the above disclosure, random noise arising in the sampling process is introduced, and the generator in the deep convolution countermeasure network generates extended abnormal samples from the random noise. The extended abnormal samples and the first abnormal sample set are input into the discriminator in the deep convolution countermeasure network for discrimination, and the generator and the discriminator undergo repeated cross countermeasure training to obtain a second abnormal sample set. The second abnormal sample set expands the abnormal sample data, which addresses the imbalance between positive and negative samples caused by the small number of abnormal samples and improves the training accuracy of the model. Inputting the training sample set into the classification regression tree for classification training allows abnormal behaviors of Internet of things devices to be classified and detected, which improves the safety of device use.
As shown in fig. 2, a second aspect provides an internet of things device detection system, including:
a data processing module, configured to acquire a high-dimensional data set of each Internet of things device and preprocess the high-dimensional data set to obtain a normal sample set and a first abnormal sample set;
a sample expansion module, configured to introduce random noise that is generated in the sampling process and follows a Gaussian distribution, generate extended abnormal samples from the random noise using a generator in a deep convolution countermeasure network, input the extended abnormal samples and the first abnormal sample set into a discriminator in the deep convolution countermeasure network for discrimination, and perform repeated cross countermeasure training on the generator and the discriminator to obtain a second abnormal sample set;
a classification training module, configured to integrate the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and input the training sample set into a classification regression tree for classification training to obtain an anomaly detection model;
an anomaly detection module, configured to perform anomaly detection on the Internet of things device to be detected based on the anomaly detection model.
In one possible design, the high-dimensional data set includes a high-dimensional time series data set including values corresponding to data characteristic attributes including at least time, device identification, device type, device location, and device operational data, wherein the device operational data includes device traffic data.
In one possible design, the data processing module is specifically configured to, in preprocessing the high-dimensional dataset:
normalize the high-dimensional time series data set by the min-max method to standardize the data, reduce the dimensionality of the standardized high-dimensional time series data set by principal component analysis, and perform cluster analysis on the dimension-reduced time series data set using the K-means algorithm.
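The first preprocessing stage, min-max normalization, can be sketched as follows. The function name `min_max_normalize`, the handling of constant columns and the example readings are assumptions made for illustration. Principal component analysis and K-means clustering are not shown; in practice they would usually come from a numerical library.

```python
def min_max_normalize(rows):
    """Scale each column of `rows` to the range [0, 1].

    rows: list of equal-length lists of numeric feature values.
    A column whose values are all identical is mapped to 0.0 to avoid
    division by zero (a choice made for this sketch).
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical traffic readings: [packets per second, bytes per packet].
readings = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
scaled = min_max_normalize(readings)
```

Scaling every feature to the same range keeps features with large raw magnitudes, such as byte counts, from dominating the later dimensionality reduction and clustering.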
In one possible design, the data processing module is further configured to:
perform sliding window processing at least twice on the time series data set after the cluster analysis, so as to increase the correlation among the dimensions of the abnormal sample data and the correlation between the abnormal sample data and time.
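A minimal sketch of sliding window processing is given below. The text does not specify the window widths or how the two passes relate, so applying two passes with different widths is an assumption, as are the function name `sliding_windows` and the sample records.

```python
def sliding_windows(series, width, step=1):
    """Return overlapping windows of `width` consecutive records.

    series: list of records ordered by time (each record is a list of features).
    Each window concatenates its records into one flat vector, so that values
    from neighbouring time steps and different dimensions sit side by side.
    """
    windows = []
    for start in range(0, len(series) - width + 1, step):
        flat = []
        for record in series[start:start + width]:
            flat.extend(record)
        windows.append(flat)
    return windows


# Hypothetical time-ordered records with two features each.
series = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]

# Two passes with different window widths (the widths are assumptions).
short_windows = sliding_windows(series, width=2)
long_windows = sliding_windows(series, width=4)
```

Each window places several consecutive records in one vector, which is one way to expose both cross-dimension and temporal correlation to the later training stages.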
In one possible design, the sample expansion module is specifically configured to:
Step (1): inputting the extended abnormal samples and the first abnormal sample set into the discriminator, so that the discriminator distinguishes the extended abnormal samples from the first abnormal samples;
Step (2): updating the parameters of the discriminator by stochastic gradient ascent until the training capability of the discriminator reaches a preset standard, and updating the parameters of the generator by stochastic gradient descent, so that the generator generates new extended abnormal samples from the mapped multidimensional data;
Step (3): repeating step (1) and step (2) to cross-train the discriminator and the generator repeatedly until the discriminator cannot accurately distinguish the extended abnormal samples from the first abnormal samples, at which point the training of the generator is finished;
Step (4): taking the extended abnormal samples output by the generator when the training is finished as the second abnormal sample set.
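The alternation between gradient ascent on the discriminator and gradient descent on the generator can be sketched with a deliberately tiny one-dimensional model. This is not a deep convolutional network: the logistic discriminator D(x) = sigmoid(w*x + c), the shift generator G(z) = z + b, the learning rate, the iteration count and all data values are assumptions chosen so that the gradients can be written by hand. The sketch makes no claim about convergence.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def value(w, c, real, fake):
    """V = mean log D(x) + mean log(1 - D(g)) with D(x) = sigmoid(w*x + c)."""
    real_term = sum(math.log(sigmoid(w * x + c)) for x in real) / len(real)
    fake_term = sum(math.log(1.0 - sigmoid(w * g + c)) for g in fake) / len(fake)
    return real_term + fake_term


def discriminator_step(w, c, real, fake, lr=0.01):
    """Gradient ascent on V with respect to the discriminator parameters."""
    grad_w = (sum((1.0 - sigmoid(w * x + c)) * x for x in real) / len(real)
              - sum(sigmoid(w * g + c) * g for g in fake) / len(fake))
    grad_c = (sum(1.0 - sigmoid(w * x + c) for x in real) / len(real)
              - sum(sigmoid(w * g + c) for g in fake) / len(fake))
    return w + lr * grad_w, c + lr * grad_c


def generator_step(b, w, c, noise, lr=0.01):
    """Gradient descent on mean log(1 - D(G(z))) with G(z) = z + b."""
    grad_b = sum(-sigmoid(w * (z + b) + c) * w for z in noise) / len(noise)
    return b - lr * grad_b


real = [3.8, 4.1, 4.0, 4.3]    # stand-ins for first abnormal samples
noise = [-0.2, 0.1, 0.0, 0.3]  # fixed noise values, for reproducibility
w, c, b = 0.0, 0.0, 0.0

for _ in range(50):
    fake = [z + b for z in noise]                # generate extended samples
    w, c = discriminator_step(w, c, real, fake)  # ascent on the discriminator
    b = generator_step(b, w, c, noise)           # descent on the generator
```

A single discriminator step raises the value function on the current samples, and the following generator step moves the generated samples in the direction the discriminator scores as more real.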
In one possible design, when the discriminator cannot accurately distinguish the extended abnormal sample from the first abnormal sample, the deep convolution countermeasure network is indicated to have reached data balance, and the objective function of the deep convolution countermeasure network is expressed as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

where $D$ denotes the discriminator, $G$ denotes the generator, $V(D, G)$ denotes the objective function of the deep convolution countermeasure network, $x$ denotes a first abnormal sample, $z$ denotes an extended abnormal sample, $D(x)$ denotes the discriminator output, i.e. the discrimination result for a sample, $G(z)$ denotes the generator output, i.e. the extended abnormal sample generated from the random noise, $p_{data}(x)$ denotes the data distribution of the first abnormal sample set, and $p_z(z)$ denotes the data distribution of the extended abnormal samples.
In one possible design, the system further includes:
a constraint optimization module, configured to add a feature contraction constraint to the generator when the training is finished, so as to penalize those sample points in the extended abnormal samples whose deviation from the preset sample distribution exceeds a threshold.
In one possible design, the classification training module is specifically configured to:
Step A: calculating the Lorenz coefficient of the training sample set;
Step B: for each data feature in the training sample set, taking a feature value, dividing the training sample set into two subsets according to that feature value, and calculating the Lorenz coefficient of each data feature after the division into two subsets;
Step C: selecting the data feature with the smallest Lorenz coefficient as the optimal feature, generating a binary tree with the corresponding feature value as the split point, and distributing the training samples to the two child nodes;
Step D: recursively applying steps A to C to the two child nodes until the Lorenz coefficient of the training sample set is smaller than a threshold, thereby generating the corresponding classification regression tree.
For the working process, working details and technical effects of the foregoing system provided in the second aspect of this embodiment, reference may be made to the method described in the first aspect or any one of the possible designs of the first aspect, which is not described herein again.
In a third aspect, as shown in fig. 3, the present invention provides a computer device, including a memory, a processor, and a transceiver, which are sequentially connected in communication, where the memory is used to store a computer program, the transceiver is used to send and receive messages, and the processor is used to read the computer program and perform the method for detecting the internet of things device as set forth in any one of the possible designs of the first aspect.
For example, the memory may include, but is not limited to, a Random-Access Memory (RAM), a Read-Only Memory (ROM), a Flash Memory, a First-In First-Out (FIFO) memory, and/or a First-In Last-Out (FILO) memory, and the like; the processor may be, but is not limited to, a microprocessor of the STM32F105 series; the transceiver may be, but is not limited to, a WiFi (Wireless Fidelity) wireless transceiver, a Bluetooth wireless transceiver, a GPRS (General Packet Radio Service) wireless transceiver, and/or a ZigBee (a low-power local area network protocol based on the IEEE 802.15.4 standard) wireless transceiver, etc. In addition, the computer device may also include, but is not limited to, a power module, a display screen, and other necessary components.
For the working process, working details and technical effects of the foregoing computer device provided in the third aspect of this embodiment, reference may be made to the method described in the first aspect or any one of the possible designs of the first aspect, which is not described herein again.
In a fourth aspect, the present invention provides a computer-readable storage medium, which stores instructions for executing the method for detecting the internet of things device as set forth in any one of the possible designs of the first aspect when the instructions are run on a computer.
The computer-readable storage medium refers to a carrier for storing data, and may include, but is not limited to, floppy disks, optical disks, hard disks, flash memories, flash drives and/or memory sticks, etc., and the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
For the working process, the working details and the technical effects of the foregoing computer-readable storage medium provided in the fourth aspect of this embodiment, reference may be made to the method in any one of the above first aspect or the possible designs of the first aspect, and details are not described herein again.
In a fifth aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method for detecting devices of the internet of things as set forth in any one of the possible designs of the first aspect.
For the working process, the working details and the technical effects of the computer program product containing the instructions provided in the fifth aspect of the present embodiment, reference may be made to the method described in the first aspect or any one of the possible designs of the first aspect, and details are not described herein again.
Finally, it should be noted that: the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An Internet of things equipment detection method is characterized by comprising the following steps:
acquiring a high-dimensional data set of each internet of things device, and preprocessing the high-dimensional data set to obtain a normal sample set and a first abnormal sample set;
introducing random noise that is generated in the sampling process and follows a Gaussian distribution, generating extended abnormal samples from the random noise using a generator in a deep convolution countermeasure network, inputting the extended abnormal samples and the first abnormal sample set into a discriminator in the deep convolution countermeasure network for discrimination, and performing repeated cross countermeasure training on the generator and the discriminator to obtain a second abnormal sample set;
integrating the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and inputting the training sample set into a classification regression tree for classification training to obtain an anomaly detection model;
performing anomaly detection on the Internet of things device to be detected based on the anomaly detection model.
2. The internet of things device detection method of claim 1, wherein the high-dimensional data set comprises a high-dimensional time series data set, the high-dimensional time series data set comprises values corresponding to data characteristic attributes, each data characteristic attribute comprises at least time, device identification, device type, device location, and device operational data, and wherein the device operational data comprises device traffic data.
3. The method for detecting the internet of things equipment as claimed in claim 2, wherein preprocessing the high-dimensional dataset comprises:
normalizing the high-dimensional time series data set by the min-max method to standardize the data, reducing the dimensionality of the standardized high-dimensional time series data set by principal component analysis, and performing cluster analysis on the dimension-reduced time series data set using the K-means algorithm.
4. The internet of things equipment detection method of claim 3, wherein after the normalization processing, dimension reduction processing and cluster analysis are performed on the high-dimensional time series data set, the method further comprises:
performing sliding window processing at least twice on the time series data set after the cluster analysis, so as to increase the correlation among the dimensions of the abnormal sample data and the correlation between the abnormal sample data and time.
5. The method for detecting the internet of things equipment according to claim 1, wherein inputting the extended abnormal samples and the first abnormal sample set into a discriminator in the deep convolution countermeasure network for discrimination, and performing repeated cross countermeasure training on the generator and the discriminator to obtain a second abnormal sample set, comprises:
Step (1): inputting the extended abnormal samples and the first abnormal sample set into the discriminator, so that the discriminator distinguishes the extended abnormal samples from the first abnormal samples;
Step (2): updating the parameters of the discriminator by stochastic gradient ascent until the training capability of the discriminator reaches a preset standard, and updating the parameters of the generator by stochastic gradient descent, so that the generator generates new extended abnormal samples from the mapped multidimensional data;
Step (3): repeating step (1) and step (2) to cross-train the discriminator and the generator repeatedly until the discriminator cannot accurately distinguish the extended abnormal samples from the first abnormal samples, at which point the training of the generator is finished;
Step (4): taking the extended abnormal samples output by the generator when the training is finished as the second abnormal sample set.
6. The method for detecting the internet of things equipment as claimed in claim 5, wherein when the discriminator cannot accurately distinguish the extended abnormal sample from the first abnormal sample, the deep convolution countermeasure network is indicated to have reached data balance, and the objective function of the deep convolution countermeasure network is then expressed as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

where $D$ denotes the discriminator, $G$ denotes the generator, $V(D, G)$ denotes the objective function of the deep convolution countermeasure network, $x$ denotes a first abnormal sample, $z$ denotes an extended abnormal sample, $D(x)$ denotes the discriminator output, i.e. the discrimination result for a sample, $G(z)$ denotes the generator output, i.e. the extended abnormal sample generated from the random noise, $p_{data}(x)$ denotes the data distribution of the first abnormal sample set, and $p_z(z)$ denotes the data distribution of the extended abnormal samples.
7. The method for detecting the internet of things equipment according to claim 1, wherein before obtaining the second abnormal sample set, the method further comprises:
adding a feature contraction constraint to the generator when training is finished, so as to penalize those sample points in the extended abnormal samples whose deviation from the preset sample distribution exceeds a threshold.
8. The method for detecting the internet of things equipment according to claim 1, wherein the step of integrating the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set and inputting the training sample set into a classification regression tree for classification training comprises the steps of:
Step A: integrating the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and calculating the Lorenz coefficient of the training sample set;
Step B: for each data feature in the training sample set, taking a feature value, dividing the training sample set into two subsets according to that feature value, and calculating the Lorenz coefficient of each data feature after the division into two subsets;
Step C: selecting the data feature with the smallest Lorenz coefficient as the optimal feature, generating a binary tree with the corresponding feature value as the split point, and distributing the training samples to the two child nodes;
Step D: recursively applying steps A to C to the two child nodes until the Lorenz coefficient of the training sample set is smaller than a threshold, thereby generating the corresponding classification regression tree.
9. An internet of things equipment detection system, comprising:
a data processing module, configured to acquire a high-dimensional data set of each Internet of things device and preprocess the high-dimensional data set to obtain a normal sample set and a first abnormal sample set;
a sample expansion module, configured to introduce random noise that is generated in the sampling process and follows a Gaussian distribution, generate extended abnormal samples from the random noise using a generator in a deep convolution countermeasure network, input the extended abnormal samples and the first abnormal sample set into a discriminator in the deep convolution countermeasure network for discrimination, and perform repeated cross countermeasure training on the generator and the discriminator to obtain a second abnormal sample set;
a classification training module, configured to integrate the normal sample set, the first abnormal sample set and the second abnormal sample set into a training sample set, and input the training sample set into a classification regression tree for classification training to obtain an anomaly detection model;
an anomaly detection module, configured to perform anomaly detection on the Internet of things device to be detected based on the anomaly detection model.
10. A computer device, comprising a memory, a processor and a transceiver, which are connected in sequence in communication, wherein the memory is used for storing a computer program, the transceiver is used for sending and receiving messages, and the processor is used for reading the computer program and executing the internet of things device detection method according to any one of claims 1 to 8.
CN202210989067.7A 2022-08-17 2022-08-17 Internet of things equipment detection method, system and equipment Pending CN115348190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210989067.7A CN115348190A (en) 2022-08-17 2022-08-17 Internet of things equipment detection method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210989067.7A CN115348190A (en) 2022-08-17 2022-08-17 Internet of things equipment detection method, system and equipment

Publications (1)

Publication Number Publication Date
CN115348190A true CN115348190A (en) 2022-11-15

Family

ID=83951077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210989067.7A Pending CN115348190A (en) 2022-08-17 2022-08-17 Internet of things equipment detection method, system and equipment

Country Status (1)

Country Link
CN (1) CN115348190A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365648A (en) * 2019-06-14 2019-10-22 东南大学 A kind of vehicle-mounted CAN bus method for detecting abnormality based on decision tree
CN110414601A (en) * 2019-07-30 2019-11-05 南京工业大学 Photovoltaic module method for diagnosing faults, system and equipment based on depth convolution confrontation network
US20200237284A1 (en) * 2019-01-24 2020-07-30 Neeyanth KOPPARAPU System and method for mri image synthesis for the diagnosis of parkinson's disease using deep learning
CN111582596A (en) * 2020-05-14 2020-08-25 公安部交通管理科学研究所 Pure electric vehicle endurance mileage risk early warning method integrating traffic state information
US20210357772A1 (en) * 2020-05-14 2021-11-18 Financial Industry Regulatory Authority, Inc. System and method for time series pattern recognition
CN114462509A (en) * 2022-01-12 2022-05-10 重庆邮电大学 Distributed Internet of things equipment anomaly detection method
CN114820074A (en) * 2022-05-16 2022-07-29 郑州简信软件科技有限公司 Target user group prediction model construction method based on machine learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHIBO LU ETC: "DA-DCGAN: An Effective Methodology for DC Series Arc Fault Diagnosis in Photovoltaic Systems", IEEE ACCESS, vol. 7, pages 45831 - 95 *
洪洋等: "深度卷积对抗生成网络综述", 《第18届中国系统仿真技术及其应用学术年会论文集(18TH CCSSTA 2017)》, pages 279 - 283 *
赵维;: "基于生成对抗网络的异常行为模拟算法研究", 长春理工大学学报(自然科学版), no. 06, pages 138 - 142 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination