CN107241358B - Smart home intrusion detection method based on deep learning

Info

Publication number: CN107241358B
Application number: CN201710651758.5A
Authority: CN
Inventors: 胡向东; 杨柳; 胡蓉; 程占喻; 唐贤伦; 白银; 邢有权; 李秋实
Original assignee: Chongqing University of Posts and Telecommunications
Current assignee: Chongqing University of Posts and Telecommunications
Priority date: 2017-08-02
Filing date: 2017-08-02
Publication date: 2020-04-07
Anticipated expiration: 2037-08-02
Also published as: CN107241358A

Abstract

The invention discloses a smart home intrusion detection method based on deep learning. It relates to an online system, and in particular to a method that combines a fuzzy neural network with deep learning to judge whether intrusion behavior exists in a network. By combining deep learning with the fuzzy neural network, the method addresses the problems that existing smart home intrusion detection techniques have difficulty processing large volumes of high-dimensional data and suffer from a high false alarm rate, a high missed-report rate and a low detection rate. An offline system determines the operating parameters of the online system, and the online system performs real-time intrusion detection.

Description

Smart home intrusion detection method based on deep learning

Technical Field

The invention relates to the field of smart home security, and in particular to a method for detecting intrusion behavior with a multilayer neural network based on deep learning.

Background

With the rapid development of Internet of Things technology, Internet of Things products such as smart homes are gradually becoming widespread. However, the security protection of existing smart devices is generally weak: upgrade and maintenance mechanisms are immature and security configurations are often unreasonable, which exposes the devices to many potential security hazards. With the country recently proposing and implementing the Internet Plus action plan, the Made in China 2025 plan, smart city construction and the like, a large number of smart devices continue to emerge, but the corresponding security measures are not yet adequate. Smart homes, as a new Internet of Things application, are entering more and more households. A smart home system comprises cameras, routers, gateways and other smart devices, and these devices have information security vulnerabilities such as privilege bypass, denial of service and information disclosure. Attackers can easily exploit these vulnerabilities to attack smart home networks, leading to leakage of user privacy, abnormal operation of the smart home network and similar problems.

Existing smart home systems are not yet secure enough. Most of them rely on technologies such as firewalls, authentication or encryption to improve security. These technologies are passive defenses: they are effective against certain specific attacks, but they cannot actively discover attack behavior or allow handling or preventive measures to be taken in time.

Disclosure of Invention

In view of the above, the present invention provides a smart-home-oriented intrusion detection method with a low false alarm rate, a low missed-detection rate and a high detection rate.

The object of the invention is achieved by the following technical solution: a smart home intrusion detection method based on deep learning, which specifically comprises the following steps:

S1, initializing: generating an offline system database with empty content, the database comprising three sub-databases, namely labeled training and test data, data screening link parameters, and deep-learning-based multilayer network parameters;

S2, encoding and normalizing the collected labeled traffic data to form data to be detected, and storing the data to be detected in the labeled training and test data sub-database;

S3, classifying the data in the labeled training and test data sub-database according to the label of each piece of data to form a normal behavior sample data set and an intrusion behavior sample data set; obtaining the center of each of the two sample data sets with a K-means algorithm, analyzing the distance between each sample in the two sample data sets and its sample center, and setting a judgment threshold so that sample data with the characteristics of a class fall within the threshold range of that class; storing the sample centers and thresholds in the data screening link parameter sub-database; training the weights and bias values of the multilayer neural network with the data in the labeled training and test data sub-database, and storing the trained neural network parameters in the deep-learning-based multilayer neural network parameter sub-database, which completes the training link; then jumping to step S4 for real-time monitoring by the online system;

S4, encoding and normalizing the collected unlabeled traffic data to form a piece of data to be detected, and calculating the distance from the data to be detected to the centers of the two sample data sets of step S3; if the distance is smaller than the threshold corresponding to a sample data set, the data belongs to that type of behavior; otherwise, jumping to step S5;

S5, inputting the data to be detected whose type could not be determined in step S4 into the multilayer neural network for identification, judging from the output value of the multilayer neural network whether a potential security hazard exists, and driving the smart home alarm module to raise an alarm if it does.
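The encoding and normalization used in steps S2 and S4 can be sketched as follows. The patent does not state which scheme is used, so this minimal sketch assumes integer coding of symbolic fields followed by min-max scaling to [0, 1]; the function names, field names and sample values are all hypothetical.

```python
def encode_records(records, symbolic_fields):
    """Map each symbolic field value to an integer code (assumed scheme)."""
    codebooks = {field: {} for field in symbolic_fields}
    encoded = []
    for rec in records:
        row = []
        for field in sorted(rec):  # fixed column order: alphabetical by field
            value = rec[field]
            if field in codebooks:
                book = codebooks[field]
                value = book.setdefault(value, len(book))
            row.append(float(value))
        encoded.append(row)
    return encoded, codebooks


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical traffic records; real fields depend on the capture software.
records = [
    {"protocol": "tcp", "duration": 2, "bytes": 100},
    {"protocol": "udp", "duration": 0, "bytes": 300},
    {"protocol": "tcp", "duration": 4, "bytes": 200},
]
encoded, codebooks = encode_records(records, symbolic_fields={"protocol"})
normalized = min_max_normalize(encoded)
```

In an online setting the codebooks and column ranges fitted on the offline data would presumably be reused for each new record, so that offline and online data share one scale.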

Further, in step S3, a K-means algorithm is used to determine the sample centers of the two types of behavior in the labeled training database of the offline system database, the Euclidean distance from each point in the offline system database to the sample center is calculated, and the distance threshold of the data screening link is determined by applying the Pauta criterion (3σ criterion) to these distances.

Further, inputting the data to be detected whose type could not be determined in step S4 into the multilayer neural network for identification comprises data dimensionality reduction with a deep belief network and identification with a fuzzy neural network.

Further, the multilayer neural network comprises a deep belief network and a fuzzy neural network, the output of the deep belief network is used as the input of the fuzzy neural network, and the deep belief network is composed of several restricted Boltzmann machines.

Further, in step S2, when the weights and bias values of the multilayer neural network are trained with the data in the labeled training and test data sub-database, the training of the multilayer neural network comprises training of the deep belief network and training of the fuzzy neural network.

Further, the training of the deep belief network comprises bottom-up unsupervised training and top-down supervised parameter fine-tuning; the fuzzy neural network is trained by gradient descent.

Furthermore, an evaluation model is constructed from the reconstruction error of the restricted Boltzmann machines in the deep belief network, the detection rate and detection time of the multilayer neural network, and the like, so as to determine the depth of the multilayer neural network, that is, the number of restricted Boltzmann machines in the deep belief network.

Further, when the reconstruction error is larger than 0.1, the network depth is increased by 1, that is, one restricted Boltzmann machine is added to the deep belief network.

Further, if the reconstruction error of the deep belief network is less than 0.1, the network depth of the intrusion detection model is determined by evaluating the detection rate and detection time of the multilayer neural network and selecting a suitable number of restricted Boltzmann machines in light of the computing power of the smart home system server.

Due to the adoption of the technical scheme, the invention has the following advantages:

By combining an online detection system with an offline system, the invention overcomes the slow speed and large lag of traditional detection methods. Compared with traditional intrusion detection methods, it introduces a multilayer neural network that combines deep learning with a fuzzy neural network, so that some unknown intrusion behaviors can be detected and some false alarms caused by human misoperation are eliminated; that is, the scheme has a low false alarm rate and a high detection rate.

Drawings

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings, in which:

FIG. 1 is a flow chart of intrusion detection according to the present invention;

FIG. 2 is a diagram of a smart home system according to the present invention;

FIG. 3 is a schematic diagram of a multi-layer neural network training method according to the present invention;

FIG. 4 is a block diagram of a restricted Boltzmann machine according to the present invention;

FIG. 5 is a structural diagram of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings; it should be understood that the preferred embodiments are illustrative of the invention only and are not limiting upon the scope of the invention.

Referring to the detection flow chart shown in FIG. 1, the smart home intrusion detection method includes the following steps:

101. Initialize: generate an offline system database with empty content, the database comprising three sub-databases, namely labeled training and test data, data screening link parameters, and deep-learning-based multilayer network parameters.

102. The smart home system comprises sensor nodes, routing nodes, a server, clients and the like; its composition is shown in FIG. 2. Traffic capture software is used to capture labeled traffic data at the gateway of the smart home server. The collected data is encoded and normalized to form data to be detected, which is added to the labeled training and test database in the offline system database of step 101; then jump to step 103.

103. Classify the labeled data in the offline system database of step 101 according to the label of each piece of data to form a normal behavior sample data set and an intrusion behavior sample data set. Obtain the center of each of the two sample data sets with a K-means algorithm, analyze the distance between each sample and its sample center, and set a judgment threshold so that sample data with the characteristics of a class fall within the threshold range of that class; store the sample centers and thresholds in the data screening link parameter sub-database. Train the weights and bias values of the multilayer neural network with the data in the labeled training and test data sub-database, and store the trained neural network parameters in the deep-learning-based multilayer neural network parameter sub-database. This completes the training link; then jump to step 104 for real-time monitoring by the online system.

104. Use packet capture software to capture unlabeled traffic data at the smart home server gateway, and encode and normalize the captured data to form a piece of data to be detected. Then complete the data screening link: calculate the distance from the data to be detected to the centers of the two sample data sets of step 103; if the distance is less than the threshold of a behavior type, the data is determined to belong to that type; otherwise, jump to step 105.

105. Input the data to be detected whose type could not be determined in step 104 into the multilayer neural network for identification, which comprises data dimensionality reduction with a deep belief network and identification with a fuzzy neural network. If the output of the multilayer neural network is in [0, 1.2], the data is safe. If the output is in (1.2, 2.5], the data presents a potential security hazard, and the alarm module can be driven to raise an alarm. If the output is not in the interval [0, 2.5], the network cannot identify the data; the data is stored in the offline system until an administrator checks it and judges whether it presents a potential security hazard.
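The output rule of step 105 can be sketched as a small decision function. This is only an illustration: the function name and return labels are hypothetical, and treating the boundary value 2.5 as part of the hazard interval is an assumption made so that the three cases cover every possible output.

```python
def classify_output(y, safe_upper=1.2, hazard_upper=2.5):
    """Map the multilayer network output to one of three outcomes.

    [0, 1.2]   -> "safe"
    (1.2, 2.5] -> "hazard" (alarm module would be driven)
    otherwise  -> "unrecognized" (stored offline for an administrator)
    """
    if 0.0 <= y <= safe_upper:
        return "safe"
    if safe_upper < y <= hazard_upper:
        return "hazard"
    return "unrecognized"


# Hypothetical network outputs for four pieces of data.
outcomes = [classify_output(y) for y in (0.4, 1.2, 1.9, 3.1)]
```

Keeping the two bounds as parameters makes it easy to try other interval limits without touching the decision logic.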

The data screening link set up in step 103 works as follows:

A. Determine the parameters required by the data screening link in the offline system. A K-means algorithm is used to determine the sample centers of the two types of behavior in the labeled training database of the offline system; K-means is a clustering algorithm, and the cluster centers are obtained from the labeled data. The Euclidean distance from each point in the offline rule base to the sample center is then calculated:

$$d = \sqrt{\sum_{i=1}^{k} (x_i - X_i)^2}$$

where $(X_1, \dots, X_k)$ is the sample center and $(x_1, \dots, x_k)$ is a point in the offline rule base. The distance threshold of the data screening link is determined by applying the Pauta criterion, also called the 3σ criterion, to these distances.
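The offline parameter determination described in A can be sketched as below. Two assumptions are made, since the patent gives no formulas: each labeled class is treated as a single cluster, whose K-means center is simply the class mean, and the 3σ criterion is read as threshold = mean distance + 3 × standard deviation of the distances. The function names and sample data are hypothetical.

```python
import math
import statistics


def euclidean(x, center):
    """Euclidean distance d = sqrt(sum_i (x_i - X_i)^2)."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))


def fit_class(samples):
    """Return (center, threshold) for the samples of one labeled class."""
    dim = len(samples[0])
    # With one cluster per class, the K-means center is the class mean.
    center = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    dists = [euclidean(s, center) for s in samples]
    # Assumed reading of the 3-sigma criterion: mean + 3 * std of distances.
    threshold = statistics.mean(dists) + 3.0 * statistics.pstdev(dists)
    return center, threshold


# Hypothetical normalized samples of the normal-behavior class.
normal_samples = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]]
normal_center, normal_threshold = fit_class(normal_samples)
```

The same function would be called once more on the intrusion samples, and both (center, threshold) pairs stored in the data screening link parameter sub-database.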

B. In the data screening link of the online detection system, the distances from the data to be detected to the two sample centers of the offline system are calculated. The smaller distance is selected and compared with the threshold that the offline system set for the class of the corresponding sample center. If the distance is smaller than the threshold, the data belongs to that class; if it is larger, the data belongs to neither class and is passed on to the subsequent multilayer neural network detection.
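The online screening decision described in B can be sketched as follows. The `screen` function, the class labels and the example centers and thresholds are hypothetical; in practice the centers and thresholds would be read from the offline system.

```python
import math


def distance(x, center):
    """Euclidean distance between a data point and a sample center."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))


def screen(x, classes):
    """Return the class label of x, or None if screening cannot decide.

    classes maps a label to a (center, threshold) pair from the offline system.
    """
    # Pick the nearer of the sample centers.
    label = min(classes, key=lambda name: distance(x, classes[name][0]))
    center, threshold = classes[label]
    if distance(x, center) < threshold:
        return label
    return None  # undecided: hand over to the multilayer neural network


# Hypothetical offline parameters for the two behavior classes.
classes = {
    "normal": ([0.2, 0.2], 0.30),
    "intrusion": ([0.8, 0.8], 0.25),
}
results = [screen(p, classes) for p in ([0.25, 0.2], [0.8, 0.9], [0.5, 0.5])]
```

Only the third point falls outside both thresholds, so only that point would incur the cost of the neural network stage.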

The multilayer neural network algorithm in step 105 comprises:

A. The parameters of the multilayer neural network are trained in the offline system. The training method, shown in FIG. 3, comprises training of the deep belief network and training of the fuzzy neural network. Training of the deep belief network comprises bottom-up unsupervised training and top-down supervised parameter fine-tuning; the fuzzy neural network is trained by gradient descent.

B. The deep belief network is composed of several restricted Boltzmann machines. The restricted Boltzmann machine is a probability-based energy model whose principle is shown in FIG. 4. The probability of the visible layer v is approximated by a Gibbs sampling algorithm, and the weight W of the deep belief network and the hidden-layer and visible-layer bias values a and b are obtained by differentiating the logarithm of the likelihood function with respect to the parameters. The output of the deep belief network is taken as the input of the fuzzy neural network, which classifies the behavior and outputs a number in the interval [0, 2.5]. The class of the data is judged from the output number.
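To make the Gibbs sampling and likelihood-gradient idea concrete, here is a minimal sketch of one training step for a single binary restricted Boltzmann machine using one-step contrastive divergence (CD-1). CD-1 is a common approximation and is swapped in here as an assumption; the patent does not say which variant it uses. In this sketch `a` is taken as the visible-layer bias and `b` as the hidden-layer bias, which is also an assumption about the patent's notation, and all sizes, data and the learning rate are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_probs(v, W, b):
    """P(h_j = 1 | v) for each hidden unit j; W[i][j] links v_i and h_j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, a):
    """P(v_i = 1 | h) for each visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update in place; returns the squared reconstruction error."""
    ph0 = hidden_probs(v0, W, b)
    h0 = sample(ph0, rng)              # Gibbs step: sample hidden given data
    pv1 = visible_probs(h0, W, a)      # reconstruct the visible layer
    ph1 = hidden_probs(pv1, W, b)
    for i in range(len(a)):
        for j in range(len(b)):
            # Approximate log-likelihood gradient: data term minus model term.
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
    for i in range(len(a)):
        a[i] += lr * (v0[i] - pv1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])
    return sum((x - y) ** 2 for x, y in zip(v0, pv1))


rng = random.Random(0)
W = [[rng.gauss(0.0, 0.01) for _ in range(2)] for _ in range(4)]
a = [0.0] * 4   # visible-layer biases (assumed naming)
b = [0.0] * 2   # hidden-layer biases (assumed naming)
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # hypothetical patterns
for _ in range(200):
    for v in data:
        err = cd1_step(v, W, a, b, lr=0.1, rng=rng)
```

Stacking several such machines, each trained on the hidden activations of the one below, would give the bottom-up unsupervised phase of a deep belief network; the supervised fine-tuning and the fuzzy neural network are not covered by this sketch.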

C. An evaluation model is constructed from the reconstruction error of the restricted Boltzmann machines in the deep belief network, the detection rate and detection time of the multilayer neural network, and the like, to determine the depth of the multilayer neural network, that is, the number of restricted Boltzmann machines in the deep belief network. When the reconstruction error is larger than 0.1, the network depth is increased by 1, that is, one restricted Boltzmann machine is added to the deep belief network. If the reconstruction error of the deep belief network is smaller than 0.1, the network depth of the intrusion detection model is determined by evaluating the detection rate and detection time of the multilayer neural network and selecting a suitable number of restricted Boltzmann machines in light of factors such as the computing capacity of the smart home system server.
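The depth selection described in C might be organized as in the sketch below. The patent does not define the evaluation model, so the selection rule used here (highest detection rate within a time budget, ties broken by shorter detection time) is purely an assumption, as are the `evaluate` callback, the `time_budget` parameter and the measurement table.

```python
def choose_depth(evaluate, max_depth, error_limit=0.1, time_budget=None):
    """Pick the number of restricted Boltzmann machines.

    evaluate(depth) is a hypothetical callback returning
    (reconstruction_error, detection_rate, detection_time_seconds).
    """
    candidates = []
    for depth in range(1, max_depth + 1):
        error, rate, seconds = evaluate(depth)
        if error > error_limit:
            continue  # error too large: go one RBM deeper
        if time_budget is None or seconds <= time_budget:
            candidates.append((rate, -seconds, depth))
    if not candidates:
        return None  # no depth up to max_depth was acceptable
    # Assumed rule: best detection rate, then shortest detection time.
    return max(candidates)[2]


# Hypothetical measurements: depth -> (error, detection rate, time in s).
measurements = {
    1: (0.30, 0.80, 0.01),
    2: (0.15, 0.90, 0.02),
    3: (0.08, 0.95, 0.04),
    4: (0.05, 0.96, 0.09),
    5: (0.04, 0.96, 0.20),
}
best = choose_depth(lambda d: measurements[d], max_depth=5, time_budget=0.1)
```

The time budget stands in for the computing capacity of the smart home server: a weaker server would get a smaller budget and therefore a shallower network.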

The invention is suitable for intrusion detection in smart home networks. Because the disclosed intrusion detection method combines deep learning with a fuzzy neural network, it can achieve a low false alarm rate, a low missed-detection rate and a high detection rate. The method also has good detection capability for unknown intrusion behavior and good adaptive capability.

Traditional methods generally have a high false alarm rate. The present method can reduce the false alarm rate to below 5% while achieving a detection rate above 95%. The detection rate for unknown, new intrusion behavior exceeds 60%.
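The patent does not define how these rates are computed. A minimal sketch using definitions that are common in intrusion detection work is given below: detection rate = TP / (TP + FN), false alarm rate = FP / (FP + TN), and missed-detection rate = FN / (TP + FN). These definitions, the function name and the example labels are assumptions for illustration only.

```python
def detection_metrics(labels, predictions):
    """Compute rates from binary labels (1 = intrusion, 0 = normal)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(labels, predictions):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    intrusions = tp + fn
    normals = fp + tn
    return {
        "detection_rate": tp / intrusions if intrusions else 0.0,
        "false_alarm_rate": fp / normals if normals else 0.0,
        "missed_detection_rate": fn / intrusions if intrusions else 0.0,
    }


# Hypothetical ground truth and detector output for ten pieces of data.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = detection_metrics(labels, predictions)
```

Under these definitions the detection rate and the missed-detection rate always sum to 1, so a target of more than 95% detection implies fewer than 5% missed intrusions.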

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and it is apparent that those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A smart home intrusion detection method based on deep learning, characterized in that the method specifically comprises the following steps:

2. The smart home intrusion detection method based on deep learning according to claim 1, characterized in that: in step S3, a K-means algorithm is used to determine the sample centers of the two types of behavior in the labeled training database of the offline system database, the Euclidean distance from each point in the offline system database to the sample center is calculated, and the distance threshold of the data screening link is determined by applying the Pauta criterion (3σ criterion) to these distances.

3. The smart home intrusion detection method based on deep learning according to claim 1, characterized in that: inputting the data to be detected whose type could not be determined in step S4 into the multilayer neural network for identification comprises data dimensionality reduction with a deep belief network and identification with a fuzzy neural network.

4. The smart home intrusion detection method based on deep learning according to claim 1, characterized in that: the multilayer neural network comprises a deep belief network and a fuzzy neural network, the output of the deep belief network is used as the input of the fuzzy neural network, and the deep belief network consists of several restricted Boltzmann machines.

5. The smart home intrusion detection method based on deep learning according to claim 1, characterized in that: in step S2, when the weights and bias values of the multilayer neural network are trained with the data in the labeled training and test database, the training of the multilayer neural network comprises training of a deep belief network and training of a fuzzy neural network.

6. The smart home intrusion detection method based on deep learning according to claim 5, characterized in that: training the deep belief network comprises bottom-up unsupervised training and top-down supervised parameter fine-tuning; the fuzzy neural network is trained by gradient descent.

7. The smart home intrusion detection method based on deep learning according to claim 3, characterized in that: an evaluation model is constructed from the reconstruction error of the restricted Boltzmann machines in the deep belief network and the detection rate and detection time of the multilayer neural network to determine the depth of the multilayer neural network, that is, the number of restricted Boltzmann machines in the deep belief network.

8. The smart home intrusion detection method based on deep learning according to claim 7, characterized in that: when the reconstruction error is larger than 0.1, the network depth is increased by 1, that is, one restricted Boltzmann machine is added to the deep belief network.

9. The smart home intrusion detection method based on deep learning according to claim 7, characterized in that: if the reconstruction error of the deep belief network is less than 0.1, the network depth of the intrusion detection model is determined by evaluating the detection rate and detection time of the multilayer neural network and selecting a suitable number of restricted Boltzmann machines in light of the computing power of the smart home system server.