CN114172706A - Method, system, equipment and medium for detecting network flow abnormity of intelligent sound box - Google Patents

Info

Publication number
CN114172706A
Authority
CN
China
Prior art keywords
sound box
data
network
hurst
intelligent sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111432544.1A
Other languages
Chinese (zh)
Inventor
王宇
韦国成
薛含笑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN202111432544.1A
Publication of CN114172706A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and a system for detecting network traffic anomalies of an intelligent sound box, an electronic device and a storage medium. The method comprises the following steps: collecting network traffic data of the intelligent sound box; preprocessing the network traffic data to obtain a feature data set; performing Hurst index estimation on the features in the feature data set and selecting anomaly detection features according to the Hurst values, wherein the Hurst index is estimated separately by rescaled range analysis, the variance-time method and an iterative estimation algorithm; and performing anomaly detection on the network traffic of the intelligent sound box under test according to the anomaly detection features. The method estimates the Hurst index of the selected features with rescaled range analysis, the variance-time method and an iterative estimation algorithm, and judges the security state of the intelligent sound box network from the range in which the Hurst values fall, thereby ensuring the reliability of the detection method.

Description

Method, system, equipment and medium for detecting network flow abnormity of intelligent sound box
Technical Field
The invention belongs to the technical field of intelligent sound boxes, and particularly relates to a method and a system for detecting network traffic anomalies of an intelligent sound box, an electronic device and a storage medium.
Background
Voice is the most common way for people to communicate and an important entry point for human-machine interaction. The intelligent sound box acts as an intermediary between people and smart home devices and is effectively the gateway of the smart home. By interacting with it, people can conveniently control other smart home devices, for example to play music, switch lights on or off, or turn on an air purifier, as shown in fig. 1. Because the intelligent sound box interacts with people frequently, it stores and uploads a large amount of user data and private habits, so its data security and privacy are particularly important in a smart home system. In current smart home systems, the intelligent sound box mainly relays communication between the user and the devices over the Internet. This traffic carries a great deal of private user data, and the network protocols involved often have security weaknesses; once they are attacked, the harm to user privacy can be severe. The communication security of smart home systems therefore deserves attention and is a main research direction. Detecting abnormal traffic of the intelligent sound box is a research topic of practical significance, particularly as public awareness of privacy grows.
Current methods for detecting network traffic anomalies fall mainly into three categories: feature-based detection, detection based on mathematical statistics, and detection based on data mining. Feature-based detection searches network traffic for known anomaly features and matches observed abnormal behavior against them. It is widely used, but it cannot detect anomalies of unknown types: it relies on a predefined feature matching library, which must be updated continuously. Detection based on mathematical statistics applies statistical analysis to traffic data and is generally divided into space-based and time-based detection. Space-based detection usually analyzes abnormal traffic of a multi-link network from a global view, with subspace methods being the most common; time-based detection usually analyzes a traffic time series, for example with filter analysis techniques. Detection based on mathematical statistics can only determine whether abnormal behavior exists in the network; it cannot determine the cause of the anomaly or where it occurred.
Detection based on data mining applies data mining techniques to very large data streams. Commonly used techniques include classification, clustering and association rules, and commonly used algorithms include genetic algorithms, neural networks and rule induction. This approach tracks data packets, groups the data by feature through classification and clustering, and analyzes the behavior patterns of abnormal traffic, so it can both judge from the data stream whether an anomaly has occurred and locate where it occurred.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method, a system, an electronic device and a storage medium for detecting network traffic anomalies of an intelligent sound box. The method is based on the self-similarity index: network traffic is collected during an attack stage, the Hurst index of the selected features is estimated by rescaled range analysis, the variance-time method and an iterative estimation algorithm, and the security state of the intelligent sound box network is judged from the range in which the Hurst values fall. When the network is under an ARP attack, the Hurst index deviates greatly from its normal value and falls outside the self-similarity interval (0, 1). The second-order self-similarity of the network disappears at that point, which indicates that an anomaly has occurred in the network and demonstrates the effectiveness and feasibility of the method.
The first object of the invention is to provide a method for detecting network traffic anomalies of an intelligent sound box.
The second object of the invention is to provide a system for detecting network traffic anomalies of an intelligent sound box.
A third object of the present invention is to provide an electronic apparatus.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a method for detecting network traffic anomalies of an intelligent sound box comprises the following steps:
collecting network traffic data of the intelligent sound box;
preprocessing the network traffic data of the intelligent sound box to obtain a feature data set;
performing Hurst index estimation on the features in the feature data set, and selecting anomaly detection features according to the Hurst values, wherein the Hurst index is estimated separately by rescaled range analysis, the variance-time method and an iterative estimation algorithm;
and performing anomaly detection on the network traffic of the intelligent sound box under test according to the anomaly detection features.
Further, performing Hurst index estimation on the features in the feature data set and selecting the anomaly detection features according to the Hurst values specifically comprises:
removing the extreme-value (maximum and minimum) features from the feature data set to obtain a reduced feature data set;
estimating the Hurst index of each feature in the reduced feature data set by rescaled range analysis, the variance-time method and the iterative estimation algorithm, to obtain the corresponding Hurst values;
if the Hurst value of any feature in the reduced feature data set, as obtained by rescaled range analysis, the variance-time method and the iterative estimation algorithm, falls outside a set range, removing that feature from the reduced feature data set;
and at the same time removing from the reduced feature data set the data flow duration feature, which is a single-point attribute without continuity, the features remaining in the reduced feature data set being the anomaly detection features.
Further, the set range is (0, 1).
Further, the network flow data of the intelligent sound box is stored as a PCAP file, wherein each line of data represents a network data packet, and the network data packet includes a source port number, a destination port number, a source IP address, a destination IP address, a Unix timestamp, a packet load size, and protocol type information.
Further, the preprocessing comprises flow aggregation and feature processing, wherein:
in the flow aggregation, the PCAP file is read and the five-tuple information of each data packet is obtained; based on the five-tuple information, flows are cut according to the SYN and FIN flag bits of the transport protocol, so that only complete network flows are kept, yielding bidirectional flows;
in the feature processing, the data in each bidirectional flow is converted into a feature vector format and the bidirectional flow is divided into two unidirectional flows, one per direction; features are then extracted from the bidirectional flow and from the unidirectional flows according to statistical characteristics of packet length, payload and timestamp, and the feature data are collected.
Further, performing anomaly detection on the network traffic of the intelligent sound box under test according to the anomaly detection features specifically comprises:
preprocessing the network traffic data of the intelligent sound box under test to obtain a feature data set;
and performing Hurst index estimation on the anomaly detection features in the feature data set, and determining from the Hurst values whether the network traffic of the intelligent sound box under test is abnormal.
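The decision step can be illustrated with a minimal sketch. It assumes that a traffic window is flagged as soon as any Hurst estimate of a detection feature leaves the interval (0, 1); the function name, feature names and values are hypothetical and are not taken from the patent.

```python
def is_anomalous(hurst_values, low=0.0, high=1.0):
    """Return True if any Hurst estimate lies outside the open interval (low, high)."""
    return any(not (low < h < high) for h in hurst_values.values())


# Hypothetical Hurst estimates for three detection features in two traffic windows.
normal_window = {"feature_13": 0.71, "feature_14": 0.68, "feature_15": 0.74}
attack_window = {"feature_13": 1.32, "feature_14": 0.66, "feature_15": -0.08}

print(is_anomalous(normal_window))  # False
print(is_anomalous(attack_window))  # True
```

A deployment would probably also compare each estimate with a baseline Hurst value measured on normal traffic, as the description mentions deviation from the normal value; that comparison is left out here because no tolerance is given.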
Further, the network traffic data of the intelligent sound box is collected by mirroring the traffic of the switch to a traffic collection host and running the capture program Wireshark on that host.
The second purpose of the invention can be achieved by adopting the following technical scheme:
a system for detecting network traffic anomalies of an intelligent sound box, the system comprising:
a data acquisition module, used for collecting network traffic data of the intelligent sound box;
a data preprocessing module, used for preprocessing the network traffic data of the intelligent sound box to obtain a feature data set;
an anomaly detection feature selection module, used for performing Hurst index estimation on the features in the feature data set and selecting anomaly detection features according to the Hurst values, wherein the Hurst index is estimated separately by rescaled range analysis, the variance-time method and an iterative estimation algorithm;
and an anomaly detection module, used for performing anomaly detection on the network traffic of the intelligent sound box under test according to the anomaly detection features.
The third purpose of the invention can be achieved by adopting the following technical scheme:
an electronic device comprises a processor and a memory for storing a program executable by the processor, wherein the processor executes the program stored in the memory to realize the detection method.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program which, when executed by a processor, implements the detection method described above.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention estimates the Hurst index of the selected features with rescaled range analysis (R/S), the variance-time method (V-T) and an iterative estimation algorithm, which ensures the reliability of the detection method.
2. After the Hurst index of the selected features is estimated by these three methods, whether the network traffic of the intelligent sound box is abnormal is judged by whether the Hurst index deviates from its normal value and whether it falls outside the self-similarity interval (0, 1). Tests demonstrate the effectiveness and feasibility of the detection method.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a schematic view of an intelligent home system according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of a method for detecting network traffic abnormality of a smart sound box according to embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of an interaction process of the smart sound box in embodiment 1 of the present invention.
Fig. 4 shows the Hurst values of the features numbered 13 to 18 in the network traffic after three attacks in embodiment 1 of the present invention.
Fig. 5 is a block diagram of a system for detecting network traffic abnormality of an intelligent sound box in embodiment 2 of the present invention.
Fig. 6 is a block diagram of an electronic device according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention. It should be understood that the description of the specific embodiments is intended to be illustrative only and is not intended to be limiting.
Example 1:
As shown in fig. 2, this embodiment provides a method for detecting network traffic anomalies of an intelligent sound box, comprising the following steps:
S201, collecting network traffic data of the intelligent sound box.
When the traffic of the intelligent sound box is collected, the traffic of the switch is mirrored to a traffic collection host in order to reduce the impact on normal network use and the influence of other applications on the collection host, and the traffic is captured by running the capture program Wireshark on that host.
Fig. 3 shows the experimental environment for collecting the network traffic of the intelligent sound box. Voice dialogues are converted into MP3 audio files by calling the speech synthesis interface of the Baidu AI open platform. A script plays the dialogues at set times, and the audio playing time and the sound box response time are controlled so that the two do not interrupt each other.
The intelligent sound box is operated in simulation through voice commands. Its voice interaction system recognizes a voice command algorithmically, converts the voice signal into a digital signal and uploads the digital signal to a cloud server. The cloud server performs recognition of the digitally encoded speech and semantic understanding, and feeds the result back to the router. All network traffic generated by the intelligent sound box therefore passes through the router, and the switch acts as the intermediary that carries traffic between the Internet and the router. The port mirroring function of the switch is then used to forward the network traffic passing through the wireless network card to the monitoring host, where Wireshark captures it. The captured network traffic is saved as a PCAP file.
In this embodiment, the Wireshark tool monitors the wireless network card. A script plays audio files to simulate human-machine interaction, the port mirroring function of the switch forwards the network traffic passing through the wireless network card to the monitoring host, and Wireshark captures the traffic. The captured traffic is saved as a PCAP file in which each row represents one network data packet with detailed fields such as source port number, destination port number, source IP address, destination IP address, Unix timestamp, packet payload size and protocol type.
In this embodiment, the interaction traffic of the intelligent sound box was collected over 20 hours. The collected network traffic data is 10 GB in size and contains 12939197 data packets in total, with protocol types including HTTPS, TCP, UDP and WebSocket. The packets are stored in PCAP format to facilitate traffic analysis.
S202, preprocessing the network flow data of the intelligent sound box to obtain a characteristic data set.
The data preprocessing comprises flow aggregation and feature processing, wherein:
(1) Flow aggregation: in network traffic, data packets with the same five-tuple {source IP address, destination IP address, source port number, destination port number, transport protocol} are usually assigned to the same data flow. The PCAP data set is read with TShark, the command-line tool of Wireshark, to obtain the five-tuple of each data packet, and the network traffic is cut into flows according to the five-tuple. Because network fluctuations during collection may cause data loss within a flow, flows are cut according to the SYN and FIN flag bits of the transport protocol and only complete network flows are kept. The aggregation process is as follows: read the PCAP file and obtain a single data packet; determine whether the five-tuple of the packet belongs to an existing network flow; if it does, add the packet to the flow with the same five-tuple; if it does not, create a new network flow and add the packet to it; then check whether the PCAP file has been read completely, ending the aggregation if so and otherwise returning to the step of obtaining a single data packet.
(2) Feature processing: the core of feature processing is converting the raw PCAP traffic data into a feature vector format. In this embodiment, the Scapy tool is used to parse the PCAP data; Scapy is a powerful tool for parsing network data packets and can be extended as required. The network communication fields extracted from the PCAP data with Scapy are shown in table 1:
Table 1 Packet feature fields (table image not reproduced)
In this embodiment, only statistical information of transport-layer TCP traffic is used as the feature representation. In the flow aggregation step, the raw traffic data is converted into complete network flows, called bidirectional flows. A bidirectional flow can be divided into two unidirectional flows according to the transmission direction of the data, where A denotes the sending end and B denotes the receiving end. Features are extracted from the bidirectional flow and from the unidirectional flows according to statistical characteristics such as packet length, payload and timestamp, giving 37 network flow features in total, as shown in table 2:
Table 2 Network flow features (table image not reproduced)
Flow aggregation and feature processing yield the intelligent sound box network flow feature data set. The data set contains 54320 flow records, each with 37 feature dimensions.
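The preprocessing described above can be sketched in a few lines. This is a simplified illustration, not the embodiment's TShark and Scapy pipeline: packets are assumed to be already parsed into dictionaries with hypothetical field names, a flow is kept only if its packets contain both a SYN and a FIN flag (a simplification of the flow cutting), and only four example statistics are computed rather than the 37 features of table 2.

```python
from collections import defaultdict
from statistics import mean

SYN, FIN = 0x02, 0x01  # TCP flag bits


def flow_key(pkt):
    """Direction-independent five-tuple key, so both directions share one flow."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (min(a, b), max(a, b), pkt["proto"])


def aggregate_flows(packets):
    """Group packets by five-tuple and keep only flows that saw both SYN and FIN."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    complete = {}
    for key, pkts in flows.items():
        flags = 0
        for p in pkts:
            flags |= p["flags"]
        if flags & SYN and flags & FIN:
            complete[key] = sorted(pkts, key=lambda p: p["ts"])
    return complete


def split_directions(pkts):
    """Split a bidirectional flow into A->B and B->A; A sent the first packet."""
    a = (pkts[0]["src_ip"], pkts[0]["src_port"])
    fwd = [p for p in pkts if (p["src_ip"], p["src_port"]) == a]
    bwd = [p for p in pkts if (p["src_ip"], p["src_port"]) != a]
    return fwd, bwd


def basic_features(pkts):
    """A few length and inter-arrival statistics for one (sub)flow."""
    gaps = [b["ts"] - a["ts"] for a, b in zip(pkts, pkts[1:])]
    return {
        "pkt_count": len(pkts),
        "mean_length": mean(p["length"] for p in pkts),
        "mean_iat": mean(gaps) if gaps else 0.0,
        "duration": pkts[-1]["ts"] - pkts[0]["ts"],
    }


def pkt(ts, src, sport, dst, dport, flags, length):
    """Build one hypothetical parsed packet."""
    return {"ts": ts, "src_ip": src, "src_port": sport, "dst_ip": dst,
            "dst_port": dport, "proto": "TCP", "flags": flags, "length": length}


packets = [
    pkt(0.00, "192.168.1.10", 50000, "203.0.113.5", 443, SYN, 60),
    pkt(0.02, "203.0.113.5", 443, "192.168.1.10", 50000, SYN | 0x10, 60),
    pkt(0.05, "192.168.1.10", 50000, "203.0.113.5", 443, 0x18, 300),
    pkt(0.09, "203.0.113.5", 443, "192.168.1.10", 50000, FIN | 0x10, 54),
    pkt(1.00, "192.168.1.10", 50001, "203.0.113.5", 443, SYN, 60),  # never finished
]

flows = aggregate_flows(packets)
for key, flow in flows.items():
    fwd, bwd = split_directions(flow)
    print(key, basic_features(flow), basic_features(fwd), basic_features(bwd))
```

Using a direction-independent key is one way to obtain bidirectional flows directly; the unfinished second connection is dropped because it never carries a FIN.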
S203, carrying out Hurst index estimation on the characteristics in the characteristic data set, and selecting the characteristics of abnormal detection according to the Hurst value.
Further, step S203 specifically includes:
S2031, performing Hurst index estimation on the features in the feature data set by rescaled range analysis, the variance-time method and the iterative estimation algorithm.
Feature selection for anomaly detection is based on the collected flow feature data set, which defines 37 feature attributes made up of TCP packet values and their statistics. In this embodiment, only 19 features are selected for analysis and the extreme-value (maximum and minimum) features are removed. These features fluctuate markedly and may take abnormal values in different time periods, under different network states or during frequent interaction; such data would strongly affect the self-similarity analysis, so they are treated as irregular data.
In self-similarity-based network traffic anomaly detection, a network data flow can be treated as a time series for self-similarity analysis. The Hurst index of the 19 selected features is estimated by rescaled range analysis (R/S), the variance-time method (V-T) and the iterative estimation algorithm, where:
(1) Rescaled range analysis (R/S).
For a time series $X = \{X_n, n \ge 1\}$, divide it into $g$ mutually non-overlapping subintervals of length $r$, giving the subsequences:
$$X_{11}, X_{12}, \ldots, X_{1r};\quad X_{21}, X_{22}, \ldots, X_{2r};\quad \ldots;\quad X_{g1}, X_{g2}, \ldots, X_{gr}$$
the mean of each subsequence is:
$$\bar{X}_i = \frac{1}{r}\sum_{j=1}^{r} X_{ij}, \quad i = 1, 2, \ldots, g \qquad (1)$$
the deviation of each element of a subsequence from that mean is:
$$y_{ij} = X_{ij} - \bar{X}_i, \quad j = 1, 2, \ldots, r \qquad (2)$$
the cumulative deviation of each subsequence is:
$$z_{ij} = \sum_{k=1}^{j} y_{ik} \qquad (3)$$
the range of each subsequence is:
$$R_i = \max_j(z_{ij}) - \min_j(z_{ij}), \quad i = 1, 2, \ldots, g, \quad j = 1, 2, \ldots, r \qquad (4)$$
the standard deviation of each subsequence is:
$$S_i = \sqrt{\frac{1}{r}\sum_{j=1}^{r}\left(X_{ij} - \bar{X}_i\right)^2} \qquad (5)$$
the final R/S value is:
$$(R/S) = \frac{1}{g}\sum_{i=1}^{g}\frac{R_i}{S_i} \qquad (6)$$
For a self-similar sequence, $(R/S)$ and $g$ are linearly related in logarithmic coordinates, and the slope of the plotted line gives the Hurst value.
R/S analysis evaluates the degree of self-similarity of a time series. It is stable, and the algorithm is clear and easy to implement. It is generally used to analyze changes in the distribution of a time series and is suited to sample data sets that are sufficiently large.
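A minimal pure-Python sketch of R/S estimation follows. It is an illustration under stated assumptions rather than the embodiment's implementation: block lengths r are doubled starting from a hypothetical minimum of 8, the averaged R/S value is computed for each r, and the Hurst value is taken as the least-squares slope of log(R/S) against log(r), which is the common convention for this fit.

```python
import math
import random


def rescaled_range(block):
    """R/S statistic of one subsequence: range of cumulative deviations over std."""
    r = len(block)
    mean = sum(block) / r
    cumulative, z = 0.0, []
    for x in block:
        cumulative += x - mean              # cumulative deviation
        z.append(cumulative)
    spread = max(z) - min(z)                # range of the cumulative deviations
    std = math.sqrt(sum((x - mean) ** 2 for x in block) / r)
    return spread / std if std > 0 else None


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def hurst_rs(series, min_len=8):
    """Estimate H as the slope of log(R/S) against log(r)."""
    log_r, log_rs = [], []
    r = min_len
    while r <= len(series) // 2:
        blocks = [series[i:i + r] for i in range(0, len(series) - r + 1, r)]
        values = [v for v in map(rescaled_range, blocks) if v is not None and v > 0]
        if values:
            log_r.append(math.log(r))
            log_rs.append(math.log(sum(values) / len(values)))
        r *= 2
    return least_squares_slope(log_r, log_rs)


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]   # uncorrelated test series
walk, total = [], 0.0
for x in noise:                                      # strongly persistent test series
    total += x
    walk.append(total)

print(round(hurst_rs(noise), 2), round(hurst_rs(walk), 2))
```

The two synthetic series are only a sanity check: the uncorrelated series should give an estimate near 0.5 (R/S is known to be biased upward on short blocks), and the random walk a clearly larger one.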
(2) Variance-time method (V-T).
The variance-time method, also called V-T analysis (Variance Time Method), is an aggregation-based analysis method that examines the variance of the sequence aggregated at different time scales. The method uses the variance of the aggregated process $X^{(m)}$; the slowly decaying variance property of a self-similar sequence is:
$$\mathrm{Var}(X^{(m)}) \sim a\,m^{-\beta}, \quad 0 \le \beta \le 1 \qquad (7)$$
where $a$ is a positive constant independent of $m$.
Let the time series be $X_i$, $i = 1, 2, \ldots, N$, where $N$ is the length of the series. Divide it into $n$ subsequences of equal length $m$ ($m$ is a positive integer), denoted $X^{(m)}(k)$, $k = 1, 2, \ldots, n$, with $n = N/m$. The aggregate value $X^{(m)}(k)$ of each subsequence is:
$$X^{(m)}(k) = \frac{1}{m}\sum_{i=(k-1)m+1}^{km} X_i \qquad (8)$$
the variance $\mathrm{Var}(X^{(m)})$ of the aggregate values is:
$$\mathrm{Var}(X^{(m)}) = \frac{1}{n}\sum_{k=1}^{n}\left(X^{(m)}(k) - \bar{X}^{(m)}\right)^2, \quad \bar{X}^{(m)} = \frac{1}{n}\sum_{k=1}^{n} X^{(m)}(k) \qquad (9)$$
taking the logarithm of both sides of formula (7) gives:
$$\log\left(\mathrm{Var}(X^{(m)})\right) \sim -\beta\log(m) + \log(a), \quad m \to \infty \qquad (10)$$
The scatter points in the logarithmic coordinate system are then $P\left(\log m, \log(\mathrm{Var}(X^{(m)}))\right)$. The points are plotted and a straight line is fitted to them by least squares. The slope of the fitted line is $-\beta$, and $\beta$ is related to the Hurst index by:
$$H = 1 - \frac{\beta}{2} \qquad (11)$$
the Hurst index estimate is obtained from equation (11).
The variance-time method places certain requirements on the data set; it can identify abnormal data when the data reading unit or the reading interval is short.
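The variance-time procedure can be sketched as follows. The block sizes are a hypothetical choice, and the synthetic series is only a sanity check; neither comes from the patent.

```python
import math
import random


def aggregated_variance(series, m):
    """Variance of the block means of the series for block size m."""
    n = len(series) // m
    means = [sum(series[k * m:(k + 1) * m]) / m for k in range(n)]
    grand = sum(means) / n
    return sum((x - grand) ** 2 for x in means) / n


def hurst_variance_time(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Fit log Var(X^(m)) = -beta * log(m) + log(a) and return H = 1 - beta / 2."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(aggregated_variance(series, m)) for m in block_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    beta = -slope                     # the fitted line has slope -beta
    return 1.0 - beta / 2.0


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]   # uncorrelated test series
print(round(hurst_variance_time(noise), 2))
```

For uncorrelated data the variance of the block means falls roughly as 1/m, so beta is close to 1 and the estimate should be close to 0.5; slower decay gives a larger Hurst value.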
(3) Iterative estimation algorithm.
Suppose the network traffic sequence $X_t$ is self-similar, where $X_t$ denotes the network traffic (typically the number of bytes or the number of packets) in the $t$-th time period. Its autocorrelation function $\rho_k$ should then satisfy equation (12), and rearranging that equation gives the iterative formula (13) for the Hurst index:
$$\rho_k = H(2H-1)\,k^{2H-2} \qquad (12)$$
$$H_{i+1} = \sqrt{0.5\left[\rho_k\,k^{2-2H_i} + H_i\right]} \qquad (13)$$
Here $H$ is called the Hurst index or self-similarity index, with $H \in (0.5, 1)$; the larger $H$ is, the higher the degree of self-similarity. Because $\sum_k \rho_k = \infty$, the process is said to be long-range dependent (long correlation), meaning that the sequence remains correlated even for large $k$. Computing the Hurst index correctly and quickly is therefore very important for studying the self-similarity of a network and how it changes.
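The rearrangement of equation (12) into an iterative form is not spelled out in the text. One way to obtain it, assuming the relation in (12) is treated as an equality and $H > 0$, is:

```latex
\begin{aligned}
\rho_k &= H(2H-1)\,k^{2H-2} \\
\rho_k\,k^{2-2H} &= 2H^2 - H \\
H^2 &= \tfrac{1}{2}\left(H + \rho_k\,k^{2-2H}\right) \\
H &= \sqrt{\tfrac{1}{2}\left(H + \rho_k\,k^{2-2H}\right)}
\end{aligned}
```

Reading the $H$ on the right-hand side as the current estimate and the $H$ on the left-hand side as the next one turns the last line into a fixed-point iteration.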
For a given sequence of $n$ observations $X_1, X_2, \ldots, X_n$, let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (14)$$
$$\hat{\gamma}_k = \frac{1}{n}\sum_{i=1}^{n-k}\left(X_i - \bar{X}\right)\left(X_{i+k} - \bar{X}\right) \qquad (15)$$
$$\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0} \qquad (16)$$
where $\bar{X}$, $\hat{\gamma}_k$ and $\hat{\rho}_k$ are the sample mean, sample covariance and sample autocorrelation function, respectively. Using the sample autocorrelation function $\hat{\rho}_k$ in place of $\rho_k$ gives the iterative estimation formula for the Hurst index:
$$\hat{H}_{i+1} = \sqrt{0.5\left[\hat{\rho}_k\,k^{2-2\hat{H}_i} + \hat{H}_i\right]} \qquad (17)$$
For a long-range dependent process, the initial value is set as:
Figure BDA0003380614770000101
However, taking $k = 1$ both gives a sufficiently accurate Hurst estimate and greatly reduces the amount of computation. Setting $k = 1$ in equation (17) gives the simplified iterative estimation formula:
$$\hat{H}_{i+1} = \sqrt{0.5\left(\hat{\rho}_1 + \hat{H}_i\right)} \qquad (18)$$
The uniqueness and convergence of this formula for $H \in (0.5, 1)$ can be proved with a fixed-point argument.
The iterative estimation algorithm achieves high speed and accuracy with only a small amount of data, which makes it better suited to real-time monitoring.
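The simplified iteration with k = 1 can be sketched as below. The update rule H <- sqrt(0.5 * (rho_1 + H)) is the form obtained by rearranging equation (12) at k = 1; the starting value 0.75, the tolerance and the iteration cap are assumptions made for this sketch, since the initial value used in the embodiment is not legible in the text.

```python
import math
import random


def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1 (sample covariance at lag 1 over lag 0)."""
    n = len(series)
    mean = sum(series) / n
    gamma0 = sum((x - mean) ** 2 for x in series) / n
    gamma1 = sum((series[i] - mean) * (series[i + 1] - mean)
                 for i in range(n - 1)) / n
    return gamma1 / gamma0


def hurst_iterative(series, h0=0.75, tol=1e-10, max_iter=200):
    """Iterate H <- sqrt(0.5 * (rho_1 + H)) until successive values agree within tol."""
    rho1 = lag1_autocorrelation(series)
    h = h0                               # assumed starting value
    for _ in range(max_iter):
        inner = 0.5 * (rho1 + h)
        if inner < 0:
            return float("nan")          # no real fixed point for strongly negative rho_1
        h_next = math.sqrt(inner)
        if abs(h_next - h) < tol:
            return h_next
        h = h_next
    return h


# Synthetic positively correlated series: a two-term moving sum of Gaussian noise.
rng = random.Random(2)
e = [rng.gauss(0.0, 1.0) for _ in range(5001)]
series = [e[i] + e[i + 1] for i in range(5000)]

rho1 = lag1_autocorrelation(series)
h = hurst_iterative(series)
print(round(rho1, 3), round(h, 3))
```

At convergence the estimate satisfies 2H^2 - H = rho_1, so a positive lag-1 autocorrelation below 1 yields a value in (0.5, 1). Only one autocorrelation coefficient is needed, which is consistent with the remark that the method suits real-time monitoring.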
S2032, selecting the features for anomaly detection according to the Hurst values calculated by rescaled range analysis, the variance-time method and the iterative estimation algorithm.
Table 3 lists the 19 feature attributes in the data set and their estimated Hurst values.
Table 3 The 19 feature attributes and their Hurst value estimates (table image not reproduced)
As can be seen from Table 3, Hurst index estimation of the 12 non-time-related feature attributes numbered 1 to 12 (packet length and payload) yields Hurst values that are all greater than 1. These non-time-related feature attributes therefore make no statistical contribution to the self-similarity of the data stream and should be removed. The Hurst values of the 7 time-related feature attributes numbered 13 to 19 (packet arrival time and data stream duration) lie in the interval (0, 1). However, the feature attribute numbered 19, the data stream duration, is a single-point attribute: it has no continuity, is unstable across different interactive responses, and its Hurst values differ too widely, so it should also be removed. Finally, the six feature attributes numbered 13 to 18, which are rich in statistical information and time-related, are selected as the feature attributes used for anomaly detection.
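The selection rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a feature is dropped as soon as any one of its Hurst estimates falls outside (0, 1), and single-point attributes are passed in explicitly as exclusions. The function select_features, the feature names, and the Hurst values in the example are all invented for illustration and are not taken from Table 3.

```python
def select_features(hurst_estimates, low=0.0, high=1.0, excluded=()):
    """Keep features whose Hurst estimates from every method lie in (low, high).

    hurst_estimates: dict mapping feature name -> dict of method name -> H.
    excluded: features dropped regardless of H (e.g. single-point attributes).
    """
    selected = []
    for name, by_method in hurst_estimates.items():
        if name in excluded:
            continue
        # Assumption: one out-of-range estimate is enough to reject a feature.
        if all(low < h < high for h in by_method.values()):
            selected.append(name)
    return selected


# Hypothetical values for illustration only (not the Table 3 results).
example = {
    "pkt_len_mean": {"rs": 1.12, "variance_time": 1.08, "iterative": 1.21},
    "iat_mean": {"rs": 0.71, "variance_time": 0.66, "iterative": 0.74},
    "iat_std": {"rs": 0.68, "variance_time": 0.73, "iterative": 0.70},
    "flow_duration": {"rs": 0.31, "variance_time": 0.92, "iterative": 0.55},
}
print(select_features(example, excluded={"flow_duration"}))
# prints ['iat_mean', 'iat_std']
```

Keeping the interval bounds as parameters makes it easy to apply a stricter range, such as (0.5, 1), without changing the rule itself.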
S204, carrying out anomaly detection on the network traffic of the intelligent sound box to be detected according to the anomaly detection features.
Anomaly detection on the network traffic of the intelligent sound box is carried out by simulating an ARP attack.
The attack mode adopted in this embodiment is an ARP spoofing attack. ARP spoofing is an attack technique in which the attacked host is led to mistake a MAC address forged by the attacker for the MAC address of the gateway or of the host it needs to reach, so that the attacker can control the traffic or capture encrypted traffic and thereby attack the network. The purpose of ARP spoofing is not to directly prevent the network from communicating, but to intercept the traffic between the target and the gateway or host and obtain private data.
The specific simulation is as follows: an attack host is added to the target network; on the attack host, Scapy is used to forge ARP request packets, which are sent continuously to the intelligent sound box so that it takes the attack host for the gateway. The data of the intelligent sound box then passes through the attack host, which can either cut off the sound box's Internet access or monitor its data.
The Hurst values of the six feature attributes numbered 13 to 18 in the network traffic after three attacks are computed with the R/S analysis method, the variance time method, and the iterative estimation algorithm, respectively; the results are shown in fig. 4.
Network traffic has second-order self-similarity, and as a random process it is long-correlated; that is, it has a Hurst index with 0.5 < H < 1. When the Hurst index fluctuates sharply and quickly moves outside the interval (0, 1), the second-order self-similarity of the network traffic disappears, which indicates that burst traffic has been generated in the network and that the network traffic of the intelligent sound box is abnormal. As can be seen from fig. 4, the Hurst index becomes clearly abnormal when the network is under ARP attack, which indicates that the traffic in the network is in an abnormal state.
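The decision rule above can be sketched as a sliding-window check in Python. This is only an illustration: the embodiment does not specify a windowing scheme, so the window length, the step, and the function name detect_anomalies are assumptions, and the estimator argument stands for any Hurst estimator (R/S analysis, the variance time method, or the iterative estimation algorithm) supplied by the caller.

```python
def detect_anomalies(series, window, step, estimator, low=0.0, high=1.0):
    """Flag windows whose estimated Hurst value lies outside (low, high).

    series: sequence of values of one selected feature, in time order.
    estimator: callable taking a list of values and returning a Hurst estimate.
    Returns a list of (start_index, hurst_value, is_anomalous) tuples.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive integers")
    results = []
    for start in range(0, len(series) - window + 1, step):
        h = estimator(list(series[start:start + window]))
        # Self-similarity is taken to have disappeared once H leaves (low, high).
        results.append((start, h, not (low < h < high)))
    return results
```

Applying the same function to each of the six selected features, and to each of the three estimators, gives one anomaly flag per feature, method, and window, which can then be compared in the manner of fig. 4.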
Those skilled in the art will appreciate that all or part of the steps in the method for implementing the above embodiments may be implemented by a program to instruct associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above-described embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Example 2:
as shown in fig. 5, this embodiment provides a classification system for network traffic of an intelligent sound box, where the system includes a data acquisition module 501, a data preprocessing module 502, an anomaly detection feature selection module 503, and an anomaly detection module 504, where:
the data acquisition module 501 is used for acquiring network traffic data of the intelligent sound box;
a data preprocessing module 502, configured to preprocess the network traffic data of the smart sound box to obtain a feature data set;
an anomaly detection feature selection module 503, configured to perform Hurst index estimation on features in the feature data set, and select an anomaly detection feature according to a Hurst value; wherein, the Hurst index estimation is respectively carried out by adopting a rescaling range analysis method, a variance time method and an iterative estimation algorithm;
and an anomaly detection module 504, configured to perform anomaly detection on the network traffic of the to-be-detected smart sound box according to the anomaly detection characteristics.
For the specific implementation of each module in this embodiment, refer to embodiment 1; the details are not repeated here. It should be noted that the system provided in this embodiment is illustrated only by the above division of functional modules. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure may be divided into different functional modules to complete all or part of the functions described above.
Example 3:
the present embodiment provides an electronic device, which may be a computer, as shown in fig. 6, and includes a processor 602, a memory, an input device 603, a display 604, and a network interface 605 connected by a system bus 601, where the processor is used to provide computing and control capabilities, the memory includes a nonvolatile storage medium 606 and an internal memory 607, the nonvolatile storage medium 606 stores an operating system, a computer program, and a database, the internal memory 607 provides an environment for the operating system and the computer program in the nonvolatile storage medium to run, and when the processor 602 executes the computer program stored in the memory, the detection method of embodiment 1 is implemented as follows:
collecting network flow data of the intelligent sound box;
preprocessing the network flow data of the intelligent sound box to obtain a characteristic data set;
performing Hurst index estimation on the characteristics in the characteristic data set, and selecting abnormal detection characteristics according to the Hurst value; wherein, the Hurst index estimation is respectively carried out by adopting a rescaling range analysis method, a variance time method and an iterative estimation algorithm;
and carrying out anomaly detection on the network flow of the intelligent sound box to be detected according to the anomaly detection characteristics.
Example 4:
the present embodiment provides a storage medium, which is a computer-readable storage medium, and stores a computer program, and when the computer program is executed by a processor, the detection method of the above embodiment 1 is implemented as follows:
collecting network flow data of the intelligent sound box;
preprocessing the network flow data of the intelligent sound box to obtain a characteristic data set;
performing Hurst index estimation on the characteristics in the characteristic data set, and selecting abnormal detection characteristics according to the Hurst value; wherein, the Hurst index estimation is respectively carried out by adopting a rescaling range analysis method, a variance time method and an iterative estimation algorithm;
and carrying out anomaly detection on the network flow of the intelligent sound box to be detected according to the anomaly detection characteristics.
It should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In conclusion, the invention provides intelligent sound box network traffic anomaly detection based on the self-similarity index Hurst. Unstable extreme-value attributes in the network traffic feature data set are first removed by a preliminary judgment. The Hurst values of the remaining feature attributes are then estimated with three estimation algorithms, and the non-time-related feature attributes that do not contribute to the self-similarity of the data stream, together with the unstable time-related attribute, are removed to obtain the anomaly detection features. This also shows that features with time-series properties are self-similar and contribute statistically to the self-similarity of the network data stream. Network traffic is collected during the attack stage, the Hurst index of the selected anomaly detection features is estimated with the R/S analysis method, the variance time method, and the iterative estimation algorithm, and the security state of the intelligent sound box network is judged from the range in which the Hurst value falls. Experiments show that when the network is under ARP attack, the Hurst index deviates greatly from its normal value and leaves the self-similarity interval (0, 1); the second-order self-similarity of the network disappears and an anomaly has occurred in the network. The experiments thereby also demonstrate the effectiveness and feasibility of the network traffic anomaly detection scheme based on the self-similarity index.
The above describes only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art according to the technical solution and inventive concept of the present invention, within the scope disclosed herein, falls within the protection scope of the present invention.

Claims (10)

1. A method for detecting network flow abnormity of an intelligent sound box is characterized by comprising the following steps:
collecting network flow data of the intelligent sound box;
preprocessing the network flow data of the intelligent sound box to obtain a characteristic data set;
performing Hurst index estimation on the characteristics in the characteristic data set, and selecting abnormal detection characteristics according to the Hurst value; wherein, the Hurst index estimation is respectively carried out by adopting a rescaling range analysis method, a variance time method and an iterative estimation algorithm;
and carrying out anomaly detection on the network flow of the intelligent sound box to be detected according to the anomaly detection characteristics.
2. The method according to claim 1, wherein the performing Hurst index estimation on the features in the feature data set and selecting features of anomaly detection according to the Hurst value specifically includes:
eliminating the extreme-value (maximum and minimum) features in the feature data set to obtain an eliminated feature data set;
respectively adopting the rescaling range analysis method, the variance time method and the iterative estimation algorithm to carry out Hurst index estimation on the features in the removed feature data set so as to respectively obtain corresponding Hurst values;
if the Hurst value obtained for any feature in the removed feature data set by the rescaling range analysis method, the variance time method and the iterative estimation algorithm exceeds a set range, that feature is removed from the removed feature data set;
and simultaneously, removing the features of which the data stream duration is single-point attribute and which do not have continuity from the removed feature data set, wherein the obtained features in the removed feature data set are the features of the abnormal detection.
3. The classification method according to claim 2, wherein the set range is (0, 1).
4. The classification method according to claim 1, wherein the smart sound network traffic data is saved as a PCAP file, wherein each row of data represents a network packet, and the network packet includes a source port number, a destination port number, a source IP address, a destination IP address, a Unix timestamp, a packet payload size, and protocol type information.
5. The detection method according to claim 4, wherein the preprocessing comprises flow aggregation and feature processing, wherein:
the flow aggregation is performed, the PCAP file is read, and quintuple information of a single data packet is obtained; performing stream cutting according to SYN and FIN zone bits in a transmission protocol by taking the quintuple information as a basis, thereby saving complete network stream and obtaining bidirectional stream;
the characteristic processing is to convert the data in the bidirectional flow into a characteristic vector format and then divide the bidirectional flow into unidirectional flows in two directions; and respectively extracting the characteristics of the data in the bidirectional flow and the data in the unidirectional flow according to the statistical characteristics of the length, the load and the time stamp of the flow data packet, and collecting the characteristic data.
6. The detection method according to claim 1, wherein the performing anomaly detection on the network traffic of the smart sound box to be detected according to the characteristics of the anomaly detection specifically comprises:
preprocessing network flow data of the intelligent sound box to be detected to obtain a characteristic data set;
and performing Hurst index estimation on the characteristics of the abnormal detection in the characteristic data set, and determining the abnormality of the network flow of the intelligent sound box to be detected according to the Hurst value.
7. The detection method according to any one of claims 1 to 6, wherein the collecting of the network traffic data of the smart sound box is realized by mirroring the traffic of the switch to a traffic collection host and running a collection program Wireshark on the collection host.
8. A classification system for network traffic of smart speakers, the system comprising:
the data acquisition module is used for acquiring network flow data of the intelligent sound box;
the data preprocessing module is used for preprocessing the network flow data of the intelligent sound box to obtain a characteristic data set;
the anomaly detection feature selection module is used for carrying out Hurst index estimation on the features in the feature data set and selecting the features of anomaly detection according to the Hurst value; wherein, the Hurst index estimation is respectively carried out by adopting a rescaling range analysis method, a variance time method and an iterative estimation algorithm;
and the anomaly detection module is used for carrying out anomaly detection on the network flow of the intelligent sound box to be detected according to the anomaly detection characteristics.
9. An electronic device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the detection method of any one of claims 1 to 7 when executing the program stored in the memory.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the detection method of any one of claims 1 to 7.
CN202111432544.1A 2021-11-29 2021-11-29 Method, system, equipment and medium for detecting network flow abnormity of intelligent sound box Pending CN114172706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111432544.1A CN114172706A (en) 2021-11-29 2021-11-29 Method, system, equipment and medium for detecting network flow abnormity of intelligent sound box

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111432544.1A CN114172706A (en) 2021-11-29 2021-11-29 Method, system, equipment and medium for detecting network flow abnormity of intelligent sound box

Publications (1)

Publication Number Publication Date
CN114172706A true CN114172706A (en) 2022-03-11

Family

ID=80481455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111432544.1A Pending CN114172706A (en) 2021-11-29 2021-11-29 Method, system, equipment and medium for detecting network flow abnormity of intelligent sound box

Country Status (1)

Country Link
CN (1) CN114172706A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114844760A (en) * 2022-05-05 2022-08-02 鹏城实验室 Network fault sensing and positioning method, device, terminal and storage medium
CN115348337A (en) * 2022-07-11 2022-11-15 广州市玄武无线科技股份有限公司 TCP data packet analysis method and device based on multiple protocols

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101895420A (en) * 2010-07-12 2010-11-24 西北工业大学 Rapid detection method for network flow anomaly
CN104796301A (en) * 2015-03-31 2015-07-22 北京奇艺世纪科技有限公司 Network traffic abnormity judgment and device
CN110474883A (en) * 2019-07-24 2019-11-19 哈尔滨工程大学 A kind of SDN anomalous traffic detection method based on rescaled range method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
任俊玲等: "基于自相似指数变化率的网络数据流异常分析" *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220311