CN111756757A - Botnet detection method and device - Google Patents

Publication number
CN111756757A
Authority
CN
China
Prior art keywords
data
characteristic
network
vector
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010597318.8A
Other languages
Chinese (zh)
Inventor
陈霖
杨祎巍
索思亮
匡晓云
许爱东
洪超
黄开天
徐培明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CSG Electric Power Research Institute
Research Institute of Southern Power Grid Co Ltd
Original Assignee
Research Institute of Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Research Institute of Southern Power Grid Co Ltd
Priority to CN202010597318.8A
Publication of CN111756757A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a botnet detection method and device for solving the problem of low botnet detection accuracy in the prior art. The method comprises the following steps: acquiring a plurality of data packets of a preset network, and combining the data packets into a data stream; extracting a plurality of feature data of the data stream, and preprocessing each feature data to obtain preprocessed feature data; extracting the preprocessed feature data based on a preset long short-term memory (LSTM) network to generate a data stream feature vector; extracting a sequence relation among the plurality of data packets based on the LSTM network, and generating a network flow vector according to the sequence relation; and inputting the network flow vector into a preset classifier to obtain a classification result, and determining whether the preset network is a botnet according to the classification result. By extracting feature data with an LSTM network, the botnet detection method can effectively improve the accuracy of botnet detection.

Description

Botnet detection method and device
Technical Field
The invention relates to the technical field of botnet detection, in particular to a botnet detection method and device.
Background
A botnet is a computer network composed of a group of computers (zombie hosts) infected with the same malware and controlled by an attacker. It combines the techniques of traditional viruses, trojans, and worms to achieve persistence on the host, information theft, and remote control, and it spreads across networks in the manner of a worm. Botnets are often used to launch large-scale network attacks: almost all distributed denial-of-service attacks currently originate from botnets, 80% to 95% of spam is sent by botnets, and click fraud, theft of sensitive information, phishing websites, cryptographic extortion (ransomware), and similar schemes mainly rely on botnets for financial gain. Botnets remain the tool most favored by attackers on the Internet. In recent years botnets have adopted advanced ideas and techniques that make detection harder, and accurately identifying botnets, especially unknown botnets in their latent period, remains a difficult and active research topic in academia and industry.
Disclosure of Invention
The invention provides a botnet detection method and device, which are used for solving the problem of low accuracy of botnet detection in the prior art.
The invention provides a botnet detection method, comprising the following steps:
acquiring a plurality of data packets of a preset network, and combining the data packets into a data stream;
extracting a plurality of feature data of the data stream, and preprocessing each feature data to obtain preprocessed feature data;
extracting the preprocessed feature data of the data stream based on a preset long short-term memory network to generate a data stream feature vector;
extracting a sequence relation among the plurality of data packets based on the long short-term memory network, and generating a network flow vector according to the data stream feature vector and the sequence relation;
inputting the network flow vector into a preset classifier to obtain a classification result, and determining whether the preset network is a botnet according to the classification result.
Optionally, the step of extracting the preprocessed feature data of the data stream based on a preset long short-term memory network to generate a data stream feature vector includes:
extracting the preprocessed feature data of the data stream based on the preset long short-term memory network, and generating a data packet feature vector corresponding to each data packet;
acquiring time sequence information of the data packets in the data stream;
generating the data stream feature vector from the data packet feature vectors corresponding to the plurality of data packets, based on the time sequence information.
Optionally, the step of extracting a sequence relation among the plurality of data packets based on the long short-term memory network and generating a network flow vector according to the data stream feature vector and the sequence relation includes:
extracting the sequence relation among the plurality of data packets based on the long short-term memory network, determining a last data packet feature vector based on the sequence relation, and generating the network flow vector according to the data stream feature vector and the last data packet feature vector.
Optionally, the step of preprocessing each feature data to obtain preprocessed feature data includes:
calculating the standard deviation and the average value of the plurality of feature data, and using the standard deviation and the average value to calculate the preprocessed feature data corresponding to each of the plurality of feature data.
Optionally, the step of combining the plurality of data packets into a data stream includes:
acquiring the IP information, port information, and protocol number of each data packet, and combining the data packets that have the same IP information, port information, and protocol number into a data stream.
The invention provides a botnet detection device, comprising:
a data stream combining module, configured to acquire a plurality of data packets of a preset network and combine the data packets into a data stream;
a preprocessed feature data generating module, configured to extract a plurality of feature data of the data stream and preprocess each feature data to obtain preprocessed feature data;
a data stream feature vector generating module, configured to extract the preprocessed feature data of the data stream based on a preset long short-term memory network and generate a data stream feature vector;
a network flow vector generating module, configured to extract a sequence relation among the plurality of data packets based on the long short-term memory network and generate a network flow vector according to the data stream feature vector and the sequence relation;
a classification module, configured to input the network flow vector into a preset classifier to obtain a classification result, and determine whether the preset network is a botnet according to the classification result.
Optionally, the data stream feature vector generating module includes:
a data packet feature vector generating submodule, configured to extract the preprocessed feature data of the data stream based on a preset long short-term memory network and generate a data packet feature vector corresponding to each data packet;
a time sequence information acquiring submodule, configured to acquire time sequence information of the data packets in the data stream;
a data stream feature vector generating submodule, configured to generate the data stream feature vector from the data packet feature vectors corresponding to the plurality of data packets, based on the time sequence information.
Optionally, the network flow vector generating module includes:
a network flow vector generating submodule, configured to extract a sequence relation among the plurality of data packets based on the long short-term memory network, determine a last data packet feature vector based on the sequence relation, and generate a network flow vector according to the data stream feature vector and the last data packet feature vector.
Optionally, the preprocessed feature data generating module includes:
a preprocessed feature data generating submodule, configured to calculate the standard deviation and the average value of the plurality of feature data, and use the standard deviation and the average value to calculate the preprocessed feature data corresponding to each of the plurality of feature data.
Optionally, the data stream combining module includes:
a data stream combining submodule, configured to acquire the IP information, port information, and protocol number of each data packet, and combine the data packets that have the same IP information, port information, and protocol number into a data stream.
According to the above technical solution, the invention has the following advantages: a plurality of data packets of a preset network are acquired; the data packets are combined into a data stream; a plurality of feature data of the data stream are extracted; each feature data is preprocessed to obtain preprocessed feature data; the preprocessed feature data of each data packet is extracted to generate a data packet feature vector; a data stream feature vector is generated from the data packet feature vectors corresponding to the plurality of data packets; a sequence relation among the data packets is acquired, and a network flow vector is generated according to the sequence relation; and the network flow vector is input into a preset classifier to obtain a classification result, from which it is determined whether the network is a botnet. By extracting feature data with a long short-term memory network, the botnet detection method can effectively improve the accuracy of botnet detection.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart illustrating steps of a botnet detection method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a botnet detection method according to another embodiment of the present invention;
FIG. 3 is an architecture diagram of a botnet detection method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a botnet detection device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a botnet detection method and device, which are used for solving the technical problem of low accuracy of botnet detection in the prior art.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart illustrating the steps of a botnet detection method according to an embodiment of the present invention.
The invention provides a botnet detection method, comprising the following steps:
Step 101, acquiring a plurality of data packets of a preset network, and combining the data packets into a data stream;
A botnet is a computer network formed by a group of computers (zombie hosts) infected with the same malware and controlled by an attacker. It combines the techniques of traditional viruses, trojans, and worms to achieve persistence on the host, information theft, and remote control, and it spreads across networks in the manner of a worm. Botnets are often used to launch large-scale network attacks: almost all distributed denial-of-service attacks currently originate from botnets, 80% to 95% of spam is sent by botnets, and click fraud, theft of sensitive information, phishing websites, cryptographic extortion (ransomware), and similar schemes also mainly rely on botnets for financial gain. Botnets remain the tool most favored by attackers on the Internet.
The botnet detection method provided by the embodiment of the invention is suited to modeling tasks based on flow features: it determines the network type from the flow features and thereby determines whether the detected network is a botnet.
Therefore, in the embodiment of the present invention, a plurality of data packets of a preset network may be acquired and combined into a data stream, so that flow features can be obtained.
Step 102, extracting a plurality of feature data of the data stream, and preprocessing each feature data to obtain preprocessed feature data;
In an embodiment of the invention, the feature data of the data stream may comprise packet-based features, features based on the communication connection, features based on abnormal behavior, and features based on flow similarity.
After the feature data of the data stream is extracted, it may be preprocessed to obtain the preprocessed feature data.
Step 103, extracting the preprocessed feature data based on a preset long short-term memory network to generate a data stream feature vector;
In the embodiment of the invention, features may be learned through a long short-term memory (LSTM) network. The feature learning process of the invention is implemented with a two-stage LSTM. In the first stage, the LSTM performs feature learning on each data packet only: it extracts the preprocessed feature data of each data packet and outputs a feature vector for each data packet. The data stream feature vector is then generated from the feature vectors of the data packets.
The LSTM network is a recurrent neural network designed specifically to solve the long-term dependency problem of ordinary recurrent neural networks (RNNs). Like all RNNs, it has the form of a chain of repeating neural network modules.
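For reference, one repeating module of such a chain can be written out as follows. This is the commonly used LSTM formulation and is an assumption here, since the source does not state which LSTM variant is used. The symbols are illustrative: x_t is the input at step t, h_t the hidden state, c_t the cell state, \sigma the logistic sigmoid, and \odot element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of c_t is what lets information persist over many steps, which is how the LSTM addresses the long-term dependency problem mentioned above.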
Step 104, extracting a sequence relation among the plurality of data packets based on the long short-term memory network, and generating a network flow vector according to the data stream feature vector and the sequence relation;
In the embodiment of the invention, the second stage of feature learning uses the LSTM to further learn the sequence relation between the data packets on the basis of the data stream feature vector, finally obtaining the feature vector of the last data packet of the data stream and forming a complete network flow vector.
Step 105, inputting the network flow vector into a preset classifier to obtain a classification result, and determining whether the preset network is a botnet according to the classification result.
In the embodiment of the present invention, after the complete network flow vector of the data stream is extracted, it may be input into a preset classifier to determine the type of the data stream, and thus whether the network generating the data stream is a botnet.
In one example, a SoftMax classifier may be employed to perform the classification task and produce the final output.
The SoftMax classifier is a trained, fully connected neural network classifier that takes network flow vectors as input. For each network flow it outputs a 5-dimensional vector whose components correspond to four botnets and normal traffic, respectively. Each value is the probability, computed by SoftMax, that the flow belongs to that class; the five values sum to 1, and the class with the highest probability is taken as the classification of the current data stream.
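The final SoftMax step described above can be sketched as follows. This is a minimal illustration, not the patented classifier: the class names and the raw scores are hypothetical, and the fully connected layer that would produce the scores is omitted.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class names: the source only says "four botnets and normal traffic".
CLASSES = ["botnet_a", "botnet_b", "botnet_c", "botnet_d", "normal"]

scores = [0.5, 2.0, -1.0, 0.1, 1.2]  # illustrative classifier outputs
probs = softmax(scores)
# The class with the highest probability is taken as the prediction.
predicted = CLASSES[probs.index(max(probs))]
```

Subtracting the maximum score before exponentiating does not change the result but avoids overflow for large scores.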
In this embodiment, a plurality of data packets of a preset network are acquired; the data packets are combined into a data stream; a plurality of feature data of the data stream are extracted; each feature data is preprocessed to obtain preprocessed feature data; the preprocessed feature data of each data packet is extracted to generate a data packet feature vector; a data stream feature vector is generated from the data packet feature vectors corresponding to the plurality of data packets; a sequence relation among the data packets is acquired, and a network flow vector is generated according to the sequence relation; and the network flow vector is input into a preset classifier to obtain a classification result, from which it is determined whether the network is a botnet. The botnet detection algorithm based on a long short-term memory network is suited to high-dimensional modeling tasks based on flow features, and can effectively improve the accuracy of botnet detection.
Referring to FIG. 2, FIG. 2 is a flowchart illustrating the steps of a botnet detection method according to another embodiment of the present invention.
The invention provides a botnet detection method, comprising the following steps:
Step 201, acquiring a plurality of data packets of a preset network, and combining the plurality of data packets into a data stream;
The botnet detection method provided by the embodiment of the invention is suited to modeling tasks based on flow features: it determines the network type from the flow features and thereby determines whether the detected network is a botnet.
Therefore, in the embodiment of the present invention, a plurality of data packets of a preset network may be acquired and combined into a data stream, so that flow features can be obtained.
In an embodiment of the present invention, the step of combining the plurality of data packets into a data stream may include: acquiring the IP information, port information, and protocol number of each data packet, and combining the data packets that have the same IP information, port information, and protocol number into a data stream.
Specifically, the embodiment of the present invention may reassemble the data packets into data streams according to the 5-tuple <source IP, destination IP, source port, destination port, protocol number>.
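The 5-tuple grouping above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Packet` record and its field names are hypothetical, and sorting each flow by timestamp is an added convenience for later sequence processing rather than something the source specifies at this step.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record; the field names are illustrative only.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int
    timestamp: float
    length: int


def group_into_flows(packets):
    """Group packets that share the same 5-tuple into one data stream."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)
        flows[key].append(pkt)
    # Assumption: keep each flow in time order for later sequence models.
    for pkts in flows.values():
        pkts.sort(key=lambda p: p.timestamp)
    return dict(flows)


packets = [
    Packet("10.0.0.1", "10.0.0.2", 5000, 80, 6, 0.20, 60),
    Packet("10.0.0.1", "10.0.0.2", 5000, 80, 6, 0.10, 40),
    Packet("10.0.0.3", "10.0.0.2", 5001, 53, 17, 0.15, 72),
]
flows = group_into_flows(packets)
```

Note that this key is direction-sensitive: packets travelling in the reverse direction of a connection form a separate flow unless the key is normalized first.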
Step 202, extracting a plurality of feature data of the data stream, and preprocessing each feature data to obtain preprocessed feature data;
In an embodiment of the invention, the feature data of the data stream may comprise packet-based features, features based on the communication connection, features based on abnormal behavior, and features based on flow similarity.
The packet-based features include:
Source IP address and destination IP address: this feature is mainly used to quantify the different network connections.
Protocol (protocol): this feature can effectively detect those botnets that use a particular protocol.
The features based on the communication connection include:
Connection duration (duration): a botnet's initial connection attempts tend to be unidirectional and short. Once a connection succeeds, however, a relatively long communication session follows.
First packet length of the network flow (FPS): botnets tend to send short packets when attempting connections, so the first packet of a network flow is typically short.
The features based on abnormal behavior include:
Reconnect (reconnect): botnets often use various strategies to mask their communication behavior and prevent detectors from discovering and analyzing their communication characteristics. It has been found that botnets typically reconnect already established flows at random, so that the flows no longer exhibit communication similarity, thereby defeating detection methods based on flow similarity.
Exchanged data packets: to maintain the connection with the controller, zombie hosts typically send a large number of packets for information exchange. Botnet hosts often exchange information using small packets; for example, P2P botnet hosts probe peer hosts with small packets, while centralized botnet hosts communicate with servers using small packets. The number of exchanged packets (PX), the number of exchanged small packets (NNP), and the percentage of exchanged packets (PSP) all help to identify botnets.
Ratio of the number of incoming packets to the number of outgoing packets (IO-PR): used to characterize how incoming and outgoing packets differ between botnet traffic and normal network traffic.
The features based on flow similarity include:
Total number of bytes (TBT), average payload packet length (APL), ratio of the number of packets of the same length to the total number of packets (DPL), and payload packet length standard deviation (PV): because the botnet program is implanted in advance, botnet traffic is uniform and self-similar, whereas traffic generated by normal users is diverse and random. These flow statistics can therefore also be used to detect botnets.
In addition, four attribute features, namely the average number of bits per second (BS), the average number of packets per second in the time window (PS), the average number of packets per second (PPS), and the average arrival time of the data packets (AIT), can be used to describe the similarity of botnet communication and also serve as feature parameters for detecting botnets.
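Several of the flow statistics listed above can be computed directly from per-packet lengths and directions, as in the sketch below. The exact definitions are assumptions where the source is silent: DPL is taken as the share of packets having the most frequent length, PV uses the population standard deviation, and the "in"/"out" direction labels are a hypothetical input format.

```python
from collections import Counter
from statistics import mean, pstdev


def flow_statistics(lengths, directions):
    """Compute a few flow-level features for one data stream.

    lengths: payload length in bytes of each packet in the flow.
    directions: "in" or "out" for each packet (hypothetical labelling).
    """
    incoming = sum(1 for d in directions if d == "in")
    outgoing = sum(1 for d in directions if d == "out")
    most_common_count = Counter(lengths).most_common(1)[0][1]
    return {
        "TBT": sum(lengths),    # total number of bytes
        "APL": mean(lengths),   # average payload packet length
        "PV": pstdev(lengths),  # payload length standard deviation (population)
        # Assumption: DPL = share of packets having the most frequent length.
        "DPL": most_common_count / len(lengths),
        # Assumption: with no outgoing packets the ratio is reported as 0.0.
        "IO-PR": incoming / outgoing if outgoing else 0.0,
    }


stats = flow_statistics([40, 40, 40, 120], ["out", "in", "out", "in"])
```

A high DPL together with a low PV would indicate the uniform traffic that the text associates with botnets.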
After the feature data of the data stream is extracted, character-type features in the feature data can be converted into numerical feature data by an attribute mapping method. For example, if the attribute feature "protocol" has 107 values such as "tcp", "udp", "icmp", "http", and "dns", it is encoded into a 107-dimensional feature using one-hot encoding.
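The one-hot encoding of a categorical feature can be sketched as follows. The five-entry vocabulary is illustrative only; the source describes 107 protocol values, which would yield a 107-dimensional vector in the same way.

```python
def one_hot_encode(value, vocabulary):
    """Return a list with a 1 at the index of `value` and 0 elsewhere."""
    index = vocabulary.index(value)  # raises ValueError for unknown values
    return [1 if i == index else 0 for i in range(len(vocabulary))]


# Illustrative vocabulary only; not the full list of 107 values.
PROTOCOLS = ["tcp", "udp", "icmp", "http", "dns"]
encoded = one_hot_encode("udp", PROTOCOLS)
```

The length of the encoded vector always equals the size of the vocabulary, so the vocabulary must be fixed before training.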
The feature data set obtained by encoding is then preprocessed to obtain the preprocessed feature data.
In this embodiment of the present invention, the step of preprocessing each feature data to obtain preprocessed feature data may include:
calculating the standard deviation and the average value of the plurality of feature data, and converting each of the feature data into corresponding preprocessed feature data using the standard deviation and the average value.
In a specific implementation, the feature data set may be standardized using the standard deviation, scaling the feature data into the interval [-1, 1].
The specific scaling process is shown in the following formula:
$$x' = \frac{x - \bar{x}}{s}$$
where $x'$ is the preprocessed feature data, $x$ is the feature data, $\bar{x}$ is the average value of all the feature data of the data stream, and $s$ is the standard deviation of the feature data.
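The scaling step can be sketched as follows, assuming the usual standard-deviation normalization in which the mean is subtracted from each value and the result is divided by the standard deviation. The use of the population standard deviation and the handling of constant features are assumptions not stated in the source.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # All values identical: there is no spread to scale by, so map to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


scaled = standardize([2.0, 4.0, 6.0])
```

Standardized values are not strictly bounded, so outliers may still fall outside [-1, 1]; a clipping or min-max step would be needed if a hard bound is required.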
Step 203, extracting the preprocessed feature data based on a preset long short-term memory network, and generating a data packet feature vector corresponding to each data packet;
Step 204, acquiring the time sequence information of the data packets in the data stream;
Step 205, generating a data stream feature vector from the data packet feature vectors corresponding to the plurality of data packets, based on the time sequence information;
In the embodiment of the invention, the LSTM performs feature learning on each data packet, extracts the feature data, and outputs a feature vector for each data packet. The sequence position of each data packet in the data stream is determined from its time sequence information, and all the feature vectors are combined into the data stream feature vector according to those sequence positions.
Step 206, extracting the sequence relation among the plurality of data packets based on the long short-term memory network, and generating a network flow vector according to the data stream feature vector and the sequence relation;
In an embodiment of the present invention, step 206 may include: acquiring a sequence relation among the plurality of data packets, determining a last data packet feature vector based on the sequence relation, and generating a network flow vector according to the data stream feature vector and the last data packet feature vector.
In the embodiment of the invention, the LSTM is used to further learn the sequence relation among the data packets on the basis of the data stream feature vector. This finally yields the feature vector of the last data packet of the data stream, so that the data stream feature vector obtained in the process and the last data packet feature vector can be combined into a complete network flow vector.
Step 207, inputting the network flow vector into a preset classifier to obtain a classification result, and determining whether the preset network is a botnet according to the classification result.
In the embodiment of the present invention, after the complete network flow vector of the data stream is extracted, it may be input into a preset classifier to determine the type of the data stream, and thus whether the network generating the data stream is a botnet.
In one example, a SoftMax classifier may be employed to perform the classification task and produce the final output.
Referring to FIG. 3, FIG. 3 is an architecture diagram of a botnet detection method according to an embodiment of the present invention.
As shown in FIG. 3, the LSTM algorithm performs feature learning on the data stream containing the preprocessed feature data, where LSTM1 performs data packet feature extraction and LSTM2 performs network flow feature extraction. The network flow vector learned through LSTM1 and LSTM2 is input into a SoftMax classifier, which outputs the classification result of the data stream.
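The two-stage arrangement of LSTM1 and LSTM2 can be sketched in plain Python as follows. This is an illustration under several assumptions, not the patented model: the weights are random and untrained, each packet's features are fed to LSTM1 one value at a time, and the final hidden state of LSTM2 is used as the network flow vector. The `LSTMCell` class and `network_flow_vector` function are hypothetical names.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """A tiny LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        width = hidden_size + input_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.w = {g: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _gate(self, name, hx, squash):
        return [squash(sum(w * v for w, v in zip(row, hx)) + bias)
                for row, bias in zip(self.w[name], self.b[name])]

    def step(self, x, h, c):
        hx = list(h) + list(x)  # concatenate previous hidden state and input
        i = self._gate("i", hx, sigmoid)
        f = self._gate("f", hx, sigmoid)
        o = self._gate("o", hx, sigmoid)
        g = self._gate("c", hx, math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Feed a sequence of vectors and return every hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        states = []
        for x in sequence:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


def network_flow_vector(packets, lstm1, lstm2):
    """packets: list of (timestamp, feature_list) pairs for one data stream."""
    ordered = sorted(packets, key=lambda p: p[0])
    # Stage 1 (LSTM1): read each packet's features one value at a time and
    # take the final hidden state as that packet's feature vector.
    packet_vectors = [lstm1.run([[v] for v in feats])[-1] for _, feats in ordered]
    # Stage 2 (LSTM2): read the packet vectors in time order; the hidden state
    # after the last packet is used here as the network flow vector.
    return lstm2.run(packet_vectors)[-1]


rng = random.Random(0)
lstm1 = LSTMCell(input_size=1, hidden_size=4, rng=rng)
lstm2 = LSTMCell(input_size=4, hidden_size=3, rng=rng)
flow = [(0.2, [0.1, -0.3, 0.5]), (0.1, [0.4, 0.0, -0.2])]
vector = network_flow_vector(flow, lstm1, lstm2)
```

In practice both stages would be trained jointly with the classifier, typically using a deep learning framework; the pure-Python cell is used here only to keep the sketch self-contained.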
In this embodiment, a plurality of data packets of a preset network are acquired; the data packets are combined into a data stream; a plurality of feature data of the data stream are extracted; each feature data is preprocessed to obtain preprocessed feature data; the preprocessed feature data of each data packet is extracted to generate a data packet feature vector; a data stream feature vector is generated from the data packet feature vectors corresponding to the plurality of data packets; a sequence relation among the data packets is acquired, and a network flow vector is generated according to the sequence relation; and the network flow vector is input into a preset classifier to obtain a classification result, from which it is determined whether the network is a botnet. The botnet detection algorithm based on a long short-term memory network is suited to high-dimensional modeling tasks based on flow features, and can effectively improve the accuracy of botnet detection.
Fig. 4 is a block diagram of a botnet detection device according to an embodiment of the present invention.
The invention provides a botnet detection device, comprising:
a data stream combining module 401, configured to obtain a plurality of data packets of a preset network, and combine the plurality of data packets into a data stream;
a preprocessed feature data generating module 402, configured to extract a plurality of feature data of the data stream, and preprocess each feature data to obtain preprocessed feature data;
a data stream feature vector generating module 403, configured to extract the preprocessed feature data based on a preset long short-term memory (LSTM) network, and generate a data stream feature vector;
a network flow vector generating module 404, configured to extract a sequence relation among the plurality of data packets based on the LSTM network, and generate a network flow vector according to the data stream feature vector and the sequence relation;
the classification module 405 is configured to input the network flow vector into a preset classifier to obtain a classification result, and determine whether the preset network is a botnet according to the classification result.
In this embodiment of the present invention, the data stream feature vector generating module 403 includes:
a data packet feature vector generation submodule, configured to extract the preprocessed feature data of the data stream based on a preset long short-term memory (LSTM) network, and generate a data packet feature vector corresponding to each data packet;
the time sequence information acquisition submodule is used for acquiring the time sequence information of the data packet in the data stream;
and the data stream feature vector generation submodule is used for generating the data stream feature vector by adopting the data packet feature vectors respectively corresponding to the plurality of data packets based on the time sequence information.
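One possible reading of the three submodules above is that the per-packet feature vectors are arranged in arrival order before they are combined. The sketch below assumes that the time sequence information is a capture timestamp per packet; the function name order_packet_vectors and the list-of-lists representation are hypothetical.

```python
def order_packet_vectors(packet_vectors, timestamps):
    """Return the packet feature vectors sorted by timestamp (earliest first).

    The ordered list of vectors stands in for the data stream feature vector
    that is later read step by step by the sequence model.
    """
    if len(packet_vectors) != len(timestamps):
        raise ValueError("each packet vector needs exactly one timestamp")
    # sorted() is stable, so packets with equal timestamps keep their input order.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    return [list(packet_vectors[i]) for i in order]
```

Sorting indices rather than the vectors themselves keeps each vector paired with its own timestamp.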
In this embodiment of the present invention, the network flow vector generating module 404 includes:
a network flow vector generation submodule, configured to extract a sequence relation among the plurality of data packets based on the long short-term memory (LSTM) network, determine a last data packet feature vector based on the sequence relation, and generate a network flow vector according to the data stream feature vector and the last data packet feature vector.
In this embodiment of the present invention, the preprocessing feature data generating module 402 includes:
a preprocessed feature data generation submodule, configured to calculate the standard deviation and the average value of the plurality of feature data, and to calculate, using the standard deviation and the average value, the preprocessed feature data corresponding to each of the plurality of feature data.
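A common way to preprocess data with a standard deviation and an average value is z-score standardization, in which each value x becomes (x - mean) / std. The sketch below assumes that this is the intended calculation. The function name standardize, the use of the population standard deviation, and the handling of a zero standard deviation are assumptions of the sketch.

```python
import statistics


def standardize(values):
    """Z-score standardization of one feature across a set of observations."""
    mean = statistics.fmean(values)
    # Population standard deviation (assumption; the text does not specify).
    std = statistics.pstdev(values, mu=mean)
    if std == 0:
        # All values are identical, so no value deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

After this step the feature has zero mean and unit variance, so features measured on very different scales, such as packet length and inter-arrival time, contribute comparably to the model input.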
In this embodiment of the present invention, the data stream combination module 401 includes:
and the data stream combination sub-module is used for acquiring the IP information, the port information and the protocol number of each data packet and combining the data packets with the same IP information, the same port information and the same protocol number into a data stream.
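The grouping described above can be sketched as follows. The sketch assumes that the IP information is the source and destination address and that the port information is the source and destination port, so that a data stream is identified by the usual five-tuple. The Packet type and its field names, flow_key and group_into_flows are hypothetical, and the two directions of a connection are treated as separate streams.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Minimal packet record (hypothetical field names)."""
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP and 17 for UDP
    length: int


def flow_key(pkt):
    """Five-tuple that identifies the data stream a packet belongs to."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)


def group_into_flows(packets):
    """Group packets that share the same five-tuple, keeping input order."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)
```

Each value in the returned dictionary is the list of packets of one data stream, ready for feature extraction.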
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only one logical division, and other divisions may be used in practice; a plurality of units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A botnet detection method, comprising:
acquiring a plurality of data packets of a preset network, and combining the data packets into a data stream;
extracting a plurality of feature data of the data stream, and preprocessing each feature data to obtain preprocessed feature data;
extracting the preprocessed feature data based on a preset long short-term memory (LSTM) network to generate a data stream feature vector;
extracting a sequence relation among the plurality of data packets based on the LSTM network, and generating a network flow vector according to the data stream feature vector and the sequence relation;
and inputting the network flow vector into a preset classifier to obtain a classification result, and determining whether the preset network is a botnet according to the classification result.
2. The method according to claim 1, wherein the step of extracting the preprocessed feature data based on the preset LSTM network to generate a data stream feature vector comprises:
extracting the preprocessed feature data based on the preset LSTM network, and generating a data packet feature vector corresponding to each data packet;
acquiring time sequence information of the data packets in the data stream;
and generating the data stream feature vector from the data packet feature vectors respectively corresponding to the plurality of data packets, based on the time sequence information.
3. The method according to claim 2, wherein the step of extracting the sequence relation among the plurality of data packets based on the LSTM network and generating the network flow vector according to the data stream feature vector and the sequence relation comprises:
extracting the sequence relation among the plurality of data packets based on the LSTM network, determining a last data packet feature vector based on the sequence relation, and generating the network flow vector according to the data stream feature vector and the last data packet feature vector.
4. The method of claim 1, wherein the step of preprocessing each feature data to obtain preprocessed feature data comprises:
calculating the standard deviation and the average value of the plurality of feature data, and using the standard deviation and the average value to calculate the preprocessed feature data corresponding to each of the plurality of feature data.
5. The method of claim 1, wherein the step of combining the plurality of data packets into a data stream comprises:
acquiring IP information, port information and a protocol number of each data packet, and combining the data packets with the same IP information, port information and protocol number into a data stream.
6. A botnet detection device, comprising:
a data stream combining module, configured to acquire a plurality of data packets of a preset network and combine the data packets into a data stream;
a preprocessed feature data generating module, configured to extract a plurality of feature data of the data stream and preprocess each feature data to obtain preprocessed feature data;
a data stream feature vector generating module, configured to extract the preprocessed feature data based on a preset long short-term memory (LSTM) network and generate a data stream feature vector;
a network flow vector generating module, configured to extract a sequence relation among the plurality of data packets based on the LSTM network and generate a network flow vector according to the data stream feature vector and the sequence relation;
and a classification module, configured to input the network flow vector into a preset classifier to obtain a classification result, and determine whether the preset network is a botnet according to the classification result.
7. The apparatus of claim 6, wherein the data stream feature vector generating module comprises:
a data packet feature vector generation submodule, configured to extract the preprocessed feature data based on the preset LSTM network and generate a data packet feature vector corresponding to each data packet;
a time sequence information acquisition submodule, configured to acquire time sequence information of the data packets in the data stream;
and a data stream feature vector generation submodule, configured to generate the data stream feature vector from the data packet feature vectors respectively corresponding to the plurality of data packets, based on the time sequence information.
8. The apparatus of claim 7, wherein the network flow vector generating module comprises:
a network flow vector generation submodule, configured to extract the sequence relation among the plurality of data packets based on the LSTM network, determine a last data packet feature vector based on the sequence relation, and generate the network flow vector according to the data stream feature vector and the last data packet feature vector.
9. The apparatus of claim 6, wherein the preprocessed feature data generating module comprises:
a preprocessed feature data generation submodule, configured to calculate the standard deviation and the average value of the plurality of feature data, and to calculate, using the standard deviation and the average value, the preprocessed feature data corresponding to each of the plurality of feature data.
10. The apparatus of claim 6, wherein the data stream combining module comprises:
a data stream combining submodule, configured to acquire the IP information, the port information and the protocol number of each data packet, and combine the data packets with the same IP information, the same port information and the same protocol number into a data stream.
CN202010597318.8A 2020-06-28 2020-06-28 Botnet detection method and device Pending CN111756757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010597318.8A CN111756757A (en) 2020-06-28 2020-06-28 Botnet detection method and device

Publications (1)

Publication Number Publication Date
CN111756757A true CN111756757A (en) 2020-10-09

Family

ID=72677571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010597318.8A Pending CN111756757A (en) 2020-06-28 2020-06-28 Botnet detection method and device

Country Status (1)

Country Link
CN (1) CN111756757A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108199863A (en) * 2017-11-27 2018-06-22 中国科学院声学研究所 A kind of net flow assorted method and system based on the study of two benches sequence signature
US20180288086A1 (en) * 2017-04-03 2018-10-04 Royal Bank Of Canada Systems and methods for cyberbot network detection
CN110995713A (en) * 2019-12-06 2020-04-10 北京理工大学 Botnet detection system and method based on convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
尹传龙等: "基于LSTM深度学习的僵尸网络检测模型", 《信息工程大学学报》 *
王伟: "基于深度学习的网络流量分类及异常检测方法研究", 《中国优秀博士学位论文全文数据库信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201009)