CN110138745B - Abnormal host detection method, device, equipment and medium based on data stream sequence - Google Patents

Abnormal host detection method, device, equipment and medium based on data stream sequence

Info

Publication number
CN110138745B
CN110138745B (application CN201910326907.XA)
Authority
CN
China
Prior art keywords
host
feature vector
data
sequence
vector set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910326907.XA
Other languages
Chinese (zh)
Other versions
CN110138745A (en)
Inventor
Jiang Bin (江斌)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geek Xin'an (Chengdu) Technology Co.,Ltd.
Original Assignee
Jike Xin'an Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jike Xin'an (Beijing) Technology Co., Ltd.
Priority to CN201910326907.XA
Publication of CN110138745A
Application granted
Publication of CN110138745B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H04L41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection

Abstract

The embodiments of the disclosure provide a method, an apparatus, a device and a medium for detecting an abnormal host based on data flow sequences, wherein the method comprises the following steps: collecting, for each host, various data flows over a plurality of time slices, performing stream reassembly, and building data stream sequences; extracting the self features of each data stream sequence and the associated features between data stream sequences; forming a feature vector set for each host; training a detection model for each host, using that host's feature vector set as positive samples and the feature vector sets of the other N-1 hosts as the negative sample set; running detection with the detection model and alarming on any feature vector set with a large difference, flagging it as an abnormal feature vector set; and retraining and correcting the model with the feature vectors that did not trigger an alarm to form a final detection model, which is then used to detect abnormal hosts.

Description

Abnormal host detection method, device, equipment and medium based on data stream sequence
Technical Field
The disclosure relates to the technical field of flow data detection, in particular to a data flow sequence-based abnormal host detection method and device, electronic equipment and a storage medium.
Background
Network communication is now part of the daily activity of almost all enterprises and individuals. As enterprises and individual users attach ever greater importance to information security, encryption is used in more and more network communication scenarios: with encryption, communication content cannot be read by anyone on the network other than the two communicating parties. This creates a problem: normal encrypted traffic cannot be distinguished from malicious encrypted traffic, which poses a great challenge to network security detection.
In existing schemes, apart from decrypting the encrypted traffic outright, the detection of encrypted malicious traffic mainly relies on artificial intelligence methods such as machine learning. Current methods mostly judge whether a single data stream is malicious traffic, and can be divided into two types:
(1) supervised learning: train a detection model on known malicious traffic, then use the model for detection;
(2) unsupervised learning: define a set of clustering rules, perform cluster analysis on the traffic, and screen out the malicious traffic clusters.
Problems with the existing schemes:
(1) supervised model training relies on a large number of black (malicious) samples; if the samples are too few, the trained detection model is likely to be inaccurate;
(2) unsupervised cluster analysis cannot determine exactly which traffic is malicious; it can only make inferences from, for example, the size ratios between clusters, so its accuracy is low;
(3) taking a single data stream as the object of analysis loses the association relationships between data streams for some malicious behaviors, reducing detection accuracy;
(4) an attacker can easily bypass detection based on a fixed feature set: once the attacker discovers which features are used for detection, those features can be evaded by technical means.
Disclosure of Invention
The present disclosure provides an abnormal host detection method, apparatus, electronic device and storage medium based on data flow sequences, which can quickly detect malicious encrypted traffic in traffic information.
In a first aspect, the present disclosure provides a method for detecting an abnormal host based on a data stream sequence, including the following steps:
step S101: selecting N sampling hosts; for each host, collecting the various data flows of a plurality of time slices, performing stream reassembly, and building the data stream sequences, wherein N is a natural number greater than 3;
step S102: extracting the self features of each data stream sequence and the associated features between data stream sequences; vectorizing the self features and the associated features to form a feature vector set for each host;
step S103: using the feature vector set of each host as positive samples and the feature vector sets of the other N-1 hosts as the negative sample set, and training a detection model for each host;
step S104: running detection with the detection model, and alarming on any feature vector set with a large difference, flagging it as an abnormal feature vector set;
step S105: adding the feature vectors that did not trigger an alarm into the feature vector set of the corresponding host, retraining and correcting the detection model with the new feature vector set within a certain period of time to form the final detection model, and using the final detection model to detect abnormal hosts.
Optionally, the self features include:
a port of a flow destination, a flow destination IP, a flow destination domain name, a DNS query domain name, a flow length, or a digital certificate.
Optionally, the associated features include:
access port sequence, access IP sequence, access domain name sequence, IP access breadth, accessed IP breadth, access domain name set, uplink and downlink data volume, uplink and downlink flow count, digital certificate set, open service port set, or connection initiation frequency.
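Purely as an illustration of how the self features and associated features listed above might be gathered into one record per host and time slice, the following Python sketch shows a possible layout; every field name and value in it is a hypothetical example, not part of the disclosed embodiments.

# Hypothetical per-host, per-time-slice feature record (illustrative values only).
example_feature_record = {
    "host": "10.0.0.5",
    "time_slice": "2019-04-23T09:00 / 10 min",
    # self features of each individual data stream
    "flows": [
        {"dst_port": 443, "dst_ip": "203.0.113.7", "dst_domain": "example.com",
         "dns_query": "example.com", "flow_length": 18432, "certificate": "sha256:ab12..."},
    ],
    # associated features across all streams of the slice
    "access_port_sequence": [53, 443, 443],
    "access_ip_sequence": ["198.51.100.2", "203.0.113.7", "203.0.113.7"],
    "ip_access_breadth": 2,          # distinct IPs this host contacted
    "accessed_ip_breadth": 1,        # distinct IPs that contacted this host
    "uplink_downlink_bytes": (20480, 96500),
    "uplink_downlink_flow_count": (3, 3),
    "digital_certificate_set": ["sha256:ab12..."],
    "open_service_port_set": [22],
    "connection_initiation_frequency": 0.3,   # new connections per second
}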
Optionally, the detection model is a random forest model, and the specific training method includes:
step S1031: define the feature vector set of the host A under detection as X and the feature vector sets of the other hosts as Y; after selecting the number m of decision trees, randomly sample and combine the feature vector set X and the feature vector set Y to construct a data set T:
T={(x1,y1),(x2,y2),(x3,y3)…(xm,ym)};
step S1032: establish the corresponding decision trees based on the data set T:
[Equation: voting result H(T) of a single decision tree, expressed in terms of the probabilities Pi; provided only as an image in the original publication]
where H(T) denotes the voting result of any single decision tree, and Pi denotes the probability that any data vector passes detection by (xi, yi);
step S1033: construct the final result of the random forest model by combining the votes of the plurality of decision trees:
RF = Σ_{i=1}^{m} wi·zi
where m is the number of decision trees, wi is the weight of the i-th tree, zi is the voting result of the i-th tree, and RF is the voting result of the random forest.
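As a purely numerical illustration of this weighted vote (the numbers are invented for illustration): with m = 3 trees, tree weights w1 = 0.5, w2 = 0.3, w3 = 0.2 and per-tree voting results z1 = 1, z2 = 1, z3 = -1 (where 1 means the feature vector matches host A's profile and -1 means it does not), the forest result is RF = 0.5·1 + 0.3·1 + 0.2·(-1) = 0.6 > 0, so the vector would be judged consistent with host A and would not raise an alarm.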
In a second aspect, the present disclosure provides an abnormal host detection apparatus based on data flow sequences, including:
a sequence construction unit, configured to select N sampling hosts, collect, for each host, the various data flows of a plurality of time slices, perform stream reassembly, and build the data stream sequences, where N is a natural number greater than 3;
a vector extraction unit, configured to extract the self features of each data stream sequence and the associated features between data stream sequences, and vectorize the self features and the associated features to form a feature vector set for each host;
a model construction unit, configured to train a detection model for each host using that host's feature vector set as positive samples and the feature vector sets of the other N-1 hosts as the negative sample set;
a model detection unit, configured to run detection with the detection model and alarm on any feature vector set with a large difference, flagging it as an abnormal feature vector set;
a result output unit, configured to add the feature vectors that did not trigger an alarm into the feature vector set of the corresponding host, retrain and correct the detection model with the new feature vector set within a certain period of time to form the final detection model, and use the final detection model to detect abnormal hosts.
Optionally, the self features include:
a port of a flow destination, a flow destination IP, a flow destination domain name, a DNS query domain name, a flow length, or a digital certificate.
Optionally, the associated features include:
access port sequence, access IP sequence, access domain name sequence, IP access breadth, accessed IP breadth, access domain name set, uplink and downlink data volume, uplink and downlink flow count, digital certificate set, open service port set, or connection initiation frequency.
Optionally, the detection model is a random forest model, and specifically includes:
a definition unit, configured to define the feature vector set of the host A under detection as X and the feature vector sets of the other hosts as Y, and, after selecting the number m of decision trees, randomly sample and combine the feature vector set X and the feature vector set Y to construct a data set T:
T={(x1,y1),(x2,y2),(x3,y3)…(xm,ym)};
an establishing unit, configured to establish the corresponding decision trees based on the data set T:
[Equation: voting result H(T) of a single decision tree, expressed in terms of the probabilities Pi; provided only as an image in the original publication]
where H(T) denotes the voting result of any single decision tree, and Pi denotes the probability that any data vector passes detection by (xi, yi);
a result unit, configured to construct the final result of the random forest model by combining the votes of the plurality of decision trees:
RF = Σ_{i=1}^{m} wi·zi
where m is the number of decision trees, wi is the weight of the i-th tree, zi is the voting result of the i-th tree, and RF is the voting result of the random forest.
In a third aspect, the present disclosure provides an electronic device comprising a processor and a memory, wherein the memory stores computer program instructions executable by the processor, and the processor implements the method steps of any one of the first aspect when executing the computer program instructions.
In a fourth aspect, the present disclosure provides a computer readable storage medium storing computer program instructions which, when invoked and executed by a processor, implement the method steps of any of the first aspects.
Compared with the prior art, the beneficial effects of the embodiment of the disclosure are that:
the method and the device can more comprehensively utilize the host network behavior information to discover abnormal behaviors therein by utilizing the detection model based on the behavior sequence, and can utilize normal samples discovered in the later stage to correct the detection model, thereby reducing the attenuation speed of the detection model, adapting to different detection scenes and improving the stability of the model.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show some embodiments of the present disclosure, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an abnormal host detection method based on a data flow sequence according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an abnormal host detection apparatus based on a data flow sequence according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a training model provided in the embodiments of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The terminology used in the embodiments of the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in the presently disclosed embodiments and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two, but does not exclude the presence of at least one.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that although the terms first, second, third, etc. may be used to describe technical names in embodiments of the present disclosure, the technical names should not be limited to the terms. These terms are only used to distinguish between technical names. For example, a first check signature may also be referred to as a second check signature, and similarly, a second check signature may also be referred to as a first check signature, without departing from the scope of embodiments of the present disclosure.
The word "if", as used herein, may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a commodity or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such commodity or system. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other like elements in a commodity or system that includes the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
This scheme proposes a machine learning method based on behavior sequences (data flow sequences) to detect attacked hosts (including clients and servers). First, a normal-state profile is built for each host based on the typical network behavior sequence state of the monitored hosts in the current network; then the behavior sequence features observed later are continuously compared against this profile, and the difference is used to judge whether the host has been attacked or compromised.
It has the advantages that:
(1) because the detection is based on a multi-stream behavior feature set with context, it is difficult for an attack to bypass the detection strategy;
(2) the method does not depend on expert knowledge to extract features, but automatically establishes a host feature model based on the current host state;
(3) the model can learn and update in real time as the characteristics of the monitored network change, so its accuracy decays over a longer period.
Example 1
Referring to fig. 1, in a first aspect, the present disclosure provides a method for detecting an abnormal host based on a data stream sequence, including the following steps:
step S101: select N sampling hosts; for each host, collect the various data flows of a plurality of time slices, perform stream reassembly, and build the data stream sequences, where N is a natural number greater than 3.
Traffic is collected for a number of typical time segments, including but not limited to daytime, nighttime, working days, holidays, peak hours, etc. The length of each time segment may be defined according to the general behavior cycle of the hosts in the monitored network, i.e. the time in which a series of associated behaviors is usually completed, for example 10 minutes. The collected data are reassembled into data streams; for connectionless UDP data, packets are merged according to their request-response relationship.
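A minimal Python sketch of this collection and reassembly step follows; the packet record format (a dict with src_ip, dst_ip, src_port, dst_port, proto, ts and size) and the choice of a direction-insensitive 5-tuple as the stream key are assumptions made for illustration, not requirements of the embodiment.

from collections import defaultdict

SLICE_SECONDS = 600  # one time slice, e.g. the 10 minutes suggested above

def flow_key(p):
    # Direction-insensitive 5-tuple, so that request and response packets
    # (including connectionless UDP exchanges) fall into the same stream.
    a, b = (p["src_ip"], p["src_port"]), (p["dst_ip"], p["dst_port"])
    lo, hi = sorted([a, b])
    return (lo, hi, p["proto"])

def reassemble_streams(packets, monitored_hosts):
    # Group packets into data streams per monitored host and per time slice.
    # Returns {(host_ip, slice_index): {flow_key: [packet, ...]}}.
    sequences = defaultdict(lambda: defaultdict(list))
    for p in packets:
        slice_idx = int(p["ts"] // SLICE_SECONDS)
        for host in (p["src_ip"], p["dst_ip"]):
            if host in monitored_hosts:  # a packet may belong to two monitored hosts
                sequences[(host, slice_idx)][flow_key(p)].append(p)
    return sequences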
Step S102: extract the self features of each data stream sequence and the associated features between data stream sequences; vectorize the self features and the associated features to form a feature vector set for each host.
The feature sequence is obtained by extracting features from all data streams of a host within one time slice, and needs to cover two main kinds of features: first, the attribute (self) features of individual data streams; second, the association features between data streams, as shown in the following table.
Data stream attribute features: port of the flow destination, flow destination IP, flow destination domain name, DNS query domain name, flow length, digital certificate.
Data stream association features: access port sequence, access IP sequence, access domain name sequence, IP access breadth, accessed IP breadth, access domain name set, uplink and downlink data volume, uplink and downlink flow count, digital certificate set, open service port set, connection initiation frequency.
(The original publication presents this feature table as images.)
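Continuing the sketch above, the streams of one host in one time slice could be turned into a numeric feature vector as shown below; the particular encodings chosen for the sequences and sets (counts, breadths, byte totals) are illustrative assumptions only.

def extract_feature_vector(streams, host):
    # streams: {flow_key: [packet, ...]} for one host and one time slice,
    # as produced by reassemble_streams(); encodings are illustrative only.
    peer_ips, peer_ports, lengths = set(), set(), []
    up_bytes = down_bytes = 0
    for (end_a, end_b, _proto), pkts in streams.items():
        peer = end_b if end_a[0] == host else end_a   # the remote endpoint
        peer_ips.add(peer[0])
        peer_ports.add(peer[1])
        lengths.append(sum(p["size"] for p in pkts))
        up_bytes += sum(p["size"] for p in pkts if p["src_ip"] == host)
        down_bytes += sum(p["size"] for p in pkts if p["dst_ip"] == host)
    return [
        len(streams),          # number of flows in the slice
        len(peer_ips),         # IP access breadth
        len(peer_ports),       # breadth of contacted service ports
        sum(lengths),          # total flow length
        up_bytes,              # uplink data volume
        down_bytes,            # downlink data volume
    ]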
Step S103: use the feature vector set of each host as positive samples and the feature vector sets of the other N-1 hosts as the negative sample set, and train a detection model for each host.
Step S104: run detection with the detection model, and alarm on any feature vector set with a large difference, flagging it as an abnormal feature vector set.
The feature vector set is obtained by merging the feature vectors collected over a plurality of time segments and normalizing them, so that the feature data are convenient to use in later training.
Step S105: add the feature vectors that did not trigger an alarm into the feature vector set of the corresponding host, retrain and correct the detection model with the new feature vector set within a certain period of time to form the final detection model, and use the final detection model to detect abnormal hosts.
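A sketch of this correction step, assuming scikit-learn's RandomForestClassifier as the underlying model (an assumption; the patent does not prescribe any particular library): feature vectors that did not trigger an alarm during the retraining window are folded into the host's positive set before the model is refitted.

from sklearn.ensemble import RandomForestClassifier

def correct_host_model(host_vectors, other_host_vectors, non_alarmed_vectors, n_trees=100):
    # All inputs are lists of equal-length, normalized feature vectors.
    # Labels: 1 = belongs to this host's profile, 0 = other hosts.
    positives = list(host_vectors) + list(non_alarmed_vectors)
    X = positives + list(other_host_vectors)
    y = [1] * len(positives) + [0] * len(other_host_vectors)
    model = RandomForestClassifier(n_estimators=n_trees)
    model.fit(X, y)
    return model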
Optionally, the detection model is a random forest model, and the specific training method includes:
step S1031: define the feature vector set of the host A under detection as X and the feature vector sets of the other hosts as Y; after selecting the number m of decision trees, randomly sample and combine the feature vector set X and the feature vector set Y to construct a data set T:
T={(x1,y1),(x2,y2),(x3,y3)…(xm,ym)};
step S1032: establish the corresponding decision trees based on the data set T:
[Equation: voting result H(T) of a single decision tree, expressed in terms of the probabilities Pi; provided only as an image in the original publication]
where H(T) denotes the voting result of any single decision tree, and Pi denotes the probability that any data vector passes detection by (xi, yi);
step S1033: construct the final result of the random forest model by combining the votes of the plurality of decision trees:
RF = Σ_{i=1}^{m} wi·zi
where m is the number of decision trees, wi is the weight of the i-th tree, zi is the voting result of the i-th tree, and RF is the voting result of the random forest.
In this way a model is trained for each host: assuming N hosts, the feature sets of the other N-1 hosts are used as negative samples and the feature set of the host itself is used as positive samples for training. During training, hosts with large attribute differences can be grouped together for contrastive training based on prior knowledge (such as host attribute planning, network planning, etc.), which speeds up training.
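The per-host one-vs-rest training loop described above might look as follows, again assuming scikit-learn (an assumption, not a requirement of the embodiment); the optional peer_groups argument illustrates the prior-knowledge grouping mentioned in the previous paragraph, and is_abnormal() shows one simple way an alarm could be raised when the difference from the host's own profile is large.

from sklearn.ensemble import RandomForestClassifier

def train_host_models(feature_sets, peer_groups=None, n_trees=100):
    # feature_sets: {host: [feature_vector, ...]} of normalized vectors.
    # peer_groups: optional {host: [other_host, ...]} chosen from prior knowledge
    # (host attribute planning, network planning); by default all other N-1
    # hosts supply the negative samples.
    models = {}
    for host, positives in feature_sets.items():
        peers = (peer_groups or {}).get(host, [h for h in feature_sets if h != host])
        negatives = [v for h in peers for v in feature_sets[h]]
        X = list(positives) + negatives
        y = [1] * len(positives) + [0] * len(negatives)
        models[host] = RandomForestClassifier(n_estimators=n_trees).fit(X, y)
    return models

def is_abnormal(model, feature_vector, threshold=0.5):
    # Alarm when the forest's probability that the vector matches the host's
    # own profile falls below the threshold, i.e. the difference is large.
    p_own = model.predict_proba([feature_vector])[0][list(model.classes_).index(1)]
    return p_own < threshold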
By using a detection model based on behavior sequences, the method and apparatus can make fuller use of the host's network behavior information to discover abnormal behaviors in it, and can use the normal samples discovered at a later stage to correct the detection model, thereby slowing the decay of the detection model, adapting to different detection scenarios, and improving the stability of the model.
Example 2
As shown in fig. 2, in a second aspect, the present disclosure provides an abnormal host detection apparatus based on data flow sequences, including:
sequence construction unit 202, configured to select N sampling hosts, collect, for each host, the various data flows of a plurality of time slices, perform stream reassembly, and build the data stream sequences, where N is a natural number greater than 3;
vector extraction unit 204, configured to extract the self features of each data stream sequence and the associated features between data stream sequences, and vectorize the self features and the associated features to form a feature vector set for each host;
model construction unit 206, configured to train a detection model for each host using that host's feature vector set as positive samples and the feature vector sets of the other N-1 hosts as the negative sample set;
model detection unit 208, configured to run detection with the detection model and alarm on any feature vector set with a large difference, flagging it as an abnormal feature vector set;
result output unit 210, configured to add the feature vectors that did not trigger an alarm into the feature vector set of the corresponding host, retrain and correct the detection model with the new feature vector set within a certain period of time to form the final detection model, and use the final detection model to detect abnormal hosts.
Optionally, the self features include:
a port of a flow destination, a flow destination IP, a flow destination domain name, a DNS query domain name, a flow length, or a digital certificate.
Optionally, the associated features include:
access port sequence, access IP sequence, access domain name sequence, IP access breadth, accessed IP breadth, access domain name set, uplink and downlink data volume, uplink and downlink flow count, digital certificate set, open service port set, or connection initiation frequency.
As shown in fig. 3, the detection model is a random forest model, and specifically includes:
definition unit 302, configured to define the feature vector set of the host A under detection as X and the feature vector sets of the other hosts as Y, and, after selecting the number m of decision trees, randomly sample and combine the feature vector set X and the feature vector set Y to construct a data set T:
T={(x1,y1),(x2,y2),(x3,y3)…(xm,ym)};
establishing unit 304, configured to establish the corresponding decision trees based on the data set T:
[Equation: voting result H(T) of a single decision tree, expressed in terms of the probabilities Pi; provided only as an image in the original publication]
where H(T) denotes the voting result of any single decision tree, and Pi denotes the probability that any data vector passes detection by (xi, yi);
result unit 306, configured to construct the final result of the random forest model by combining the votes of the plurality of decision trees:
RF = Σ_{i=1}^{m} wi·zi
where m is the number of decision trees, wi is the weight of the i-th tree, zi is the voting result of the i-th tree, and RF is the voting result of the random forest.
By using a detection model based on behavior sequences, the method and apparatus can make fuller use of the host's network behavior information to discover abnormal behaviors in it, and can use the normal samples discovered at a later stage to correct the detection model, thereby slowing the decay of the detection model, adapting to different detection scenarios, and improving the stability of the model.
Example 3
The present disclosure provides a computer readable storage medium storing computer program instructions which, when invoked and executed by a processor, implement the method steps of any of the first aspects.
The storage medium has the advantage that, by applying the behavior-sequence-based detection model, the host's network behavior information can be used more comprehensively to discover abnormal behaviors, and normal samples discovered at a later stage can be used to correct the detection model, thereby slowing the decay of the detection model and improving its stability.
Example 4
As shown in fig. 4, the present disclosure provides an electronic device comprising a processor and a memory, wherein the memory stores computer program instructions executable by the processor, and the processor implements the method steps of any one of the first aspect when executing the computer program instructions.
Referring now to FIG. 4, a block diagram of an electronic device 400 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 4, electronic device 400 may include a processing device (e.g., central processing unit, graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage device 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the electronic apparatus 400 are also stored. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 408 including, for example, tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, or from the storage device 408, or from the ROM 402. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 401.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".

Claims (8)

1. An abnormal host detection method based on data flow sequences, characterized by comprising the following steps:
step S101: selecting N sampling hosts; for each host, collecting the various data flows of a plurality of time slices, performing stream reassembly, and building the data stream sequences, wherein N is a natural number greater than 3;
step S102: extracting the self features of each data stream sequence and the associated features between data stream sequences; vectorizing the self features and the associated features to form a feature vector set for each host, wherein the associated features include: an access port sequence, an access IP sequence, an access domain name sequence, an IP access breadth, an accessed IP breadth, an access domain name set, uplink and downlink data volume, uplink and downlink flow count, a digital certificate set, an open service port set, or connection initiation frequency;
step S103: using the feature vector set of each host as positive samples and the feature vector sets of the other N-1 hosts as the negative sample set, and training a detection model for each host;
step S104: running detection with the detection model, and alarming on any feature vector set with a large difference, flagging it as an abnormal feature vector set;
step S105: adding the feature vectors that did not trigger an alarm into the feature vector set of the corresponding host, retraining and correcting the detection model with the new feature vector set within a certain period of time to form the final detection model, and using the final detection model to detect abnormal hosts.
2. The method of claim 1, wherein the self features comprise:
a port of a flow destination, a flow destination IP, a flow destination domain name, a DNS query domain name, a flow length, or a digital certificate.
3. The method as claimed in claim 2, wherein the detection model is a random forest model, and the specific training method comprises:
step S1031: defining the feature vector set of the host A under detection as X and the feature vector sets of the other hosts as Y, and, after selecting the number m of decision trees, randomly sampling and combining the feature vector set X and the feature vector set Y to construct a data set T:
T={(x1,y1),(x2,y2),(x3,y3)…(xm,ym)};
step S1032: establishing the corresponding decision trees based on the data set T:
[Equation: voting result H(T) of a single decision tree, expressed in terms of the probabilities Pi; provided only as an image in the original publication]
wherein H(T) represents the voting result of any single decision tree, and Pi represents the probability that any data vector passes detection by (xi, yi);
step S1033: constructing the final result of the random forest model by combining the votes of the plurality of decision trees:
RF = Σ_{i=1}^{m} wi·zi
wherein m is the number of decision trees, wi is the weight of the i-th tree, zi is the voting result of the i-th tree, and RF is the voting result of the random forest.
4. An abnormal host detection device based on data flow sequences, comprising:
a sequence construction unit, configured to select N sampling hosts, collect, for each host, the various data flows of a plurality of time slices, perform stream reassembly, and build the data stream sequences, wherein N is a natural number greater than 3;
a vector extraction unit, configured to extract the self features of each data stream sequence and the associated features between data stream sequences, and vectorize the self features and the associated features to form a feature vector set for each host, wherein the associated features include: an access port sequence, an access IP sequence, an access domain name sequence, an IP access breadth, an accessed IP breadth, an access domain name set, uplink and downlink data volume, uplink and downlink flow count, a digital certificate set, an open service port set, or connection initiation frequency;
a model construction unit, configured to train a detection model for each host using that host's feature vector set as positive samples and the feature vector sets of the other N-1 hosts as the negative sample set;
a model detection unit, configured to run detection with the detection model and alarm on any feature vector set with a large difference, flagging it as an abnormal feature vector set;
a result output unit, configured to add the feature vectors that did not trigger an alarm into the feature vector set of the corresponding host, retrain and correct the detection model with the new feature vector set within a certain period of time to form the final detection model, and use the final detection model to detect abnormal hosts.
5. The apparatus of claim 4, wherein the self features comprise:
a port of a flow destination, a flow destination IP, a flow destination domain name, a DNS query domain name, a flow length, or a digital certificate.
6. The apparatus according to claim 5, wherein the detection model is a random forest model, and specifically comprises:
a definition unit, configured to define the feature vector set of the host A under detection as X and the feature vector sets of the other hosts as Y, and, after selecting the number m of decision trees, randomly sample and combine the feature vector set X and the feature vector set Y to construct a data set T:
T={(x1,y1),(x2,y2),(x3,y3)…(xm,ym)};
an establishing unit, configured to establish the corresponding decision trees based on the data set T:
[Equation: voting result H(T) of a single decision tree, expressed in terms of the probabilities Pi; provided only as an image in the original publication]
wherein H(T) represents the voting result of any single decision tree, and Pi represents the probability that any data vector passes detection by (xi, yi);
a result unit, configured to construct the final result of the random forest model by combining the votes of the plurality of decision trees:
RF = Σ_{i=1}^{m} wi·zi
wherein m is the number of decision trees, wi is the weight of the i-th tree, zi is the voting result of the i-th tree, and RF is the voting result of the random forest.
7. An electronic device comprising a processor and a memory, the memory storing computer program instructions executable by the processor, the processor implementing the method steps of any of claims 1-3 when executing the computer program instructions.
8. A computer-readable storage medium, characterized in that computer program instructions are stored which, when called and executed by a processor, implement the method steps of any of claims 1-3.
CN201910326907.XA 2019-04-23 2019-04-23 Abnormal host detection method, device, equipment and medium based on data stream sequence Active CN110138745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910326907.XA CN110138745B (en) 2019-04-23 2019-04-23 Abnormal host detection method, device, equipment and medium based on data stream sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910326907.XA CN110138745B (en) 2019-04-23 2019-04-23 Abnormal host detection method, device, equipment and medium based on data stream sequence

Publications (2)

Publication Number Publication Date
CN110138745A CN110138745A (en) 2019-08-16
CN110138745B (en) 2021-08-24

Family

ID=67570749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910326907.XA Active CN110138745B (en) 2019-04-23 2019-04-23 Abnormal host detection method, device, equipment and medium based on data stream sequence

Country Status (1)

Country Link
CN (1) CN110138745B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110830328B (en) * 2019-11-27 2021-08-03 厦门网宿有限公司 Method and device for detecting abnormity of network link
CN113746780B (en) * 2020-05-27 2023-06-20 极客信安(北京)科技有限公司 Abnormal host detection method, device, medium and equipment based on host image
CN113839912B (en) * 2020-06-24 2023-08-22 极客信安(北京)科技有限公司 Method, device, medium and equipment for analyzing abnormal host by active and passive combination
CN114205095B (en) * 2020-08-27 2023-08-18 极客信安(北京)科技有限公司 Method and device for detecting encrypted malicious traffic
CN112671551B (en) * 2020-11-23 2022-11-18 中国船舶重工集团公司第七0九研究所 Network traffic prediction method and system based on event correlation
CN112822167B (en) * 2020-12-31 2023-04-07 杭州中电安科现代科技有限公司 Abnormal TLS encrypted traffic detection method and system
CN112905671A (en) * 2021-03-24 2021-06-04 北京必示科技有限公司 Time series exception handling method and device, electronic equipment and storage medium
CN113271292B (en) * 2021-04-07 2022-05-10 中国科学院信息工程研究所 Malicious domain name cluster detection method and device based on word vectors

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103856370A (en) * 2014-02-25 2014-06-11 中国科学院计算技术研究所 Application flow recognition method and system
CN107153789A (en) * 2017-04-24 2017-09-12 西安电子科技大学 The method for detecting Android Malware in real time using random forest grader
CN107992746A (en) * 2017-12-14 2018-05-04 华中师范大学 Malicious act method for digging and device
CN109379377A (en) * 2018-11-30 2019-02-22 极客信安(北京)科技有限公司 Encrypt malicious traffic stream detection method, device, electronic equipment and storage medium
CN109495513A (en) * 2018-12-29 2019-03-19 极客信安(北京)科技有限公司 Unsupervised encryption malicious traffic stream detection method, device, equipment and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10050892B2 (en) * 2014-01-14 2018-08-14 Marvell International Ltd. Method and apparatus for packet classification

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103856370A (en) * 2014-02-25 2014-06-11 中国科学院计算技术研究所 Application flow recognition method and system
CN107153789A (en) * 2017-04-24 2017-09-12 西安电子科技大学 The method for detecting Android Malware in real time using random forest grader
CN107992746A (en) * 2017-12-14 2018-05-04 华中师范大学 Malicious act method for digging and device
CN109379377A (en) * 2018-11-30 2019-02-22 极客信安(北京)科技有限公司 Encrypt malicious traffic stream detection method, device, electronic equipment and storage medium
CN109495513A (en) * 2018-12-29 2019-03-19 极客信安(北京)科技有限公司 Unsupervised encryption malicious traffic stream detection method, device, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of a Network User Identification System Based on Behavioral Similarity; Zeng Siyuan; China Master's Theses Full-text Database; 2018-10-15; full text *

Also Published As

Publication number Publication date
CN110138745A (en) 2019-08-16

Similar Documents

Publication Publication Date Title
CN110138745B (en) Abnormal host detection method, device, equipment and medium based on data stream sequence
CN109379377B (en) Encrypted malicious traffic detection method and device, electronic equipment and storage medium
CN109495513B (en) Unsupervised encrypted malicious traffic detection method, unsupervised encrypted malicious traffic detection device, unsupervised encrypted malicious traffic detection equipment and unsupervised encrypted malicious traffic detection medium
CN106982230B (en) Flow detection method and system
US20180248879A1 (en) Method and apparatus for setting access privilege, server and storage medium
JP7120350B2 (en) SECURITY INFORMATION ANALYSIS METHOD, SECURITY INFORMATION ANALYSIS SYSTEM AND PROGRAM
US9491186B2 (en) Method and apparatus for providing hierarchical pattern recognition of communication network data
JP2016091549A (en) Systems, devices, and methods for separating malware and background events
CN110046297A (en) Recognition methods, device and the storage medium of O&M violation operation
CN113746780B (en) Abnormal host detection method, device, medium and equipment based on host image
CN114422271B (en) Data processing method, device, equipment and readable storage medium
CN115514558A (en) Intrusion detection method, device, equipment and medium
US11158315B2 (en) Secure speech recognition
US11165779B2 (en) Generating a custom blacklist for a listening device based on usage
CN114840634B (en) Information storage method and device, electronic equipment and computer readable medium
CN116595523A (en) Multi-engine file detection method, system, equipment and medium based on dynamic arrangement
CN115906064A (en) Detection method, detection device, electronic equipment and computer readable medium
CN113794731B (en) Method, device, equipment and medium for identifying CDN (content delivery network) -based traffic masquerading attack
CN113452810B (en) Traffic classification method, device, equipment and medium
CN115470489A (en) Detection model training method, detection method, device and computer readable medium
CN114726823A (en) Domain name generation method, device and equipment based on generation countermeasure network
CN111432080A (en) Ticket data processing method, electronic equipment and computer readable storage medium
CN114205095B (en) Method and device for detecting encrypted malicious traffic
CN113572768B (en) Analysis method for abnormal change of number of botnet family propagation sources
US20230289227A1 (en) Multi-Computer System for Forecasting Data Surges

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211208

Address after: 610000 No. 1, floor 1, No. 109, hongdoushu street, Jinjiang District, Chengdu, Sichuan

Patentee after: Geek Xin'an (Chengdu) Technology Co.,Ltd.

Address before: 100080 room 61306, 3 / F, Beijing Friendship Hotel, 1 Zhongguancun South Street, Haidian District, Beijing

Patentee before: JIKE XIN'AN (BEIJING) TECHNOLOGY Co.,Ltd.
