CN109495513A - Unsupervised detection method, device, equipment and medium for encrypted malicious traffic - Google Patents
- Publication number
- CN109495513A (application number CN201811635919.2A)
- Authority
- CN
- China
- Prior art keywords
- node
- client
- cluster
- traffic stream
- service end
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The embodiments of the present disclosure provide an unsupervised method, device, equipment and medium for detecting encrypted malicious traffic. The method comprises the following steps: collecting the required data feature set from network traffic; building a bipartite graph between clients and servers from the collected data feature set; performing a first clustering of the client and server nodes by a graph-cut method; vectorizing the client and server nodes in the larger connected subgraphs produced by the first clustering; clustering the vectorized data again with the DBSCAN algorithm; and determining the malicious traffic and nodes from the result of this second clustering. The disclosure uses a graph-based unsupervised learning model, so encrypted traffic can be examined directly without prior knowledge or a labeled sample set. Clustering the graph twice yields groups of different types and breaks large groups down into small ones; each small group is then checked against its traffic features to identify malicious traffic. The method is simple and easy to implement.
Description
Technical field
This disclosure relates to the field of traffic data detection, and in particular to an unsupervised method, device, electronic equipment and storage medium for detecting encrypted malicious traffic.
Background
Network communication is an information application that almost every enterprise and individual now relies on. As enterprises and individual users pay more and more attention to information security, encryption is used in more and more network communication scenarios: encryption ensures that, apart from the two communicating parties, other users on the network cannot recognize the content of the communication.
At the same time, malicious programs of all kinds, such as network Trojans and worms, also often use encrypted traffic when communicating with their control servers, in order to evade identification by network detection devices. As a result, normal encrypted traffic and malicious encrypted traffic are hard to tell apart, which poses a great challenge for network security detection.
At present, encrypted malicious traffic is mainly detected with supervised machine learning: a detection model is trained on malicious encrypted traffic and normal encrypted traffic, and is then used to decide whether a given encrypted flow is malicious.
The main problems with existing schemes are:
(1) model training relies on a large number of black (malicious) samples, and an insufficient number of samples can make the trained detection model inaccurate;
(2) traffic features are extracted by expert analysis, so if the expert knowledge is unreliable, the final classification results may be seriously wrong;
(3) because detection is based on previously known knowledge, the ability to detect new attack patterns is poor;
(4) detection is easily bypassed by attackers who know the feature set: once an attacker discovers the feature set being used, certain techniques can be applied to evade those features.
Therefore, how to efficiently separate out malicious traffic has become a technical problem that urgently needs to be solved.
Summary of the invention
The purpose of the disclosure is to provide an unsupervised method, device, electronic equipment and storage medium for detecting encrypted malicious traffic, which can quickly detect malicious encrypted flows in traffic information.
In a first aspect, the disclosure provides an unsupervised method for detecting encrypted malicious traffic, comprising the following steps:
Step S101: collecting the required data feature set from network traffic;
Step S102: building a bipartite graph between clients and servers from the collected data feature set;
Step S103: performing a first clustering of the client and server nodes by a graph-cut method;
Step S104: vectorizing the client and server nodes in the larger connected subgraphs from the first clustering;
Step S105: clustering the vectorized data again with the DBSCAN algorithm;
Step S106: determining the malicious traffic and nodes from the result of the second clustering.
Optionally, the data feature set comprises:
The cipher suites offered by the client, the TLS extensions supported by the client, and the server certificate.
Optionally, building the bipartite graph between clients and servers from the collected data feature set comprises:
Randomly selecting any client node and connecting it to its associated server node, forming an edge between the client node and the server node;
Traversing all client nodes and server nodes, so that every client node and server node in the data feature set forms its corresponding connections;
Building the bipartite graph from the connections formed between the client nodes and the server nodes.
Optionally, performing the first clustering of the client or server nodes by the graph-cut method comprises:
Clustering the bipartite graph into subgraphs;
Assigning subgraphs that have no connection at all to one another to different clusters, thereby completing the first clustering.
Optionally, vectorizing the client and server nodes of the larger connected subgraphs from the first clustering comprises:
Selecting the connected subgraphs with more nodes from the first clustering;
Starting from any node of a subgraph, randomly selecting a linked node as the next node, forming a sequence of length t;
For each node in the sequence, using the skip-gram method to learn the node's own feature representation from the nodes around it, reducing each node's representation from a high-dimensional OneHot encoding to a node feature vector.
Optionally, the DBSCAN algorithm comprises:
Calculating the distance between each pair of nodes;
Judging the similarity of the nodes based on the distance;
Grouping similar nodes into one class.
Optionally, determining the malicious traffic and nodes from the result of the second clustering comprises:
Judging malicious traffic using the features of the server nodes and/or client nodes in each cluster after the second clustering;
If the features of most server nodes and/or client nodes in a cluster are informal (non-legitimate) features, judging that cluster to be a malicious cluster;
Restoring the correspondences between all client nodes and server nodes in the malicious cluster, which gives the malicious traffic to be detected.
In a second aspect, the disclosure provides an unsupervised device for detecting encrypted malicious traffic, comprising:
A data acquisition unit, for collecting the required data feature set from network traffic;
A construction unit, for building the bipartite graph between clients and servers from the collected data feature set;
A first clustering unit, for performing a first clustering of the client and server nodes by a graph-cut method;
A vectorization unit, for vectorizing the client and server nodes of the larger connected subgraphs from the first clustering;
A re-clustering unit, for clustering the vectorized data again with the DBSCAN algorithm;
A judging unit, for determining the malicious traffic and nodes from the result of the second clustering.
In a third aspect, the disclosure provides electronic equipment comprising a processor and a memory, the memory storing computer program instructions executable by the processor; when the processor executes the computer program instructions, any of the method steps of the first aspect is implemented.
In a fourth aspect, the disclosure provides a computer-readable storage medium storing computer program instructions which, when called and executed by a processor, implement any of the method steps of the first aspect.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects:
The disclosure uses a graph-based unsupervised learning model, so encrypted traffic can be examined directly without prior knowledge or a labeled sample set. Clustering the graph twice yields groups of different types and breaks large groups down into small ones; each small group is then checked against its traffic features to identify malicious traffic. The method is simple, easy to implement, and can efficiently detect encrypted malicious traffic.
Description of the drawings
In order to describe the technical solutions of the embodiments of the present disclosure or of the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the unsupervised method for detecting encrypted malicious traffic provided by an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of the bipartite graph used in the unsupervised method for detecting encrypted malicious traffic provided by an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of the unsupervised device for detecting encrypted malicious traffic provided by an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of the electronic equipment provided by an embodiment of the present disclosure.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the disclosure. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this disclosure without creative effort fall within the scope of protection of the disclosure.
The terms used in the embodiments of the present disclosure are only for the purpose of describing particular embodiments and are not intended to limit the disclosure. The singular forms "a", "said" and "the" used in the embodiments and the appended claims are also intended to include plural forms unless the context clearly indicates otherwise; "multiple" generally means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present disclosure to describe technical names, these technical names should not be limited by these terms. The terms are only used to distinguish technical names from one another. For example, without departing from the scope of the embodiments of the present disclosure, a first signature verification could also be called a second signature verification, and similarly, a second signature verification could also be called a first signature verification.
Depending on the context, the word "if" as used herein can be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" can be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
It should also be noted that the terms "include", "comprise" and any other variants are intended to cover non-exclusive inclusion, so that a commodity or system comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the commodity or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the commodity or system that comprises it.
In addition, the order of steps in each of the following method embodiments is only an example and not a strict limitation.
Referring to Fig. 1, in a first aspect, the disclosure provides an unsupervised method for detecting encrypted malicious traffic, comprising the following steps:
Step S101: collecting the required data feature set from network traffic;
Optionally, the data feature set comprises:
The cipher suites offered by the client, the TLS extensions supported by the client, and the server certificate.
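The disclosure does not say how the three features become graph nodes. A minimal sketch, assuming that a client node is keyed by its offered cipher-suite list plus its extension list, and a server node by the certificate's CN and SAN names; the record type `FlowFeatures` and the "C:"/"S:" prefixes are hypothetical illustrations, not part of the patent:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FlowFeatures:
    """Hypothetical per-flow record holding the three features listed above."""
    cipher_suites: Tuple[int, ...]  # cipher suites offered in the ClientHello
    extensions: Tuple[int, ...]     # TLS extension types offered by the client
    cert_subject_cn: str            # CN of the server certificate's subject
    cert_san: Tuple[str, ...]       # DNS names from the SubjectAltName extension


def client_node_id(f: FlowFeatures) -> str:
    """Client node key (assumption): cipher-suite list plus extension list."""
    suites = "-".join(str(c) for c in f.cipher_suites)
    exts = "-".join(str(e) for e in f.extensions)
    return f"C:{suites},{exts}"


def server_node_id(f: FlowFeatures) -> str:
    """Server node key (assumption): certificate CN plus sorted SAN entries."""
    return "S:" + f.cert_subject_cn + "|" + ",".join(sorted(f.cert_san))
```

Different flows that share a client fingerprint or a certificate then map to the same node, which is what lets the graph in Step S102 connect them.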
Step S102: building a bipartite graph between clients and servers from the collected data feature set;
As shown in Fig. 2, the client metadata features extracted from the traffic form the nodes on the left of the figure and the certificate metadata features form the nodes on the right, so any flow can be expressed as an edge from the node formed by the client features to a certificate node. Because there are no edges between client metadata nodes, and likewise no edges between digital certificate nodes, the nodes split into two classes and form a bipartite graph.
Specifically and optionally, building the bipartite graph between clients and servers from the collected data feature set comprises:
Randomly selecting any client node and connecting it to its associated server node, forming an edge between the client node and the server node;
Traversing all client nodes and server nodes, so that every client node and server node in the data feature set forms its corresponding connections;
Building the bipartite graph from the connections formed between the client nodes and the server nodes.
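The construction above can be sketched as an adjacency map built from (client node, server node) pairs, one pair per flow. This assumes client and server node IDs come from disjoint namespaces (here hypothetical "C:"/"S:" prefixes), so no edge can join two clients or two servers:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_bipartite_graph(flows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected client/server bipartite graph as an adjacency map.

    Each flow is a (client_node, server_node) pair; repeated flows between the
    same pair collapse into a single edge.
    """
    adj: Dict[str, Set[str]] = defaultdict(set)
    for client, server in flows:
        adj[client].add(server)  # client -> server edge
        adj[server].add(client)  # store both directions (undirected graph)
    return dict(adj)
```

An adjacency map of sets is used here only because it makes the later connected-subgraph and random-walk steps short; any graph library would do.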
Step S103: performing a first clustering of the client and server nodes by a graph-cut method;
Here, the graph-cut method means the following: the bipartite graph formed in Step S102 may contain multiple subgraphs that have no edges between one another, i.e., no intersection, and so form discrete groups. The nodes can be divided into different groups according to this property, which completes the first clustering.
Specifically, performing the first clustering of the client or server nodes by the graph-cut method comprises:
Clustering the bipartite graph into subgraphs;
Assigning subgraphs that have no connection at all to one another to different clusters, thereby completing the first clustering.
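Since the first clustering groups nodes by subgraphs that share no edges, it amounts to finding connected components. A minimal breadth-first-search sketch, assuming the graph is held as an adjacency map of node IDs to neighbour sets:

```python
from collections import deque
from typing import Dict, List, Set


def connected_components(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """First clustering: each connected subgraph becomes its own cluster.

    Nodes in different components share no edge (no flow), so a BFS from
    every not-yet-visited node recovers the disjoint groups.
    """
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        components.append(comp)
    # Largest components first, since the larger ones are re-clustered later.
    components.sort(key=len, reverse=True)
    return components
```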
Step S104: vectorizing the client and server nodes in the larger connected subgraphs from the first clustering;
Optionally, vectorizing the client and server nodes of the larger connected subgraphs from the first clustering comprises:
Selecting the connected subgraphs with more nodes from the first clustering;
Starting from any node of a subgraph, randomly selecting a linked node as the next node, forming a sequence of length t;
For each node in the sequence, using the skip-gram method to learn the node's own feature representation from the nodes around it, reducing each node's representation from a high-dimensional OneHot encoding to a node feature vector.
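The text does not define how many nodes make a subgraph "larger". A minimal sketch of the selection step, assuming a hypothetical size threshold `min_nodes` and returning each kept component as its own adjacency map:

```python
from typing import Dict, List, Set


def large_subgraphs(adj: Dict[str, Set[str]], components: List[Set[str]],
                    min_nodes: int) -> List[Dict[str, Set[str]]]:
    """Keep components with at least `min_nodes` nodes (threshold is an assumption)
    and return each one as a separate adjacency map for the random walks."""
    result = []
    for comp in components:
        if len(comp) >= min_nodes:
            # Restrict every neighbour set to the component itself.
            result.append({n: adj[n] & comp for n in comp})
    return result
```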
Specifically, the vectorization consists of two steps:
The first step builds node sequences by random walk, as follows:
Starting from any node in the graph, randomly select a linked node as the next node to form a sequence; with the sequence length set to t, this yields a sequence of length t in which client nodes and server nodes alternate;
Every node in the graph is used as a start node for the above step, so if there are Q nodes, Q sequences of length t are generated.
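A minimal sketch of the random-walk step, assuming an adjacency map of node IDs to neighbour sets; the optional `seed` parameter is an addition for reproducibility. Because the graph is bipartite, every step alternates between a client node and a server node:

```python
import random
from typing import Dict, List, Optional, Set


def random_walks(adj: Dict[str, Set[str]], t: int,
                 seed: Optional[int] = None) -> List[List[str]]:
    """Generate one walk of length t from every node (Q nodes -> Q walks)."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):  # sorted so that a fixed seed is reproducible
        walk = [start]
        while len(walk) < t:
            neighbours = sorted(adj[walk[-1]])
            if not neighbours:  # isolated node: stop this walk early
                break
            walk.append(rng.choice(neighbours))  # uniform choice among linked nodes
        walks.append(walk)
    return walks
```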
The second step obtains the feature vector of each node in these sequences with the skip-gram method, as follows:
The input is a series of node sequences, with each node in OneHot encoding: for a node N whose index is n, the initial vector is (0, 0, 0, 0, 0, ..., 1, ..., 0, 0, 0, 0), i.e., a vector that is 1 at the n-th position and 0 everywhere else. The output is a lower-dimensional feature vector P of the node, with length p; in general, p is far smaller than the length of the OneHot vector (the total number of nodes).
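A small numeric illustration of why the OneHot input reduces to a row lookup: multiplying a OneHot vector by an m×p weight matrix (the matrix W described next) selects exactly one row. The sizes m = 6 and p = 2 and the matrix values are arbitrary example numbers:

```python
import numpy as np


def one_hot(index: int, m: int) -> np.ndarray:
    """OneHot vector of length m (total number of nodes) with a 1 at `index`."""
    x = np.zeros(m)
    x[index] = 1.0
    return x


m, p = 6, 2
W = np.arange(m * p, dtype=float).reshape(m, p)  # example m x p weight matrix
x = one_hot(3, m)
print(x @ W)  # [6. 7.]  (row 3 of W, i.e. the p-dimensional feature of node 3)
```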
The OneHot features are reduced to P as follows. For each node X, its different contexts Y(1..k) in the different sequences are collected. For each pair (X, Y), a neural network is trained with the back-propagation algorithm: the network input is the OneHot vector x, the training label is the OneHot vector y, the hidden layer is P, and the trained parameter is an m×p matrix W (m is the total number of nodes). Because only one position of x is 1, back-propagation only updates the parameter values of one row of W, and that row is the trained low-dimensional feature P of X. Applying this to every X gives each node a low-dimensional feature vector P.
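The training described above can be sketched as a plain full-softmax skip-gram in NumPy. This is an illustrative sketch, not the patent's implementation: the context window size, learning rate, epoch count and the separate output matrix `W_out` are assumptions (a practical system would more likely use negative sampling, e.g. via a word2vec library). As in the text, only row x of the input matrix `W_in` is updated for each (X, Y) pair:

```python
import numpy as np
from typing import Dict, Sequence


def skipgram_embeddings(walks: Sequence[Sequence[str]], p: int = 8, window: int = 2,
                        epochs: int = 50, lr: float = 0.05,
                        seed: int = 0) -> Dict[str, np.ndarray]:
    """Minimal full-softmax skip-gram over node walks (no negative sampling).

    W_in (m x p) holds the node features; W_out (p x m) scores context nodes.
    """
    nodes = sorted({n for w in walks for n in w})
    index = {n: i for i, n in enumerate(nodes)}
    m = len(nodes)
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.1, size=(m, p))
    W_out = rng.normal(scale=0.1, size=(p, m))
    # (center, context) pairs: each node X with the nodes Y within `window` steps.
    pairs = [(index[w[i]], index[w[j]])
             for w in walks for i in range(len(w))
             for j in range(max(0, i - window), min(len(w), i + window + 1)) if j != i]
    for _ in range(epochs):
        for x, y in pairs:
            h = W_in[x]                    # OneHot(x) @ W_in == row x (hidden layer P)
            scores = h @ W_out
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()           # softmax over all m nodes
            grad = probs.copy()
            grad[y] -= 1.0                 # d(cross-entropy)/d(scores) for label y
            grad_h = W_out @ grad          # gradient w.r.t. the hidden layer
            W_out -= lr * np.outer(h, grad)
            W_in[x] -= lr * grad_h         # only row x of W_in is updated
    return {n: W_in[i] for n, i in index.items()}
```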
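Step S105: clustering the vectorized data again with the DBSCAN algorithm;
The basic idea of the DBSCAN algorithm is to calculate the distances between nodes, judge node similarity based on those distances, and group similar nodes into one class, thereby completing the data clustering. (In standard DBSCAN, a node whose neighbourhood of radius eps contains at least a minimum number of nodes is a core node; nodes reachable through chains of core nodes form one cluster, and nodes reachable from no core node are treated as noise.)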
Optionally, the DBSCAN algorithm comprises:
Calculating the distance between each pair of nodes;
Judging the similarity of the nodes based on the distance;
Grouping similar nodes into one class.
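A minimal, self-contained sketch of standard DBSCAN on the node feature vectors, using Euclidean distance. The patent gives no values for the radius `eps` or the core-point count `min_samples`; both are left to the caller:

```python
import numpy as np
from typing import List


def dbscan(X: np.ndarray, eps: float, min_samples: int) -> List[int]:
    """Minimal DBSCAN: returns a cluster label per row of X, -1 for noise.

    A point is a core point if at least `min_samples` points (itself included)
    lie within Euclidean distance `eps`; clusters grow from core points.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = [-1] * n
    visited = [False] * n
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_samples:
            continue  # not a core point (may still join a cluster as a border point)
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # core or border point joins the cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbours[j]) >= min_samples:
                    queue.extend(neighbours[j])  # expand only from core points
        cluster += 1
    return labels
```

The pairwise distance matrix is O(n²) in memory, which is fine for a sketch; for larger graphs, scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` is a ready-made alternative.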
Step S106: determining the malicious traffic and nodes from the result of the second clustering.
After clustering, each cluster may contain server certificate nodes and client cipher-suite feature nodes. The domain names in the CN field and SAN field of a certificate node can be used to judge whether the server holding that certificate is a legitimate website. If most certificates in a cluster are certificates of non-legitimate websites, the cluster can be regarded as a malicious cluster. Restoring the correspondences between all clients and servers in this cluster gives the malicious traffic to be detected.
Optionally, determining the malicious traffic and nodes from the result of the second clustering comprises:
Judging malicious traffic using the features of the server nodes and/or client nodes in each cluster after the second clustering;
If the features of most server nodes and/or client nodes in a cluster are informal (non-legitimate) features, judging that cluster to be a malicious cluster;
Restoring the correspondences between all client nodes and server nodes in the malicious cluster, which gives the malicious traffic to be detected.
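A minimal sketch of the decision rule, under stated assumptions: `cert_domains` maps each certificate node to the CN/SAN domain names taken from it, `is_regular_domain` is a hypothetical caller-supplied check (for example an allowlist lookup; the patent does not specify how legitimacy is judged), "most" is read as a strict majority (`threshold=0.5`), and label -1 marks DBSCAN noise:

```python
from typing import Callable, Dict, List, Set, Tuple


def find_malicious_flows(clusters: Dict[int, Set[str]],
                         cert_domains: Dict[str, List[str]],
                         is_regular_domain: Callable[[str], bool],
                         flows: List[Tuple[str, str]],
                         threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Flag a cluster as malicious when most of its certificate nodes are informal,
    then restore the client-server flows inside each flagged cluster."""
    flagged: List[Set[str]] = []
    for label, members in clusters.items():
        if label == -1:  # DBSCAN noise is not treated as a cluster
            continue
        certs = [n for n in members if n in cert_domains]
        if not certs:
            continue
        # A certificate is informal if none of its CN/SAN names is a regular site.
        informal = sum(1 for c in certs
                       if not any(is_regular_domain(d) for d in cert_domains[c]))
        if informal / len(certs) > threshold:
            flagged.append(members)
    return [(c, s) for c, s in flows
            if any(c in members and s in members for members in flagged)]
```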
The disclosure uses a graph-based unsupervised learning model, so encrypted traffic can be examined directly without prior knowledge or a labeled sample set. Clustering the graph twice yields groups of different types and breaks large groups down into small ones; each small group is then checked against its traffic features to identify malicious traffic. The method is simple, easy to implement, and can efficiently detect encrypted malicious traffic.
Embodiment 2
As shown in Fig. 3, in a second aspect, the disclosure provides an unsupervised device for detecting encrypted malicious traffic, comprising a data acquisition unit 301, a construction unit 302, a first clustering unit 303, a vectorization unit 304, a re-clustering unit 305 and a judging unit 306, specifically:
Data acquisition unit 301, for collecting the required data feature set from network traffic;
Construction unit 302, for building the bipartite graph between clients and servers from the collected data feature set;
First clustering unit 303, for performing a first clustering of the client and server nodes by a graph-cut method;
Vectorization unit 304, for vectorizing the client and server nodes of the larger connected subgraphs from the first clustering;
Re-clustering unit 305, for clustering the vectorized data again with the DBSCAN algorithm;
Judging unit 306, for determining the malicious traffic and nodes from the result of the second clustering.
Embodiment 3
The disclosure provides a computer-readable storage medium storing computer program instructions which, when called and executed by a processor, implement any of the method steps of the first aspect.
The disclosure uses a graph-based unsupervised learning model, so encrypted traffic can be examined directly without prior knowledge or a labeled sample set. Clustering the graph twice yields groups of different types and breaks large groups down into small ones; each small group is then checked against its traffic features to identify malicious traffic. The method is simple, easy to implement, and can efficiently detect encrypted malicious traffic.
Embodiment 4
As shown in Fig. 4, the disclosure provides electronic equipment comprising a processor and a memory, the memory storing computer program instructions executable by the processor; when the processor executes the computer program instructions, any of the method steps of the first aspect is implemented.
Referring now to Fig. 4, it shows a schematic structural diagram of electronic equipment 400 suitable for implementing the embodiments of the present disclosure. Terminal devices in the embodiments may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and vehicle-mounted terminals (e.g. vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic equipment shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 4, the electronic equipment 400 may include a processing unit 401 (e.g. a central processing unit or graphics processor), which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 402 or a program loaded from a storage device 408 into random access memory (RAM) 403. The RAM 403 also stores various programs and data needed for the operation of the electronic equipment 400. The processing unit 401, ROM 402 and RAM 403 are connected to one another by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
In general, the following devices can be connected to the I/O interface 405: input devices 406 such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer and gyroscope; output devices 407 such as a liquid crystal display (LCD), speaker and vibrator; storage devices 408 such as magnetic tape and hard disk; and a communication device 409. The communication device 409 allows the electronic equipment 400 to communicate with other devices, wirelessly or by wire, to exchange data. Although Fig. 4 shows electronic equipment 400 with various devices, it should be understood that it is not required to implement or have all the devices shown; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flow chart may be implemented as a computer software program. For example, embodiments of the disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 409, installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing unit 401, the above functions defined in the method of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium of the disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device. In this disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic equipment, or may exist separately without being assembled into the electronic equipment.
The above computer-readable medium carries one or more programs which, when executed by the electronic equipment, cause the electronic equipment to: obtain at least two Internet Protocol addresses; send a node evaluation request including the at least two Internet Protocol addresses to a node evaluation device, wherein the node evaluation device selects an Internet Protocol address from the at least two Internet Protocol addresses and returns it; and receive the Internet Protocol address returned by the node evaluation device, wherein the obtained Internet Protocol address indicates an edge node in a content delivery network.
Alternatively, the above computer-readable medium carries one or more programs which, when executed by the electronic equipment, cause the electronic equipment to: receive a node evaluation request including at least two Internet Protocol addresses; select an Internet Protocol address from the at least two Internet Protocol addresses; and return the selected Internet Protocol address, wherein the received Internet Protocol address indicates an edge node in a content delivery network.
Computer program code for performing the operations of the disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should further be noted that each block of the block diagrams and/or flowcharts, and any combination of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not limit the unit itself; for example, the first acquiring unit may also be described as "a unit that obtains at least two Internet Protocol addresses".
Claims (10)
1. An unsupervised method for detecting encrypted malicious traffic, characterized in that it comprises the following steps:
Step S101: acquiring the required data feature set from network traffic;
Step S102: establishing a bipartite graph between clients and servers using the acquired data feature set;
Step S103: performing a first clustering of the client nodes and server nodes by a graph-cutting method;
Step S104: vectorizing the client nodes and server nodes of the larger connected subgraphs obtained in the first clustering;
Step S105: re-clustering the vectorized data using the DBSCAN algorithm;
Step S106: determining malicious traffic and malicious nodes from the result of the re-clustering.
2. The method according to claim 1, characterized in that the data feature set includes:
the cipher suites offered by the client, the TLS extensions supported by the client, and the server's certificate.
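For illustration, the three features in claim 2 could be held in a record like the one below. The class name `TlsFlowFeatures`, the helper functions, and the choice to hash the ordered cipher-suite and extension lists into a single client identifier (similar in spirit to JA3 fingerprinting) are all assumptions for this sketch; the claims do not say how client or server nodes are keyed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TlsFlowFeatures:
    """Hypothetical container for the three features named in claim 2."""
    client_cipher_suites: tuple[int, ...]  # cipher suites offered in the ClientHello
    client_extensions: tuple[int, ...]     # TLS extension type codes offered by the client
    server_cert_sha256: str                # fingerprint of the server-side certificate


def client_key(f: TlsFlowFeatures) -> str:
    """Collapse the client-side features into one node identifier (assumption).

    Order is preserved, so the same suites offered in a different order
    produce a different key.
    """
    text = (
        "-".join(map(str, f.client_cipher_suites))
        + ","
        + "-".join(map(str, f.client_extensions))
    )
    return hashlib.md5(text.encode("ascii")).hexdigest()


def server_key(f: TlsFlowFeatures) -> str:
    """Use the certificate fingerprint directly as the server node identifier."""
    return f.server_cert_sha256
```

Two flows from clients that offer identical suites and extensions map to the same client node even if they reach different servers, which is what lets later steps group them.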
3. The method according to claim 1, characterized in that establishing the bipartite graph between clients and servers using the acquired data feature set comprises:
selecting any client node at random and connecting it to its associated server node(s), forming edges between the client node and the server nodes;
traversing all client nodes and server nodes so that every client node and server node in the data feature set forms its corresponding connections;
establishing the bipartite graph from the connections formed between the client nodes and the server nodes.
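A minimal sketch of claim 3, assuming each flow has already been reduced to a `(client_id, server_id)` pair, for example with identifiers derived from the claim 2 features. The function name `build_bipartite_graph` and the `c:`/`s:` prefixes are hypothetical. Rather than starting from a randomly chosen client and then traversing, it simply visits every pair once, which yields the same set of edges.

```python
from collections import defaultdict


def build_bipartite_graph(pairs):
    """Build an undirected client/server bipartite graph.

    `pairs` is an iterable of (client_id, server_id) tuples, one per observed
    flow. Node names are prefixed with "c:" or "s:" so that a client and a
    server sharing an identifier never collapse into one node. Returns an
    adjacency dict mapping each node to the set of its neighbours; repeated
    flows between the same pair produce a single edge.
    """
    adj = defaultdict(set)
    for client_id, server_id in pairs:
        c, s = f"c:{client_id}", f"s:{server_id}"
        adj[c].add(s)  # client -> server edge
        adj[s].add(c)  # server -> client edge (the graph is undirected)
    return dict(adj)
```

Because edges only ever join a `c:` node to an `s:` node, the result is bipartite by construction.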
4. The method according to claim 3, characterized in that performing the first clustering of the client or server nodes by the graph-cutting method comprises:
partitioning the bipartite graph into subgraphs;
assigning subgraphs that have no connections to one another to different clusters, thereby completing the first clustering.
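Claim 4 separates the bipartite graph into subgraphs with no links between them, which amounts to computing connected components. The sketch below does this with breadth-first search over an adjacency dict (an assumed input format) and returns the components largest first, since the next step only vectorizes the larger ones. The claims do not say how "larger" is thresholded.

```python
from collections import deque


def connected_components(adj):
    """Split an undirected graph into subgraphs that share no edges.

    `adj` maps each node to an iterable of neighbours. Each returned set is
    one first-pass cluster: no node in one set is linked to a node in another.
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nb in adj.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        components.append(comp)
    # Largest subgraphs first, because only the larger ones are vectorized next.
    components.sort(key=len, reverse=True)
    return components
```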
5. The method according to claim 4, characterized in that vectorizing the client and server nodes of the larger connected subgraphs in the first clustering comprises:
selecting, from the first clustering, the connected subgraphs with the larger numbers of nodes;
starting from any node of a subgraph, randomly selecting a linked node as the next node according to the link relationships, forming a sequence of length t;
for each node in the sequence, learning its feature representation from the surrounding nodes using the skip-gram method, thereby reducing each node's representation from a high-dimensional one-hot encoding to a lower-dimensional node feature vector.
6. The method according to claim 5, characterized in that the DBSCAN algorithm comprises:
calculating the distance between each pair of nodes;
estimating node similarity from the distances;
grouping similar nodes into the same class.
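Claim 6 describes DBSCAN in terms of distances and similarity. The sketch below is a plain textbook DBSCAN over the node vectors, with Euclidean distance as an assumed metric; `eps` (neighbourhood radius) and `min_pts` (core-point threshold) are parameters the claims leave open. It compares every pair of points, so it is O(n²) and only suited to small inputs; scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` is a practical alternative.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN over a list of equal-length numeric vectors.

    Returns one label per point: 0, 1, 2, ... for clusters and -1 for noise.
    A point is a core point if at least `min_pts` points (itself included)
    lie within Euclidean distance `eps` of it.
    """
    n = len(points)

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reclassified as a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels
```

Points labelled -1 fall outside every cluster and would not take part in the per-cluster decision of claim 7.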
7. The method according to claim 6, characterized in that determining malicious traffic and malicious nodes from the result of the re-clustering comprises:
judging malicious traffic from the features of the server nodes and/or client nodes in each cluster after re-clustering;
if the features of most of the server nodes and/or client nodes in a cluster are informal features, determining that the cluster is a malicious cluster;
restoring the correspondences between all client nodes and server nodes in the malicious cluster to obtain the detected malicious traffic.
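A sketch of the decision rule in claim 7, assuming the caller supplies a per-node flag saying whether that node's features are "informal". The claims do not define this; a self-signed or expired certificate would be one plausible example. "Most" is read here as "more than half", and a flow is restored if either endpoint lies in a malicious cluster. Both readings, like the function name, are assumptions.

```python
from collections import defaultdict


def find_malicious_flows(labels, is_informal, edges, threshold=0.5):
    """Flag clusters whose nodes are mostly 'informal', then recover the
    client-server edges that touch those clusters.

    labels      -- dict node -> cluster id from the re-clustering (-1 = noise)
    is_informal -- dict node -> bool; the criterion is supplied by the caller
    edges       -- iterable of (client_node, server_node) pairs from the bipartite graph
    threshold   -- fraction of informal nodes above which a cluster is malicious
    """
    members = defaultdict(list)
    for node, cid in labels.items():
        if cid != -1:  # noise points belong to no cluster
            members[cid].append(node)

    malicious_clusters = {
        cid
        for cid, nodes in members.items()
        if sum(is_informal.get(n, False) for n in nodes) / len(nodes) > threshold
    }
    malicious_nodes = {n for n, cid in labels.items() if cid in malicious_clusters}
    # Restore flows: observed client-server pairs with an endpoint in a malicious cluster.
    flows = [(c, s) for c, s in edges if c in malicious_nodes or s in malicious_nodes]
    return malicious_clusters, flows
```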
8. An unsupervised device for detecting encrypted malicious traffic, characterized in that it comprises:
a data acquisition unit, configured to acquire the required data feature set from network traffic;
a construction unit, configured to establish a bipartite graph between clients and servers using the acquired data feature set;
a first clustering unit, configured to perform a first clustering of the client nodes and server nodes by a graph-cutting method;
a vectorization unit, configured to vectorize the client nodes and server nodes of the larger connected subgraphs in the first clustering;
a re-clustering unit, configured to re-cluster the vectorized data using the DBSCAN algorithm;
a determination unit, configured to determine malicious traffic and malicious nodes from the result of the re-clustering.
9. An electronic device, characterized in that it comprises a processor and a memory, the memory storing computer program instructions executable by the processor, wherein the method steps according to any one of claims 1 to 7 are implemented when the processor executes the computer program instructions.
10. A computer-readable storage medium, characterized in that it stores computer program instructions which, when called and executed by a processor, implement the method steps according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811635919.2A CN109495513B (en) | 2018-12-29 | 2018-12-29 | Unsupervised encrypted malicious traffic detection method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109495513A (en) | 2019-03-19 |
CN109495513B (en) | 2021-06-01 |
Family ID: 65713294
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201811635919.2A (CN109495513B) | Active | 2018-12-29 | 2018-12-29 | Unsupervised encrypted malicious traffic detection method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109495513B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102750475A (en) * | 2012-06-07 | 2012-10-24 | 中国电子科技集团公司第三十研究所 | Detection method and system for cross comparison of malicious code of interior and exterior view based on virtual machine |
CN105871832A (en) * | 2016-03-29 | 2016-08-17 | 北京理工大学 | Network application encrypted traffic recognition method and device based on protocol attributes |
CN106101061A (en) * | 2016-05-24 | 2016-11-09 | 北京奇虎科技有限公司 | The automatic classification method of rogue program and device |
WO2017202466A1 (en) * | 2016-05-26 | 2017-11-30 | Telefonaktiebolaget Lm Ericsson (Publ) | Network application function registration |
CN108776655A (en) * | 2018-06-01 | 2018-11-09 | 北京玄科技有限公司 | A kind of term vector training method and device having supervision |
CN109104441A (en) * | 2018-10-24 | 2018-12-28 | 上海交通大学 | A kind of detection system and method for the encryption malicious traffic stream based on deep learning |
Non-Patent Citations (1)
Title |
---|
Lu Gang, "恶意流量特征提取综述" (A survey of malicious traffic feature extraction), 信息网络安全 (Netinfo Security) * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110138745A (en) * | 2019-04-23 | 2019-08-16 | 极客信安(北京)科技有限公司 | Abnormal host detection method, device, equipment and medium based on data stream sequences |
CN110138745B (en) * | 2019-04-23 | 2021-08-24 | 极客信安(北京)科技有限公司 | Abnormal host detection method, device, equipment and medium based on data stream sequence |
CN112134829A (en) * | 2019-06-25 | 2020-12-25 | 北京观成科技有限公司 | Method and device for generating encrypted flow characteristic set |
CN112217762A (en) * | 2019-07-09 | 2021-01-12 | 北京观成科技有限公司 | Malicious encrypted traffic identification method and device based on purpose |
CN112217762B (en) * | 2019-07-09 | 2022-11-18 | 北京观成科技有限公司 | Malicious encrypted traffic identification method and device based on purpose |
CN110958220A (en) * | 2019-10-24 | 2020-04-03 | 中国科学院信息工程研究所 | Network space security threat detection method and system based on heterogeneous graph embedding |
CN110958220B (en) * | 2019-10-24 | 2020-12-29 | 中国科学院信息工程研究所 | Network space security threat detection method and system based on heterogeneous graph embedding |
CN113746780A (en) * | 2020-05-27 | 2021-12-03 | 极客信安(北京)科技有限公司 | Abnormal host detection method, device, medium and equipment based on host image |
Also Published As
Publication number | Publication date |
---|---|
CN109495513B (en) | 2021-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7002638B2 (en) | Learning text data representation using random document embedding | |
CN109495513A (en) | Unsupervised encryption malicious traffic stream detection method, device, equipment and medium | |
Sun et al. | Near real-time twitter spam detection with machine learning techniques | |
US10547618B2 (en) | Method and apparatus for setting access privilege, server and storage medium | |
Kuehnhausen et al. | Trusting smartphone apps? To install or not to install, that is the question | |
US11188720B2 (en) | Computing system including virtual agent bot providing semantic topic model-based response | |
CN110138745B (en) | Abnormal host detection method, device, equipment and medium based on data stream sequence | |
CN109271418A (en) | Suspicious clique's recognition methods, device, equipment and computer readable storage medium | |
CN110378474A (en) | Fight sample generating method, device, electronic equipment and computer-readable medium | |
CN110909222B (en) | User portrait establishing method and device based on clustering, medium and electronic equipment | |
WO2019200810A1 (en) | User data authenticity analysis method and apparatus, storage medium and electronic device | |
CN108228887A (en) | For generating the method and apparatus of information | |
CN111090615A (en) | Method and device for analyzing and processing mixed assets, electronic equipment and storage medium | |
CN109495552A (en) | Method and apparatus for updating clicking rate prediction model | |
CN110413742A (en) | Duplicate checking method, apparatus, equipment and the storage medium of biographic information | |
CN112863683A (en) | Medical record quality control method and device based on artificial intelligence, computer equipment and storage medium | |
CN111198967A (en) | User grouping method and device based on relational graph and electronic equipment | |
CN110443647A (en) | Information distribution method and equipment | |
US10706148B2 (en) | Spatial and temporal convolution networks for system calls based process monitoring | |
CN113962401A (en) | Federal learning system, and feature selection method and device in federal learning system | |
CN108470126A (en) | Data processing method, device and storage medium | |
CN112016792A (en) | User resource quota determining method and device and electronic equipment | |
CN114840634B (en) | Information storage method and device, electronic equipment and computer readable medium | |
TWI792923B (en) | Computer-implemented method, computer system and computer program product for enhancing user verification in mobile devices using model based on user interaction history | |
CN108960312A (en) | Method and apparatus for generating disaggregated model |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2021-12-08; Address after: 610000 No. 1, Floor 1, No. 109, Hongdoushu Street, Jinjiang District, Chengdu, Sichuan; Patentee after: Geek Xin'an (Chengdu) Technology Co., Ltd.; Address before: 100080 Room 61306, 3/F, Beijing Friendship Hotel, 1 Zhongguancun South Street, Haidian District, Beijing; Patentee before: JIKE XIN'AN (BEIJING) TECHNOLOGY Co., Ltd. |