CN109495513A - Unsupervised encrypted malicious traffic detection method, device, equipment and medium

Publication number
CN109495513A
Authority
CN
China
Prior art keywords
node
client
cluster
traffic stream
service end
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811635919.2A
Other languages
Chinese (zh)
Other versions
CN109495513B (en
Inventor
江斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geek Xin'an (Chengdu) Technology Co.,Ltd.
Original Assignee
Geek Xin'an (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geek Xin'an (beijing) Technology Co Ltd filed Critical Geek Xin'an (beijing) Technology Co Ltd
Priority to CN201811635919.2A priority Critical patent/CN109495513B/en
Publication of CN109495513A publication Critical patent/CN109495513A/en
Application granted granted Critical
Publication of CN109495513B publication Critical patent/CN109495513B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The embodiments of the present disclosure provide an unsupervised encrypted malicious traffic detection method, device, equipment and medium. The method includes the following steps: collecting the required data feature set from network traffic; building a bipartite graph between clients and servers from the collected data feature set; performing a first clustering of the client and server nodes by a graph partitioning method; vectorizing the client and server nodes in the larger connected subgraphs of the first clustering; clustering the vectorized data again using the DBSCAN algorithm; and determining malicious traffic and nodes from the result of the second clustering. The disclosure uses a graph-based unsupervised learning model, so encrypted traffic can be inspected directly without prior knowledge or a labeled sample set. Partitioning the graph twice yields families of different types and breaks large families into small ones; malicious traffic is then identified by checking the traffic features of each family. The method is simple and easy to carry out.

Description

Unsupervised encrypted malicious traffic detection method, device, equipment and medium
Technical field
The present disclosure relates to the technical field of network traffic detection, and in particular to an unsupervised encrypted malicious traffic detection method, device, electronic equipment and storage medium.
Background art
Network communication is an information application used by nearly all enterprises and individuals. As enterprise and individual users pay increasing attention to information security, encryption is used in more and more network communication scenarios. With encryption, the content of a communication cannot be read by any user on the network other than the two communicating parties.
At the same time, malicious programs such as network Trojans and worms often also encrypt their traffic when communicating with a control terminal, in order to evade identification by network detection equipment. As a result, normal encrypted traffic and malicious encrypted traffic are hard to tell apart, which poses a major challenge for network security detection.
Current detection of encrypted malicious traffic mainly uses supervised machine learning: a detection model is trained on malicious encrypted traffic and normal encrypted traffic, and the model is then used to decide whether a given encrypted flow is malicious.
The main problems of existing schemes are:
(1) Model training relies on a large number of malicious ("black") samples; an insufficient number of samples is likely to make the trained detection model inaccurate;
(2) Traffic features are extracted through expert analysis; if the expert knowledge is unreliable, the final classification results may be seriously flawed;
(3) Because the approach is based on prior knowledge, its ability to detect new attack patterns is poor;
(4) It is easily bypassed by an attacker who learns the feature set: once the attacker discovers the features used for detection, those features can be evaded by technical means.
Therefore, how to effectively separate out malicious traffic has become a technical problem that urgently needs to be solved.
Summary of the invention
The purpose of the present disclosure is to provide an unsupervised encrypted malicious traffic detection method, device, electronic equipment and storage medium that can quickly detect malicious encrypted traffic in traffic data.
In a first aspect, the present disclosure provides an unsupervised encrypted malicious traffic detection method, comprising the following steps:
Step S101: collecting the required data feature set from network traffic;
Step S102: building a bipartite graph between clients and servers from the collected data feature set;
Step S103: performing a first clustering of the client and server nodes by a graph partitioning method;
Step S104: vectorizing the client and server nodes in the larger connected subgraphs of the first clustering;
Step S105: clustering the vectorized data again using the DBSCAN algorithm;
Step S106: determining malicious traffic and nodes from the result of the second clustering.
Optionally, the data feature set includes:
the client cipher suites, the TLS extensions supported by the client, and the server certificate.
Optionally, building the bipartite graph between clients and servers from the collected data feature set includes:
selecting any client node at random and connecting it to its associated server node, forming an edge between that client node and server node;
traversing all client nodes and server nodes until every client node and server node in the data feature set has its corresponding connections;
building the bipartite graph from the connections formed between the client nodes and server nodes.
Optionally, performing the first clustering of the client or server nodes by the graph partitioning method includes:
clustering the bipartite graph into subgraphs;
assigning subgraphs that have no connection to one another at all to different clusters, thereby completing the first clustering.
Optionally, vectorizing the client and server nodes in the larger connected subgraphs of the first clustering includes:
selecting the connected subgraphs in the first clustering that have a larger number of nodes;
starting from any node of the subgraph, repeatedly choosing a neighboring node at random along the edges as the next node, forming a sequence of length t;
for each node in the sequence, using the skip-gram method to learn the node's own feature representation from the nodes around it, reducing each node's representation from a high-dimensional one-hot encoding to a node feature vector.
Optionally, the DBSCAN algorithm includes:
calculating the distance between nodes;
judging the similarity of the nodes based on the distance;
grouping similar nodes into one class.
Optionally, determining malicious traffic and nodes from the result of the second clustering includes:
judging malicious traffic from the features of the server nodes and/or client nodes in each cluster obtained by the second clustering;
if the features of most server nodes and/or client nodes in a cluster are illegitimate, determining that the cluster is a malicious cluster;
restoring the connections that exist between all client nodes and server nodes in the malicious cluster, which yields the malicious traffic to be detected.
Second aspect, the disclosure provide a kind of unsupervised encryption malicious traffic stream detection device, comprising:
Data acquisition unit, for based on data characteristics collection needed for network flow acquisition;
Construction unit, for establishing the bipartite graph between client and server-side using the data characteristics collection of acquisition;
First cluster cell, for being clustered for the first time by figure cutting method to client and service end node;
Vectorization unit, for the client and service end node progress to larger connection subgraph in the first cluster Vectorization processing;
Cluster cell again clusters the data after vectorization using DBScan algorithm again;
Judging unit, for determining malicious traffic stream and node using the cluster result after the cluster again.
The third aspect, the disclosure provide a kind of electronic equipment, including processor and memory, the memory are stored with energy Enough computer program instructions executed by the processor when the processor executes the computer program instructions, realize the On the one hand any method and step.
Fourth aspect, the disclosure provide a kind of computer readable storage medium, are stored with computer program instructions, the meter Calculation machine program instruction realizes any method and step of first aspect when being called and being executed by processor.
Compared with the prior art, the beneficial effects of the embodiments of the present disclosure are:
The disclosure uses a graph-based unsupervised learning model, so encrypted traffic can be inspected directly without prior knowledge or a labeled sample set. Partitioning the graph twice yields families of different types and breaks large families into small ones; malicious traffic is then identified by checking the traffic features of each family. The method is simple and easy to carry out, and can detect encrypted malicious traffic efficiently.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present disclosure or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are clearly only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the unsupervised encrypted malicious traffic detection method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of the bipartite graph used in the unsupervised encrypted malicious traffic detection method provided by an embodiment of the present disclosure;
Fig. 3 is a structural diagram of the unsupervised encrypted malicious traffic detection device provided by an embodiment of the present disclosure;
Fig. 4 is a structural diagram of the electronic device provided by an embodiment of the present disclosure.
Detailed description
To make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are clearly only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the disclosure without creative effort fall within the scope of protection of the disclosure.
The terms used in the embodiments of the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the disclosure. The singular forms "a", "said" and "the" used in the embodiments and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise; "a plurality of" generally means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second, third and so on may be used in the embodiments of the present disclosure to describe technical names, these technical names should not be limited by these terms, which are only used to distinguish technical names from one another. For example, without departing from the scope of the embodiments, a first check signature may also be called a second check signature and, similarly, a second check signature may also be called a first check signature.
Depending on the context, the word "if" as used herein can be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" can be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
It should also be noted that the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a product or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a product or system. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the existence of other identical elements in the product or system that includes it.
In addition, the order of steps in the following method embodiments is only an example and not a strict limitation.
Referring to Fig. 1, in a first aspect, the present disclosure provides an unsupervised encrypted malicious traffic detection method, comprising the following steps:
Step S101: collecting the required data feature set from network traffic;
Optionally, the data feature set includes:
the client cipher suites, the TLS extensions supported by the client, and the server certificate.
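As an illustration of the feature set in step S101, the following minimal sketch shows one way a single flow could be recorded and turned into node labels. The class name `FlowRecord`, its field names, the helper functions and the label format are all assumptions made for this example; the disclosure does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FlowRecord:
    """One observed TLS flow. All field names are illustrative assumptions."""
    cipher_suites: Tuple[int, ...]   # cipher suite IDs offered by the client
    tls_extensions: Tuple[int, ...]  # TLS extension IDs supported by the client
    cert_id: str                     # identifier of the server certificate, e.g. a hash


def client_key(flow: FlowRecord) -> str:
    """Collapse the client-side metadata into one hashable node label."""
    suites = "-".join(str(s) for s in flow.cipher_suites)
    extensions = "-".join(str(e) for e in flow.tls_extensions)
    return f"client:{suites}|{extensions}"


def server_key(flow: FlowRecord) -> str:
    """Label of the certificate (server) node for this flow."""
    return f"cert:{flow.cert_id}"


example = FlowRecord(cipher_suites=(4865, 4866), tls_extensions=(0, 10), cert_id="ab12")
pair = (client_key(example), server_key(example))
```

Two flows with the same cipher suites and extensions map to the same client label, which is what lets many flows share one client node in the graph built in the next step.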
Step S102: building a bipartite graph between clients and servers from the collected data feature set;
As shown in Fig. 2, the client metadata features extracted from the traffic are placed as the nodes on the left side of the graph and the certificate metadata features as the nodes on the right side, so any single flow can be represented as an edge from the node formed by its client features to its certificate node. There are no edges between client metadata nodes and none between digital certificate nodes, so the nodes fall into two classes and form a bipartite graph.
Specifically and optionally, building the bipartite graph between clients and servers from the collected data feature set includes:
selecting any client node at random and connecting it to its associated server node, forming an edge between that client node and server node;
traversing all client nodes and server nodes until every client node and server node in the data feature set has its corresponding connections;
building the bipartite graph from the connections formed between the client nodes and server nodes.
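The construction in step S102 can be sketched as follows, assuming each flow has already been reduced to a (client node, server node) pair of string labels and that client and server labels never collide. The function name `build_bipartite_graph` and the adjacency-map representation are choices made for this example, not part of the disclosure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_bipartite_graph(flows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (client_node, server_node) pairs."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for client, server in flows:
        # Every flow contributes exactly one client-to-server edge, so no edge
        # ever joins two clients or two servers.
        adjacency[client].add(server)
        adjacency[server].add(client)
    return dict(adjacency)


# Toy data: labels starting with "c" are clients, labels starting with "s" are servers.
flows = [("c1", "s1"), ("c1", "s2"), ("c2", "s2"), ("c3", "s3")]
graph = build_bipartite_graph(flows)
```

Storing neighbors in sets means repeated flows between the same client and certificate collapse into a single edge.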
Step S103: performing a first clustering of the client and server nodes by a graph partitioning method;
Here, the graph partitioning method relies on the fact that the bipartite graph formed in step S102 may contain multiple subgraphs that have no edges between them and share no nodes, so they form discrete families. Using this property, the nodes can be divided into different groups, which completes the first clustering.
Specifically, performing the first clustering of the client or server nodes by the graph partitioning method includes:
clustering the bipartite graph into subgraphs;
assigning subgraphs that have no connection to one another at all to different clusters, thereby completing the first clustering.
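Splitting the graph into subgraphs that share no edges amounts to finding its connected components. A minimal sketch using breadth-first search is given below; the function name `connected_components` and the toy graph are assumptions for illustration, and the disclosure does not prescribe a particular traversal.

```python
from collections import deque
from typing import Dict, List, Set


def connected_components(adjacency: Dict[str, Set[str]]) -> List[Set[str]]:
    """Return the node sets of all connected components of an undirected graph."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


graph = {
    "c1": {"s1", "s2"}, "c2": {"s2"}, "c3": {"s3"},
    "s1": {"c1"}, "s2": {"c1", "c2"}, "s3": {"c3"},
}
# Largest components first, since only the larger ones are vectorized later.
clusters = sorted(connected_components(graph), key=len, reverse=True)
```

Each returned set is one cluster of the first clustering.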
Step S104: vectorizing the client and server nodes in the larger connected subgraphs of the first clustering;
Optionally, vectorizing the client and server nodes in the larger connected subgraphs of the first clustering includes:
selecting the connected subgraphs in the first clustering that have a larger number of nodes;
starting from any node of the subgraph, repeatedly choosing a neighboring node at random along the edges as the next node, forming a sequence of length t;
for each node in the sequence, using the skip-gram method to learn the node's own feature representation from the nodes around it, reducing each node's representation from a high-dimensional one-hot encoding to a node feature vector.
Specifically, vectorization consists of two steps:
The first step builds node sequences by random walk, as follows:
Starting from any node in the graph, a neighboring node is chosen at random along the edges as the next node, and so on, to form a sequence. With the sequence length defined as t, this produces a sequence of length t in which client nodes and server nodes alternate;
Every node in the graph serves as a start node for the above procedure, so if there are Q nodes, Q sequences of length t are generated.
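The random-walk step can be sketched as below: one walk of length t is started from every node, and each next node is drawn uniformly from the current node's neighbors. The function name `random_walks`, the `seed` parameter and the uniform choice among neighbors are assumptions for this example.

```python
import random
from typing import Dict, List, Optional, Set


def random_walks(
    adjacency: Dict[str, Set[str]],
    walk_length: int,
    seed: Optional[int] = None,
) -> List[List[str]]:
    """Start one random walk of `walk_length` nodes from every node in the graph."""
    rng = random.Random(seed)
    walks: List[List[str]] = []
    for start in sorted(adjacency):
        walk = [start]
        while len(walk) < walk_length:
            # Sort the neighbors so that a fixed seed gives reproducible walks.
            neighbors = sorted(adjacency[walk[-1]])
            if not neighbors:
                break  # isolated node: the walk cannot be extended
            walk.append(rng.choice(neighbors))
        walks.append(walk)
    return walks


graph = {
    "c1": {"s1", "s2"}, "c2": {"s2"},
    "s1": {"c1"}, "s2": {"c1", "c2"},
}
walks = random_walks(graph, walk_length=5, seed=0)
```

Because every edge of a bipartite graph joins a client to a server, each walk alternates between the two node types, as the description states.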
The second step applies the skip-gram method to the nodes in these sequences to obtain a feature vector for each node, as follows:
The input is a set of node sequences in which each node is represented by a one-hot encoding: for a node N whose index is n, the corresponding initial vector is (0, 0, 0, 0, 0, ..., 1, ..., 0, 0, 0, 0), that is, a vector whose n-th position is 1 and whose other positions are all 0. The output is a lower-dimensional feature vector P for the node, of length p; in general p is far smaller than the length of the one-hot vector.
The one-hot features are reduced to P as follows. For each node X, its different contexts Y(1..k) in the different sequences can be obtained. For each pair (X, Y), a neural network is trained with the back-propagation algorithm: the input of the network is the one-hot vector x of X, the training label is the one-hot vector y of Y, the hidden layer is P, and the trained parameter is an m*p matrix W (m is the total number of nodes). Because only one position in x is 1, back-propagation updates only the parameters of one row of W, and that row is the low-dimensional feature P learned for X. Applying this procedure to every X yields a low-dimensional feature vector P for each node.
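A small pure-Python sketch of the skip-gram training described above follows. It assumes a full softmax output layer with a second matrix `V`, a fixed context window and plain stochastic gradient descent; none of these details (nor the names `train_skipgram`, `dim`, `window`, `epochs`, `lr`) are given in the disclosure, so this is one possible reading rather than the patented procedure. Multiplying a one-hot input by W simply selects one row of W, so the code indexes the row directly.

```python
import math
import random
from typing import Dict, List, Sequence


def train_skipgram(
    walks: Sequence[Sequence[str]],
    dim: int = 4,
    window: int = 2,
    epochs: int = 50,
    lr: float = 0.05,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Learn a `dim`-dimensional vector per node from walk sequences."""
    rng = random.Random(seed)
    vocab = sorted({node for walk in walks for node in walk})
    index = {node: i for i, node in enumerate(vocab)}
    m = len(vocab)
    # W is the m x p input matrix from the description; V is an assumed output matrix.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(m)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(m)]
    for _ in range(epochs):
        for walk in walks:
            for pos, center in enumerate(walk):
                lo = max(0, pos - window)
                hi = min(len(walk), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    y = index[walk[ctx_pos]]
                    h = W[index[center]]  # hidden layer = the row of W selected by x
                    scores = [sum(h[d] * V[j][d] for d in range(dim)) for j in range(m)]
                    top = max(scores)
                    exps = [math.exp(s - top) for s in scores]
                    total = sum(exps)
                    # Gradient of cross-entropy w.r.t. the scores: softmax - one_hot(y).
                    errors = [e / total for e in exps]
                    errors[y] -= 1.0
                    grad_h = [sum(errors[j] * V[j][d] for j in range(m)) for d in range(dim)]
                    for j in range(m):
                        for d in range(dim):
                            V[j][d] -= lr * errors[j] * h[d]
                    # Only this one row of W changes, as noted in the description.
                    for d in range(dim):
                        h[d] -= lr * grad_h[d]
    return {node: list(W[index[node]]) for node in vocab}


walks = [["c1", "s1", "c1", "s2"], ["s1", "c1", "s2", "c2"]]
embeddings = train_skipgram(walks, dim=3, epochs=5)
```

The full softmax costs time proportional to the number of nodes for every training pair; practical implementations typically replace it with negative sampling or hierarchical softmax.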
Step S105: clustering the vectorized data again using the DBSCAN algorithm;
The basic idea of the DBSCAN algorithm is to calculate the distance between nodes, judge node similarity from that distance, and group similar nodes into one class, thereby completing the clustering of the data.
Optionally, the DBSCAN algorithm includes:
calculating the distance between nodes;
judging the similarity of the nodes based on the distance;
grouping similar nodes into one class.
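For reference, a compact pure-Python version of standard DBSCAN over node vectors is sketched below. The parameters `eps` (neighborhood radius) and `min_samples` (minimum neighborhood size for a core point) belong to the standard algorithm but are not named in the disclosure, and Euclidean distance is an assumption; the values used here are arbitrary.

```python
import math
from typing import List, Sequence

UNVISITED = -2
NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Return a cluster label per point; NOISE (-1) marks points in no cluster."""
    n = len(points)
    labels = [UNVISITED] * n

    def region(i: int) -> List[int]:
        # All points within distance eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster_id += 1
    return labels


points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (20.0, 20.0)]
labels = dbscan(points, eps=0.5, min_samples=2)
```

Unlike K-means, DBSCAN does not need the number of clusters in advance, which suits a setting where the number of traffic families is unknown.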
Step S106: determining malicious traffic and nodes from the result of the second clustering.
After clustering, each cluster may contain server certificate nodes and client cipher suite feature nodes. Using the domain names in the CN field and SAN field of a certificate node, it can be judged whether the server holding the certificate is a legitimate website. If most certificates in a cluster belong to illegitimate websites, the cluster can be considered a malicious cluster. Restoring the connections that exist between all clients and servers in this cluster yields the malicious traffic to be detected.
Optionally, determining malicious traffic and nodes from the result of the second clustering includes:
judging malicious traffic from the features of the server nodes and/or client nodes in each cluster obtained by the second clustering;
if the features of most server nodes and/or client nodes in a cluster are illegitimate, determining that the cluster is a malicious cluster;
restoring the connections that exist between all client nodes and server nodes in the malicious cluster, which yields the malicious traffic to be detected.
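The judgment in step S106 can be sketched as a majority vote over the certificate nodes of each cluster. In this example "most" is read as a strict majority, and legitimacy is decided by a caller-supplied predicate over the certificate's CN/SAN domain names; the names `malicious_clusters`, `restore_flows`, `cert_domains` and the allowlist are hypothetical, since the disclosure does not say how a legitimate website is recognized.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def malicious_clusters(
    clusters: Sequence[Set[str]],
    cert_domains: Dict[str, List[str]],
    is_legitimate: Callable[[str], bool],
) -> List[Set[str]]:
    """Flag clusters in which most certificate nodes have no legitimate domain."""
    flagged: List[Set[str]] = []
    for cluster in clusters:
        certs = [node for node in cluster if node in cert_domains]
        if not certs:
            continue  # no certificate nodes to judge
        bad = sum(
            1 for cert in certs
            if not any(is_legitimate(domain) for domain in cert_domains[cert])
        )
        if bad * 2 > len(certs):  # strict majority (an assumed reading of "most")
            flagged.append(cluster)
    return flagged


def restore_flows(
    flagged: Sequence[Set[str]], flows: Sequence[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the original (client, server) flows whose two ends share a flagged cluster."""
    return [
        (client, server)
        for client, server in flows
        if any(client in cluster and server in cluster for cluster in flagged)
    ]


clusters = [{"c1", "s1", "s2"}, {"c2", "s3"}]
cert_domains = {"s1": ["example.com"], "s2": ["example.com"], "s3": ["x9f3k.invalid"]}
flows = [("c1", "s1"), ("c1", "s2"), ("c2", "s3")]
allowlist = {"example.com"}  # hypothetical source of known legitimate domains

flagged = malicious_clusters(clusters, cert_domains, lambda d: d in allowlist)
suspicious = restore_flows(flagged, flows)
```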
The disclosure uses a graph-based unsupervised learning model, so encrypted traffic can be inspected directly without prior knowledge or a labeled sample set. Partitioning the graph twice yields families of different types and breaks large families into small ones; malicious traffic is then identified by checking the traffic features of each family. The method is simple and easy to carry out, and can detect encrypted malicious traffic efficiently.
Embodiment 2
As shown in Fig. 3, in a second aspect, the present disclosure provides an unsupervised encrypted malicious traffic detection device, comprising: a data acquisition unit 301, a construction unit 302, a first clustering unit 303, a vectorization unit 304, a second clustering unit 305 and a judging unit 306. Specifically:
the data acquisition unit 301 is configured to collect the required data feature set from network traffic;
the construction unit 302 is configured to build a bipartite graph between clients and servers from the collected data feature set;
the first clustering unit 303 is configured to perform a first clustering of the client and server nodes by a graph partitioning method;
the vectorization unit 304 is configured to vectorize the client and server nodes in the larger connected subgraphs of the first clustering;
the second clustering unit 305 is configured to cluster the vectorized data again using the DBSCAN algorithm;
the judging unit 306 is configured to determine malicious traffic and nodes from the result of the second clustering.
Embodiment 3
The present disclosure provides a computer-readable storage medium storing computer program instructions which, when called and executed by a processor, implement any of the method steps of the first aspect.
The disclosure uses a graph-based unsupervised learning model, so encrypted traffic can be inspected directly without prior knowledge or a labeled sample set. Partitioning the graph twice yields families of different types and breaks large families into small ones; malicious traffic is then identified by checking the traffic features of each family. The method is simple and easy to carry out, and can detect encrypted malicious traffic efficiently.
Embodiment 4
As shown in Fig. 4, the present disclosure provides an electronic device comprising a processor and a memory, the memory storing computer program instructions executable by the processor; when the processor executes the computer program instructions, any of the method steps of the first aspect is implemented.
Referring now to Fig. 4, a structural diagram of an electronic device 400 suitable for implementing the embodiments of the present disclosure is shown. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and vehicle-mounted terminals (for example, vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 4, the electronic device 400 may include a processing unit (for example, a central processing unit or graphics processor) 401, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 402 or a program loaded from a storage device 408 into random access memory (RAM) 403. The RAM 403 also stores the various programs and data needed for the operation of the electronic device 400. The processing unit 401, the ROM 402 and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
In general, the following devices can be connected to the I/O interface 405: input devices 406 such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer and gyroscope; output devices 407 such as a liquid crystal display (LCD), loudspeaker and vibrator; storage devices 408 such as magnetic tape and hard disk; and a communication device 409. The communication device 409 allows the electronic device 400 to communicate with other devices, wirelessly or by wire, to exchange data. Although Fig. 4 shows an electronic device 400 with various devices, it should be understood that not all of the devices shown need to be implemented or present; more or fewer devices may alternatively be implemented or present.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flow chart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication device 409, installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing unit 401, the functions defined above in the method of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example (but not limited to), an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. The program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to: electric wire, optical cable, RF (radio frequency) and the like, or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain at least two internet protocol addresses; send to a node evaluation device a node evaluation request that includes the at least two internet protocol addresses, where the node evaluation device selects an internet protocol address from the at least two internet protocol addresses and returns it; and receive the internet protocol address returned by the node evaluation device; where the obtained internet protocol address indicates an edge node in a content distribution network.
Alternatively, the above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a node evaluation request that includes at least two internet protocol addresses; select an internet protocol address from the at least two internet protocol addresses; and return the selected internet protocol address; where the received internet protocol address indicates an edge node in a content distribution network.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In situations involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. Under certain circumstances the name of a unit does not limit the unit itself; for example, the first acquiring unit may also be described as "a unit that obtains at least two Internet Protocol addresses".

Claims (10)

1. An unsupervised encrypted malicious traffic detection method, characterized by comprising the following steps:
Step S101: acquiring the required data feature set based on network traffic;
Step S102: establishing a bipartite graph between clients and servers using the acquired data feature set;
Step S103: performing a first clustering of the client and server nodes by a graph-cut method;
Step S104: vectorizing the client and server nodes of the larger connected subgraphs in the first clustering;
Step S105: clustering the vectorized data again using the DBSCAN algorithm;
Step S106: determining malicious traffic and nodes using the result of the second clustering.
2. The method according to claim 1, characterized in that the data feature set comprises:
the client cipher suites, the TLS extensions supported by the client, and the server certificate.
3. The method according to claim 1, characterized in that establishing the bipartite graph between clients and servers using the acquired data feature set comprises:
randomly selecting any client node and connecting it to its associated server node, forming an edge between the client node and the server node;
traversing all client nodes and server nodes until every client node and server node in the data feature set has formed its corresponding connection relationships;
establishing the bipartite graph from the connection relationships formed between the client nodes and the server nodes.
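The graph construction of claim 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each observed flow has already been reduced to a (client, server) pair, where the client identifier is a hypothetical fingerprint derived from the cipher suites and TLS extensions and the server identifier is a hypothetical certificate label. The function name `build_bipartite_graph` is likewise an assumption.

```python
from collections import defaultdict


def build_bipartite_graph(flows):
    """Build an undirected client/server bipartite graph.

    `flows` is an iterable of (client_id, server_id) pairs. Nodes are tagged
    with their side so a client and a server can never collide by name.
    """
    adjacency = defaultdict(set)
    for client_id, server_id in flows:
        client = ("client", client_id)
        server = ("server", server_id)
        # One edge per observed client/server association.
        adjacency[client].add(server)
        adjacency[server].add(client)
    return dict(adjacency)


# Hypothetical sample flows: fingerprints and certificate labels are made up.
flows = [
    ("fp_a", "cert_1"),
    ("fp_a", "cert_2"),
    ("fp_b", "cert_2"),
    ("fp_c", "cert_3"),
]
graph = build_bipartite_graph(flows)
```

Storing neighbours in sets makes repeated flows between the same pair collapse into a single edge, which matches the claim's notion of a connection relationship rather than a flow count.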
4. The method according to claim 3, characterized in that performing the first clustering of the client or server nodes by the graph-cut method comprises:
performing subgraph clustering on the bipartite graph;
assigning subgraphs that have no association with one another to different clusters, thereby obtaining the first clustering.
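One plausible reading of claim 4 is that the first clustering amounts to finding the connected components of the bipartite graph, since subgraphs with no association share no edge. The sketch below takes that reading as an assumption; the adjacency dictionary and node names are hypothetical.

```python
from collections import deque


def connected_components(adjacency):
    """Split an undirected graph into connected components using BFS."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical graph: {c1, c2, s1, s2} are linked, {c3, s3} are isolated from them.
adjacency = {
    "c1": {"s1", "s2"},
    "c2": {"s2"},
    "c3": {"s3"},
    "s1": {"c1"},
    "s2": {"c1", "c2"},
    "s3": {"c3"},
}
first_clusters = connected_components(adjacency)
```

Sorting `first_clusters` by size would then give the "larger connected subgraphs" that the later vectorization step operates on.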
5. The method according to claim 4, characterized in that vectorizing the client and server nodes of the larger connected subgraphs in the first clustering comprises:
selecting the connected subgraphs in the first clustering that have more nodes;
starting from any node in the subgraph, randomly selecting a node as the next node according to the connection relationships, forming a sequence of length t;
for each node in the sequence, using the skip-gram method to learn its own feature representation from the other nodes around it, reducing the representation of each node from a high-dimensional one-hot encoding to a node feature vector.
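The sequence generation of claim 5 resembles a random walk over the subgraph, and skip-gram then trains on (center, context) pairs taken from each walk. The sketch below shows only those two preparatory steps under that assumption. The window size, the seed, and the function names are hypothetical, and the embedding training itself is left to a skip-gram implementation of the reader's choice.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Walk `length` nodes, each step moving to a uniformly random neighbour."""
    walk = [start]
    while len(walk) < length:
        # Sort so that the walk is reproducible for a given seed.
        neighbours = sorted(adjacency[walk[-1]])
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def skipgram_pairs(walk, window):
    """List the (center, context) pairs that skip-gram would train on."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical connected subgraph taken from the first clustering.
adjacency = {
    "c1": {"s1", "s2"},
    "c2": {"s2"},
    "s1": {"c1"},
    "s2": {"c1", "c2"},
}
rng = random.Random(0)
walk = random_walk(adjacency, "c1", 5, rng)  # t = 5
pairs = skipgram_pairs(walk, window=2)
```

In a bipartite graph every walk alternates between client and server nodes, so nodes that share many counterparts end up with similar contexts and, after training, similar vectors.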
6. The method according to claim 5, characterized in that the DBSCAN algorithm comprises:
calculating the distances between the nodes;
determining node similarity based on the distances;
grouping nodes that are similar into one class.
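The distance-based grouping in claim 6 can be illustrated with a compact, textbook-style DBSCAN. This is a sketch for small inputs only: it assumes Euclidean distance between node feature vectors, and the `eps` and `min_pts` values as well as the sample points are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""

    def neighbours(i):
        # Brute-force range query; includes the point itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical 2-D node vectors: two dense groups and one outlier.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(vectors, eps=1.5, min_pts=3)
```

The brute-force neighbour search is quadratic in the number of nodes; a production system would more likely rely on an indexed implementation from an existing library.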
7. The method according to claim 6, characterized in that determining malicious traffic and nodes using the result of the second clustering comprises:
judging malicious traffic using the features of the server nodes and/or client nodes in each cluster obtained by the second clustering;
if the features of most of the server nodes and/or client nodes in a cluster are irregular features, determining that the cluster is a malicious cluster;
restoring the correspondences that exist between all client nodes and server nodes in the malicious cluster, these correspondences being the detected malicious traffic.
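The majority rule and the flow restoration of claim 7 might look like the sketch below. Several points are assumptions rather than statements of the claimed method: how a node's features are judged irregular is not specified here, so the `irregular` set is taken as a given input; "most" is read as a fraction above a hypothetical `threshold` of 0.5; and a flow is restored only when both endpoints fall in the same malicious cluster.

```python
from collections import defaultdict


def find_malicious_clusters(cluster_of, irregular, threshold=0.5):
    """Return labels of clusters whose share of irregular nodes exceeds threshold."""
    members = defaultdict(list)
    for node, label in cluster_of.items():
        if label != -1:  # ignore DBSCAN noise points
            members[label].append(node)
    return {
        label
        for label, nodes in members.items()
        if sum(node in irregular for node in nodes) / len(nodes) > threshold
    }


def restore_flows(flows, cluster_of, bad_clusters):
    """Keep the client/server pairs whose endpoints share a malicious cluster."""
    return [
        (client, server)
        for client, server in flows
        if cluster_of.get(client) in bad_clusters
        and cluster_of.get(client) == cluster_of.get(server)
    ]


# Hypothetical inputs: cluster labels from the second clustering and a set of
# nodes already judged to carry irregular features.
cluster_of = {"c1": 0, "s1": 0, "s2": 0, "c2": 1, "s3": 1, "c3": -1}
irregular = {"s1", "s2"}
flows = [("c1", "s1"), ("c1", "s2"), ("c2", "s3"), ("c3", "s3")]

bad_clusters = find_malicious_clusters(cluster_of, irregular)
malicious_flows = restore_flows(flows, cluster_of, bad_clusters)
```

With these inputs, cluster 0 has two irregular nodes out of three and is flagged, so the two flows from `c1` are reported while the flows touching cluster 1 or the noise point are not.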
8. An unsupervised encrypted malicious traffic detection device, characterized by comprising:
a data acquisition unit, configured to acquire the required data feature set based on network traffic;
a construction unit, configured to establish a bipartite graph between clients and servers using the acquired data feature set;
a first clustering unit, configured to perform a first clustering of the client and server nodes by a graph-cut method;
a vectorization unit, configured to vectorize the client and server nodes of the larger connected subgraphs in the first clustering;
a second clustering unit, configured to cluster the vectorized data again using the DBSCAN algorithm;
a judging unit, configured to determine malicious traffic and nodes using the result of the second clustering.
9. An electronic device, characterized by comprising a processor and a memory, the memory storing computer program instructions executable by the processor, wherein the processor, when executing the computer program instructions, implements the method steps of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores computer program instructions which, when called and executed by a processor, implement the method steps of any one of claims 1-7.
CN201811635919.2A 2018-12-29 2018-12-29 Unsupervised encrypted malicious traffic detection method, unsupervised encrypted malicious traffic detection device, unsupervised encrypted malicious traffic detection equipment and unsupervised encrypted malicious traffic detection medium Active CN109495513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811635919.2A CN109495513B (en) 2018-12-29 2018-12-29 Unsupervised encrypted malicious traffic detection method, unsupervised encrypted malicious traffic detection device, unsupervised encrypted malicious traffic detection equipment and unsupervised encrypted malicious traffic detection medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811635919.2A CN109495513B (en) 2018-12-29 2018-12-29 Unsupervised encrypted malicious traffic detection method, unsupervised encrypted malicious traffic detection device, unsupervised encrypted malicious traffic detection equipment and unsupervised encrypted malicious traffic detection medium

Publications (2)

Publication Number Publication Date
CN109495513A true CN109495513A (en) 2019-03-19
CN109495513B CN109495513B (en) 2021-06-01

Family

ID=65713294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811635919.2A Active CN109495513B (en) 2018-12-29 2018-12-29 Unsupervised encrypted malicious traffic detection method, unsupervised encrypted malicious traffic detection device, unsupervised encrypted malicious traffic detection equipment and unsupervised encrypted malicious traffic detection medium

Country Status (1)

Country Link
CN (1) CN109495513B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750475A (en) * 2012-06-07 2012-10-24 中国电子科技集团公司第三十研究所 Detection method and system for cross comparison of malicious code of interior and exterior view based on virtual machine
CN105871832A (en) * 2016-03-29 2016-08-17 北京理工大学 Network application encrypted traffic recognition method and device based on protocol attributes
CN106101061A (en) * 2016-05-24 2016-11-09 北京奇虎科技有限公司 The automatic classification method of rogue program and device
WO2017202466A1 (en) * 2016-05-26 2017-11-30 Telefonaktiebolaget Lm Ericsson (Publ) Network application function registration
CN108776655A (en) * 2018-06-01 2018-11-09 北京玄科技有限公司 A kind of term vector training method and device having supervision
CN109104441A (en) * 2018-10-24 2018-12-28 上海交通大学 A kind of detection system and method for the encryption malicious traffic stream based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lu Gang: "Survey of Malicious Traffic Feature Extraction" (恶意流量特征提取综述), Netinfo Security (《信息网络安全》) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110138745A (en) * 2019-04-23 2019-08-16 极客信安(北京)科技有限公司 Abnormal host detection method, device, equipment and medium based on data stream sequences
CN110138745B (en) * 2019-04-23 2021-08-24 极客信安(北京)科技有限公司 Abnormal host detection method, device, equipment and medium based on data stream sequence
CN112134829A (en) * 2019-06-25 2020-12-25 北京观成科技有限公司 Method and device for generating encrypted flow characteristic set
CN112217762A (en) * 2019-07-09 2021-01-12 北京观成科技有限公司 Malicious encrypted traffic identification method and device based on purpose
CN112217762B (en) * 2019-07-09 2022-11-18 北京观成科技有限公司 Malicious encrypted traffic identification method and device based on purpose
CN110958220A (en) * 2019-10-24 2020-04-03 中国科学院信息工程研究所 Network space security threat detection method and system based on heterogeneous graph embedding
CN110958220B (en) * 2019-10-24 2020-12-29 中国科学院信息工程研究所 Network space security threat detection method and system based on heterogeneous graph embedding
CN113746780A (en) * 2020-05-27 2021-12-03 极客信安(北京)科技有限公司 Abnormal host detection method, device, medium and equipment based on host image

Also Published As

Publication number Publication date
CN109495513B (en) 2021-06-01

Similar Documents

Publication Publication Date Title
JP7002638B2 (en) Learning text data representation using random document embedding
CN109495513A (en) Unsupervised encryption malicious traffic stream detection method, device, equipment and medium
Sun et al. Near real-time twitter spam detection with machine learning techniques
US10547618B2 (en) Method and apparatus for setting access privilege, server and storage medium
Kuehnhausen et al. Trusting smartphone apps? To install or not to install, that is the question
US11188720B2 (en) Computing system including virtual agent bot providing semantic topic model-based response
CN110138745B (en) Abnormal host detection method, device, equipment and medium based on data stream sequence
CN109271418A (en) Suspicious clique's recognition methods, device, equipment and computer readable storage medium
CN110378474A (en) Fight sample generating method, device, electronic equipment and computer-readable medium
CN110909222B (en) User portrait establishing method and device based on clustering, medium and electronic equipment
WO2019200810A1 (en) User data authenticity analysis method and apparatus, storage medium and electronic device
CN108228887A (en) For generating the method and apparatus of information
CN111090615A (en) Method and device for analyzing and processing mixed assets, electronic equipment and storage medium
CN109495552A (en) Method and apparatus for updating clicking rate prediction model
CN110413742A (en) Duplicate checking method, apparatus, equipment and the storage medium of biographic information
CN112863683A (en) Medical record quality control method and device based on artificial intelligence, computer equipment and storage medium
CN111198967A (en) User grouping method and device based on relational graph and electronic equipment
CN110443647A (en) Information distribution method and equipment
US10706148B2 (en) Spatial and temporal convolution networks for system calls based process monitoring
CN113962401A (en) Federal learning system, and feature selection method and device in federal learning system
CN108470126A (en) Data processing method, device and storage medium
CN112016792A (en) User resource quota determining method and device and electronic equipment
CN114840634B (en) Information storage method and device, electronic equipment and computer readable medium
TWI792923B (en) Computer-implemented method, computer system and computer program product for enhancing user verification in mobile devices using model based on user interaction history
CN108960312A (en) Method and apparatus for generating disaggregated model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211208

Address after: 610000 No. 1, floor 1, No. 109, hongdoushu street, Jinjiang District, Chengdu, Sichuan

Patentee after: Geek Xin'an (Chengdu) Technology Co.,Ltd.

Address before: 100080 room 61306, 3 / F, Beijing Friendship Hotel, 1 Zhongguancun South Street, Haidian District, Beijing

Patentee before: JIKE XIN'AN (BEIJING) TECHNOLOGY Co.,Ltd.
