CN103746982B - Method and system for automatically generating HTTP network signatures

Publication number
CN103746982B
Authority
CN
China
Prior art keywords
signature
cluster
packet
http
fine granularity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310745102.1A
Other languages
Chinese (zh)
Other versions
CN103746982A (en)
Inventor
李可
刘潮歌
崔翔
李丹
梁玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201310745102.1A
Publication of CN103746982A
Application granted
Publication of CN103746982B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for automatically generating HTTP network signatures. The method comprises a packet signature generation step, a URI signature generation step, and an HTTP network signature total-set generation step. In the packet signature generation step, statistical features and packet contents are extracted from the request-response packets of multiple network samples; a coarse-grained cluster set is generated by two-stage clustering, a fine-grained cluster set is then generated on the basis of the coarse-grained cluster set, and the request-response packet signature set of the network samples is generated from the fine-grained cluster set. In the URI signature generation step, for traffic in the network samples that is assigned to a class of its own, supplementary URI path and parameter signatures are extracted to generate the URI signature set. Finally, the request-response packet signature set and the URI signature set are merged to generate the total signature set T_all.

Description

Method and system for automatically generating HTTP network signatures
Technical field
The present invention relates to the field of network security, in particular to a signature generation method for unknown HTTP botnets, and more specifically to a method and system for automatically generating HTTP network signatures.
Background art
Network security incidents have occurred frequently in recent years, and network security has become a hot topic at the level of national strategy. However, because Internet users generally lack security awareness and computer operating systems and application software contain various vulnerabilities, more and more computers have quietly become compromised hosts ("bots", colloquially "broilers") in botnets, pawns that others use for illegal activities such as stealing private data, attacking network resources, and making illicit profits.
A botnet is "a general-purpose computing platform that is built by compromising a number of non-cooperative user terminals in cyberspace and that can be remotely controlled by an attacker". Here, "non-cooperative" means that the user of the compromised terminal is unaware of the compromise; "attacker" refers to the controller (botmaster) who has control over the resulting botnet; and "remotely controlled" means that the attacker can control the non-cooperative user terminals in a one-to-many fashion through a command and control (C&C) channel. A controlled victim terminal becomes a node of the botnet and is called a "zombie host" or "bot". The command and control protocols of common botnets fall into three main types: IRC, HTTP, and P2P. Because HTTP penetrates network boundaries well and supports centralized control, more and more botnet controllers use HTTP as their command and control protocol. By controlling a large number of zombie hosts through a botnet, a controller obtains powerful distributed computing capacity and a rich store of information resources, which makes it easier to launch malicious activities such as distributed denial-of-service (DDoS) attacks, online identity theft, spam, click fraud, and Bitcoin mining. As the most effective general-purpose attack platform in attackers' hands, the botnet has become one of the greatest security threats on the Internet today.
Botnets pose such a large threat mainly for the following reasons:
The botnet is a new form of attack derived from traditional worms and Trojan horses. A worm spreads quickly by exploiting security vulnerabilities but is uncontrollable; a Trojan horse can remotely control its victim but infects slowly, manages only a small number of hosts, and has simple control mechanisms. A botnet combines the strengths of both and compensates for their weaknesses, so it is more harmful.
Botnets are highly controllable, and their control logic is separated from the attack itself. The bots in a botnet can be manipulated by the controller through the command and control channel and can launch a large-scale attack (such as a DDoS attack) on a specific target within a short time, which gives the botnet a high degree of controllability. In addition, the bot program on a zombie host is responsible only for the control logic, while the actual attack tasks are distributed dynamically by the controller as needed. This approach splits the complete threat entity into several parts, which both gives task distribution good flexibility and improves the survivability of the botnet.
Security measures often lag behind the emergence of new botnets. Signature-based detection is an effective method. However, traditional signature generation techniques mostly target worms only, and they cannot generate high-quality signatures efficiently and automatically, so a botnet cannot be effectively contained in the early stage of its spread.
Many botnet detection methods and systems exist, but most of them suffer from high time overhead and difficult deployment and cannot be deployed widely in practice. Traditional intrusion detection systems (IDS) are widely used and can effectively find abnormal network behavior in a given network; however, because they lack signatures and corresponding rules for botnets, they cannot promptly find potential new botnet hosts in that network. Current signature extraction techniques have the following main problems:
Traditional signature generation algorithms mostly target worms only, and signature generation methods for HTTP botnets are lacking. The vast majority of existing signature generation methods extract worm signatures. Because botnet command and control communication has different characteristics, these traditional methods are not well suited to extracting HTTP botnet signatures.
Existing signature generation methods are inefficient and have high time overhead. Traditional signature generation mostly relies on manual judgment and cannot be automated at scale. Although a few researchers have proposed automatic extraction methods for botnet signatures in an attempt to solve this problem, the computational overhead of these methods is very large, so they cannot be applied at scale.
Signatures generated by existing methods are of low quality and poor usability. Traditional signature generation methods do not take the command and control communication characteristics of HTTP botnets into account, so they are not targeted, and the signature sets they generate are large and of low quality.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the long signature generation time and difficult deployment of existing systems; to this end, a method and system for automatically generating HTTP network signatures are proposed.
To achieve the above object, the invention provides a method for automatically generating HTTP network signatures, characterized in that the method comprises:
Packet signature generation step: statistical features and packet contents are extracted from the request-response packets of multiple network samples; a coarse-grained cluster set is generated by two-stage clustering, a fine-grained cluster set is then generated on the basis of the coarse-grained cluster set, and the request-response packet signature set of the network samples is generated from the fine-grained cluster set;
URI signature generation step: for traffic in the network samples that is assigned to a class of its own, supplementary URI path and parameter signatures are extracted to generate the URI signature set;
HTTP network signature total-set generation step: the request-response packet signature set and the URI signature set are merged to generate the total signature set T_all.
The above method for automatically generating HTTP network signatures, characterized in that the packet signature generation step comprises:
Data extraction step: the data-flow statistics and the request-response packet contents of the network samples are extracted;
Two-stage clustering step: two-stage clustering is performed on the data-flow statistics and the request-response packet contents of the network samples respectively; the coarse-grained cluster set is generated first, and the fine-grained cluster set is generated on that basis;
Request-response packet signature generation step: the signature sets of the request packets and of the response packets are generated separately from the fine-grained cluster set.
The above method for automatically generating HTTP network signatures, characterized in that, before the data extraction step, the method further comprises:
Whitelist filtering step: traffic that accesses legitimate websites is filtered out of the network samples.
The above method for automatically generating HTTP network signatures, characterized in that the data extraction step further comprises:
Data content extraction step: the contents of the request-response packets of each HTTP session connection are extracted;
Coarse-grained clustering attribute extraction step: with the network sample as the unit, the four statistics used for coarse-grained clustering are extracted, namely the total number of HTTP flows, the number of bytes transmitted per second, the average HTTP packet size, and the total number of HTTP packets, which yields the coarse-grained clustering attributes;
Fine-grained clustering attribute extraction step: with each HTTP session as the unit, the four statistics used for fine-grained clustering are extracted, namely the number of request packets in the session, the number of response packets in the session, the size of the first request packet, and the size of the first response packet, which yields the fine-grained clustering attributes;
Data aggregation step: the request-response packet contents, the coarse-grained clustering attributes, and the fine-grained clustering attributes are aggregated into a five-tuple data set, where each five-tuple has the form <sample id, session id, request-response packet content, coarse-grained clustering attributes, fine-grained clustering attributes>.
The above method for automatically generating HTTP network signatures, characterized in that the two-stage clustering step further comprises:
Coarse-grained clustering step: the five-tuple data set is clustered automatically on the coarse-grained clustering attributes to obtain the coarse-grained cluster set C; if a cluster in the coarse-grained cluster set C belongs to only one network sample, the URI signature generation step is performed;
Fine-grained clustering step: based on the coarse-grained cluster set C, all sessions in each cluster c_i (c_i ∈ C) are clustered automatically on the fine-grained clustering attributes to obtain the fine-grained cluster set C′ (C′ ∈ C_i);
Sample coverage judgment step: if there is a fine-grained cluster c_i′ (c_i′ ∈ C′) whose sessions come from k samples, where k is greater than 1 and less than or equal to the number of network samples, the fine-grained clustering is considered successful; otherwise the URI signature generation step is performed.
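The sample coverage judgment can be illustrated with a minimal Python sketch. It assumes that each fine-grained cluster is represented as a list of (sample id, session id) pairs; this representation and the function names are hypothetical and are not taken from the patent.

```python
def covering_samples(cluster_sessions):
    """Return the set of sample ids that contribute sessions to one cluster."""
    return {sample_id for sample_id, _session_id in cluster_sessions}


def fine_clustering_succeeded(fine_clusters, n_samples):
    """True if some fine-grained cluster draws its sessions from k samples
    with 1 < k <= n_samples (the condition stated in the judgment step)."""
    return any(1 < len(covering_samples(c)) <= n_samples for c in fine_clusters)


# Two samples share the first cluster, so the clustering counts as successful.
clusters = [[("A", 1), ("B", 1)], [("A", 2)]]
print(fine_clustering_succeeded(clusters, n_samples=2))
```

When the function returns False, the data would be handed to the URI signature generation step instead.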
The above method for automatically generating HTTP network signatures, characterized in that the request-response packet signature generation step further comprises:
HTTP signature set generation step: for all session connections in each fine-grained cluster c_i′ (c_i′ ∈ C′), signatures are generated separately for the request packets and the response packets, and token signatures are computed automatically in turn; each fine-grained cluster c_i′ finally yields one request-packet signature and one response-packet signature, which together form the HTTP signature set W;
Signature filtering step: the HTTP signature set W is filtered and screened; unqualified signatures are removed and duplicate signatures are merged to obtain the request-response packet signature set.
The present invention also provides a system for automatically generating HTTP network signatures that uses the above automatic generation method, characterized in that the system comprises:
Packet signature generation module: statistical features and packet contents are extracted from the request-response packets of multiple network samples; a coarse-grained cluster set is generated by two-stage clustering, a fine-grained cluster set is then generated on the basis of the coarse-grained cluster set, and the request-response packet signature set of the network samples is generated from the fine-grained cluster set;
URI signature generation module: for traffic in the network samples that is assigned to a class of its own, supplementary URI path and parameter signatures are extracted to generate the URI signature set;
HTTP network signature total-set generation module: the request-response packet signature set and the URI signature set are merged to generate the total signature set T_all.
The above system for automatically generating HTTP network signatures, characterized in that the packet signature generation module comprises:
Whitelist filtering module: traffic that accesses legitimate websites is filtered out;
Data extraction module: the data-flow statistics and the request-response packet contents of the network samples are extracted;
Two-stage clustering module: two-stage clustering is performed on the data-flow statistics and the request-response packet contents of the network samples respectively; the coarse-grained cluster set is generated first, and the fine-grained cluster set is generated on that basis;
Request-response packet signature generation module: the signature sets of the request packets and of the response packets are generated separately from the fine-grained cluster set.
The above system for automatically generating HTTP network signatures, characterized in that the data extraction module further comprises:
Data content extraction module: the contents of the request-response packets of each HTTP session connection are extracted;
Coarse-grained clustering attribute extraction module: with the network sample as the unit, the four statistics used for coarse-grained clustering are extracted, namely the total number of HTTP flows, the number of bytes transmitted per second, the average HTTP packet size, and the total number of HTTP packets, which yields the coarse-grained clustering attributes;
Fine-grained clustering attribute extraction module: with each HTTP session as the unit, the four statistics used for fine-grained clustering are extracted, namely the number of request packets in the session, the number of response packets in the session, the size of the first request packet, and the size of the first response packet, which yields the fine-grained clustering attributes;
Data aggregation module: the request-response packet contents, the coarse-grained clustering attributes, and the fine-grained clustering attributes are aggregated into a five-tuple data set, where each five-tuple has the form <sample id, session id, request-response packet content, coarse-grained clustering attributes, fine-grained clustering attributes>.
The above system for automatically generating HTTP network signatures, characterized in that the two-stage clustering module further comprises:
Coarse-grained clustering module: the five-tuple data set is clustered automatically on the coarse-grained clustering attributes to obtain the coarse-grained cluster set C; if a cluster in the coarse-grained cluster set C belongs to only one network sample, the URI signature is generated by the URI signature generation module;
Fine-grained clustering module: based on the coarse-grained cluster set C, all sessions in each cluster c_i (c_i ∈ C) are clustered automatically on the fine-grained clustering attributes to obtain the fine-grained cluster set C′ (C′ ∈ C_i);
Sample coverage judgment module: if there is a fine-grained cluster c_i′ (c_i′ ∈ C′) whose sessions come from k samples, where k is greater than 1 and less than or equal to the number of network samples, the fine-grained clustering is considered successful; otherwise the URI signature is generated by the URI signature generation module.
The above system for automatically generating HTTP network signatures, characterized in that the request-response packet signature generation module further comprises:
HTTP signature set generation module: for all session connections in each fine-grained cluster c_i′ (c_i′ ∈ C′), signatures are generated separately for the request packets and the response packets, and token signatures are computed automatically in turn; each fine-grained cluster c_i′ finally yields one request-packet signature and one response-packet signature, which together form the HTTP signature set W;
Signature filtering module: the HTTP signature set W is filtered and screened; unqualified signatures are removed and duplicate signatures are merged to obtain the request-response packet signature set.
Compared with the prior art, the present invention builds on two observations: the command and control traffic of an HTTP botnet is statistically similar, and the request-response packets contain most of the botnet's characteristic information. On this basis it proposes a method for automatically generating HTTP botnet signatures based on request-response packets. The method extracts the request-response packets and the related statistical features from a host's HTTP traffic, performs two-stage clustering on the HTTP data with the X-means clustering algorithm, and generates signatures using the longest common subsequence algorithm and a URI-based characterization method.
The invention has the following advantages:
1. The communication signatures of HTTP botnets can be extracted automatically;
2. Signature generation is more efficient, with lower time and space overhead;
3. The signature generation system is more robust and adaptable, and the high-quality signatures it generates can be used with intrusion detection systems such as Snort to detect the corresponding botnets at scale.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for automatically generating HTTP network signatures of the present invention;
Fig. 2 is a detailed flowchart of the method for automatically generating HTTP network signatures of the present invention;
Fig. 3 is a schematic structural diagram of the system for automatically generating HTTP network signatures of the present invention.
Reference numerals:
1 packet signature generation module
2 URI signature generation module
3 HTTP network signature total-set generation module
11 whitelist filtering module
12 data extraction module
13 two-stage clustering module
14 request-response packet signature generation module
S1~S3, S11~S14, S121~S124, S131~S133, S141~S142: steps performed in the embodiments of the present invention
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments, which are not intended to limit the invention.
The purpose of the present invention is to classify a large number of HTTP botnet samples and automatically produce corresponding signatures for detection. Its advantage is that the communication signatures of a botnet can be generated without any prior knowledge, and signatures can even be generated for botnets whose communication content is encrypted.
Fields of application: 1. to enable large-scale botnet detection, the invention provides an efficient method for automatically generating HTTP botnet signatures; 2. in botnet research, botnets from different samples can be classified by their network behavior and their signatures extracted automatically.
The present invention proposes a method for automatically generating HTTP network signatures that is based on request-response packets and extracts HTTP botnet signatures accurately and automatically. Based on behavioral analysis of a large number of botnet samples, the method takes the request-response packets of each HTTP session connection (the first request and first response HTTP packets) as the object of signature extraction and draws on the longest common subsequence (LCS) algorithm to generate high-quality HTTP botnet signatures automatically and efficiently. Based on the similarity of HTTP botnet command and control traffic, the invention designs an automatic signature generation system built on request-response packets.
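As an illustration of the longest common subsequence idea, the following Python sketch computes the LCS of two first-request payloads by dynamic programming and folds it over a group of payloads. The function names are hypothetical, and how the patent turns the LCS into token signatures is not detailed in this section, so the sketch only returns the common subsequence itself. Folding pairwise yields a subsequence common to all inputs, though not necessarily the longest one.

```python
from functools import reduce


def lcs(a: str, b: str) -> str:
    """Longest common subsequence of a and b (classic O(len(a)*len(b)) DP)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to rebuild one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def common_subsequence(payloads):
    """Fold lcs() over all payloads of one cluster; empty input gives ''."""
    return reduce(lcs, payloads) if payloads else ""


print(common_subsequence(["GET /a.php?id=1", "GET /a.php?id=22", "GET /a.php?id=3"]))
```

Because only the first request and first response of each session are compared, the quadratic cost of the table stays bounded by the size of two packets rather than of whole sessions.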
As shown in Fig. 1 and Fig. 2, the method for automatically generating network signatures provided by the invention comprises the following steps:
Packet signature generation step S1: statistical features and packet contents are extracted from the request-response packets of multiple network samples; a coarse-grained cluster set is generated by two-stage clustering, a fine-grained cluster set is then generated on the basis of the coarse-grained cluster set, and the request-response packet signature set of the network samples is generated from the fine-grained cluster set.
Signature generation is oriented toward request-response packets. Extensive statistics show that botnet command and control connections are short-lived and that, in the vast majority of communications, the valuable features (information about the zombie host, the name of the requested binary file, attack commands, and so on) are concentrated in the request-response packets of the HTTP session connection (the first request and first response HTTP packets). The request-response packets of HTTP are therefore used as the object of signature generation. This greatly reduces the cost of storing and comparing packets and improves the efficiency of signature generation.
Compared with mainstream signature generation techniques (Polygraph, Autograph, and others), the invention exploits the communication characteristics of HTTP botnets and computes only over the request-response packets rather than all HTTP packets. Compared with conventional methods, this improves the efficiency of signature generation and reduces both running time and storage overhead.
The invention uses efficient two-stage clustering. Using the classic X-means algorithm, coarse-grained and fine-grained clustering are performed on the data-flow statistics of the samples and on the request-response packet content of the sessions, respectively. In coarse-grained clustering, the unit is the sample, and the total number of HTTP flows, the number of bytes transmitted per second, the average HTTP packet size, and the total number of HTTP packets are chosen as the four clustering attributes; this clustering groups together samples with similar network behavior (which are assumed to belong to the same botnet family). In fine-grained clustering, the unit is the HTTP session connection of a sample: on the basis of the coarse-grained clustering, all session connections within each class are clustered on four attributes, namely the number of request packets in the session, the number of response packets in the session, the size of the first request packet, and the size of the first response packet. Fine-grained clustering groups similar packets together so that high-quality signatures can be generated. This two-stage clustering conveniently and effectively groups packets with similar content without needing to understand the communication content, and it avoids tedious pairwise comparison among a large number of packets.
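The control flow of the two-stage clustering can be sketched in Python as follows. X-means is not part of the Python standard library, so the sketch swaps in a hypothetical greedy distance-threshold grouping (`threshold_cluster`) as a stand-in; it is not X-means and does not choose the number of clusters by a model-selection criterion. The radii, the dictionary layouts, and all function names are illustrative assumptions, and in practice the four attributes would probably need normalizing before distances are compared.

```python
import math


def threshold_cluster(vectors, radius):
    """Stand-in for X-means: give each vector the label of the first cluster
    seed within `radius` (Euclidean distance), else start a new cluster."""
    seeds, labels = [], []
    for v in vectors:
        for label, seed in enumerate(seeds):
            if math.dist(v, seed) <= radius:
                labels.append(label)
                break
        else:
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels


def two_stage_cluster(coarse_by_sample, fine_by_session, coarse_radius, fine_radius):
    """coarse_by_sample: {sample_id: 4-tuple of per-sample statistics}
    fine_by_session: {(sample_id, session_id): 4-tuple of per-session statistics}
    Returns {coarse_label: {fine_label: [(sample_id, session_id), ...]}}."""
    sample_ids = list(coarse_by_sample)
    coarse_labels = threshold_cluster(
        [coarse_by_sample[s] for s in sample_ids], coarse_radius)
    result = {}
    for label in sorted(set(coarse_labels)):
        # Stage 1 result: the samples that fell into this coarse cluster.
        members = {s for s, l in zip(sample_ids, coarse_labels) if l == label}
        # Stage 2: cluster only the sessions of those samples.
        keys = [k for k in fine_by_session if k[0] in members]
        fine_labels = threshold_cluster(
            [fine_by_session[k] for k in keys], fine_radius)
        groups = {}
        for key, fine_label in zip(keys, fine_labels):
            groups.setdefault(fine_label, []).append(key)
        result[label] = groups
    return result
```

Restricting the second stage to sessions inside one coarse cluster is what keeps the number of compared sessions small; any clustering routine with the same "vectors in, labels out" shape could replace the stand-in.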
The coarse-grained and fine-grained two-stage clustering of the invention quickly assigns packets with similar statistical features to the same cluster, which speeds up signature generation. This partitioning needs no prior knowledge, does not depend on specific content, and avoids the time overhead of comparing a large number of packets pairwise.
URI signature generation step S2: for traffic in the network samples that is assigned to a class of its own, supplementary URI path and parameter signatures are extracted to generate the URI signature set.
In URI signature generation step S2: when the traffic of many samples is clustered, it often happens that the traffic of one or a few samples is assigned to a class of its own. In this case a supplementary measure is used: the URI in the start line of the sample's request packets is analyzed, and the path and request parameters in it are extracted as the signature of the sample. This improves the robustness and adaptability of the signature extraction system to some extent.
Sample data for which clustering produced a single-sample cluster, fine-grained clustering failed, or request-response packet signature generation failed is passed to the URI signature generation step S2, which extracts signatures from the URI path (terminated by the first "?") and the parameters (the names of the parameters submitted in the URI) in the start line of the HTTP request packets. With the sample as the unit, all request packets of the sample are examined and the path and parameter set of the start line are extracted. For example, for a packet whose start line is GET /weather/getweather.aspx?t=1377511384901&cityno= HTTP/1.1, the extracted path is /weather/getweather.aspx and the parameters are t and cityno. The token signature is recorded as /weather/getweather.aspx.*t.*cityno. The URI signature set of these samples is finally obtained.
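The URI path and parameter extraction can be sketched in a few lines of Python. The sketch assumes a request start line of the usual form "METHOD URI VERSION" in which the query is introduced by the first "?"; the function names are hypothetical. It joins the path and the parameter names with ".*" exactly as in the example above and, like that example, does not escape regular-expression metacharacters in the path.

```python
def uri_signature(request_line: str) -> str:
    """Build a token signature from the path and parameter names of the
    start line of an HTTP request, e.g. '/p.aspx.*a.*b'."""
    parts = request_line.split()
    if len(parts) < 2:
        raise ValueError("not an HTTP request line: %r" % request_line)
    uri = parts[1]
    # The path ends at the first '?'; the rest is the query string.
    path, _, query = uri.partition("?")
    # Keep only parameter names, dropping values and empty fragments.
    names = [pair.split("=", 1)[0] for pair in query.split("&") if pair]
    return ".*".join([path] + names)


def uri_signature_set(request_lines):
    """Collect the distinct URI signatures of all requests of one sample."""
    return {uri_signature(line) for line in request_lines}


print(uri_signature("GET /weather/getweather.aspx?t=1377511384901&cityno= HTTP/1.1"))
# prints: /weather/getweather.aspx.*t.*cityno
```

Using a set for the per-sample result means that repeated polling of the same C&C URL contributes a single signature.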
By introducing URI path and parameter feature extraction, the invention effectively handles the case in which clustering fails for a single sample in traditional signature extraction methods, and improves the robustness and adaptability of the system to some extent.
HTTP network signature total-set generation step S3: the request-response packet signature set and the URI signature set are merged to generate the total signature set T_all.
The request-response packet signature set and the URI signature set are merged to obtain the final signature set T_all. In addition, samples that fall in the same coarse-grained cluster and share a common "representative fine-grained cluster" belong to the same botnet family.
The packet signature generation step S1 further comprises:
Whitelist filtering step S11: traffic that accesses legitimate websites is filtered out;
The HTTP data of the botnet samples first enters the "whitelist filtering module". To resist detection, some botnet controllers mix legitimate requests (for example, visits to Google or Baidu) into the command and control traffic in order to interfere with detection and signature generation. Therefore, so that the quality of the generated signatures is not affected, HTTP traffic to legitimate websites is filtered out according to an authoritative third-party website ranking (such as the top 500 sites in the Alexa ranking), and the filtered HTTP data is handed to the "data extraction module" for processing.
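A minimal Python sketch of the whitelist filtering follows. It assumes that each session carries the Host header value of its first request and that the whitelist is a set of registered domains taken from some ranking list; the `HttpSession` type, its fields, and the subdomain-matching rule are illustrative assumptions, not details given in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpSession:
    sample_id: str
    session_id: int
    host: str  # Host header of the first request, possibly with a port


def is_whitelisted(host: str, whitelist: set) -> bool:
    """True if host equals a whitelisted domain or is a subdomain of one."""
    name = host.lower().split(":")[0].rstrip(".")
    labels = name.split(".")
    # Test every suffix of the name: "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def filter_sessions(sessions, whitelist):
    """Drop sessions that talk to whitelisted (assumed legitimate) sites."""
    return [s for s in sessions if not is_whitelisted(s.host, whitelist)]


sessions = [
    HttpSession("A", 1, "www.google.com"),
    HttpSession("A", 2, "evil.example:8080"),
]
print(filter_sessions(sessions, {"google.com", "baidu.com"}))
```

Matching on whole domain labels rather than on substrings keeps a host such as notgoogle.com from being treated as whitelisted.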
Data extraction step S12:Data flow characteristics statistics and question-response bag content to network sample are extracted;
Secondary sorting procedure S13:Data flow characteristics statistics and question-response bag content according to network sample are carried out respectively Secondary cluster, on the basis of generation coarseness cluster set, generates fine granularity cluster set;
Question-response bag condition code generation step S14:According to fine granularity cluster set, request bag and response bag are generated respectively Condition code set.
Wherein, data extraction step S12, also includes:
Data content extraction step S121:Extract the content of the question-response bag of http session connection;
Coarseness cluster attributes extraction step S122:In units of network sample, the four-dimensional statistics of coarseness cluster is extracted Value, including:Http traffic is total, transmission byte number per second, HTTP packets mean size and HTTP packets are total, obtains Coarseness clusters attribute;
Fine granularity cluster attributes extraction step S123:In units of each http session, the four-dimension of fine granularity cluster is extracted Statistical value, including:Session request bag number, conversational response bag number, first request bag size, first response bag size, obtain Fine granularity clusters attribute;
Combined data collection step S124:By the content of question-response bag, coarseness cluster attribute and fine granularity cluster attribute Collect and obtain five-tuple data setThe form of five-tuple is:<Sample id, session id, question-response bag content, coarseness are gathered Generic attribute, fine granularity cluster attribute>.
In data extraction step S12, data flow statistics and packet content are extracted from the HTTP data of each sample. The work falls into three parts. First, the content of the question-response packets (the first request and first response HTTP packets) of each HTTP session connection is extracted. Second, taking the network sample as the unit, the four-dimensional statistics for coarse-grained clustering are extracted: the total number of HTTP data flows, the number of bytes transmitted per second, the average HTTP packet size and the total number of HTTP packets. Third, taking the session connection as the unit, the four-dimensional statistics for fine-grained clustering are extracted: the number of request packets in the session, the number of response packets in the session, the size of the first request packet and the size of the first response packet. The three parts can run concurrently, and together they produce the five-tuple data set, whose form is <sample id, session id, question-response packet content, coarse-grained cluster attributes, fine-grained cluster attributes>. The "sample id" uniquely identifies a botnet sample (data source); it does not identify the botnet family. For example, if two hosts A and B in the same local area network are controlled by the same botnet, their sample ids are still different. The "session id" uniquely identifies one HTTP session connection within the data of a sample. After extraction, the five-tuple data set is passed to secondary clustering step S13.
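The five-tuple assembly described above can be sketched as follows. The session layout (lists of raw request and response bytes), the capture duration argument and all names are illustrative assumptions; in particular, counting one data flow per session is an assumption, since the text does not define how flows are counted.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "sample_id session_id qr_content coarse fine")

def coarse_attributes(sessions, duration_s):
    """Per-sample vector: (flow count, bytes per second, mean packet size, packet count)."""
    sizes = [len(p) for s in sessions for p in s["requests"] + s["responses"]]
    total = sum(sizes)
    return (len(sessions), total / duration_s, total / len(sizes), len(sizes))

def fine_attributes(session):
    """Per-session vector: (#requests, #responses, first request size, first response size)."""
    req, resp = session["requests"], session["responses"]
    return (len(req), len(resp), len(req[0]), len(resp[0]))

def build_five_tuples(sample_id, sessions, duration_s):
    """One five-tuple per session; the coarse vector is shared by the whole sample."""
    coarse = coarse_attributes(sessions, duration_s)
    return [
        FiveTuple(sample_id, sid, (s["requests"][0], s["responses"][0]),
                  coarse, fine_attributes(s))
        for sid, s in enumerate(sessions)
    ]

sessions = [
    {"requests": [b"GET /a HTTP/1.1", b"GET /b HTTP/1.1"], "responses": [b"HTTP/1.1 200 OK"]},
    {"requests": [b"POST /c HTTP/1.1"], "responses": [b"HTTP/1.1 404"]},
]
rows = build_five_tuples("sampleA", sessions, duration_s=10.0)
```

The sketch assumes every session has at least one request and one response; sessions without a response would need separate handling.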
The secondary clustering step S13 further comprises:
Coarse-grained clustering step S131: automatically cluster the five-tuple data set on the coarse-grained cluster attributes to obtain the coarse-grained cluster set C; if the coarse-grained cluster set C belongs to only one network sample, perform URI condition code generation step S2;
Fine-grained clustering step S132: based on the coarse-grained cluster set C, automatically cluster all sessions in each cluster c_i (c_i ∈ C) on the fine-grained cluster attributes to obtain the fine-grained cluster set C′ (C′ ∈ C_i);
Sample coverage judgment step S133: if there is a fine-grained cluster c_i′ (c_i′ ∈ C′) whose sessions all originate from k samples, where k is greater than 1 and less than or equal to the number of network samples, the fine-grained clustering is considered successful; otherwise URI condition code generation step S2 is performed.
First, coarse-grained clustering is applied to the data set. The clustering algorithm is the published X-means algorithm, and the samples are clustered on the four-dimensional coarse-grained attribute values (total number of HTTP data flows, bytes transmitted per second, average HTTP packet size, total number of HTTP packets), giving the coarse-grained cluster set C. Clusters that contain only a single sample are deleted, and URI condition code generation step S2 is performed on their five-tuple data. Then, taking each coarse-grained cluster c_i (c_i ∈ C) as the unit, all session connections of all samples in the cluster are clustered on the four-dimensional fine-grained attribute values (number of request packets in the session, number of response packets in the session, size of the first request packet, size of the first response packet), again with X-means. Each coarse-grained cluster thus yields a new fine-grained cluster set C′ (C′ ∈ C_i). The sources of the session connections in each fine-grained cluster in C′ are then checked. For c_i′ ∈ C′, if the session connections in c_i′ come from at least k different samples (k is greater than 1, no greater than the number of samples in the coarse-grained cluster c_i, and no greater than the number of network samples; the concrete value can be set freely), the fine-grained cluster c_i′ meets the requirement and the corresponding samples have a "representative cluster". Otherwise the fine-grained cluster does not cover enough samples and is not representative. If a sample (or several samples) in a coarse-grained cluster c_i has no "representative" fine-grained cluster, fine-grained clustering has failed for those samples: not enough similar samples were found for them, and their data is passed to the "URI condition code generation module". The fine-grained clusters c_i′ that meet the requirement proceed to question-response packet condition code generation step S14.
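The sample coverage check can be sketched independently of the clustering algorithm. The sketch below assumes that some clusterer (X-means in the text; any implementation is assumed here) has already assigned a fine-grained cluster label to every session; the function name and the `(sample_id, session_id)` row layout are hypothetical.

```python
from collections import defaultdict

def split_by_coverage(session_rows, labels, k=2):
    """Return (labels of representative clusters, sample ids with no representative cluster).

    A fine-grained cluster is representative when its sessions come from
    at least k different samples.
    """
    samples_in_cluster = defaultdict(set)
    for (sample_id, _session_id), label in zip(session_rows, labels):
        samples_in_cluster[label].add(sample_id)
    representative = {c for c, s in samples_in_cluster.items() if len(s) >= k}
    covered = set()
    for c in representative:
        covered |= samples_in_cluster[c]
    all_samples = {sample_id for sample_id, _ in session_rows}
    return representative, all_samples - covered

# Five sessions from three samples, with labels from an assumed clusterer.
rows = [("A", 0), ("A", 1), ("B", 0), ("C", 0), ("C", 1)]
labels = [0, 1, 0, 2, 2]
rep, failed = split_by_coverage(rows, labels, k=2)
```

Here cluster 0 covers samples A and B and is representative, while sample C only appears in a single-sample cluster and would be handed to the URI condition code generation step.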
The question-response packet condition code generation step S14 further comprises:
HTTP condition code set generation step S141: for all session connections in each fine-grained cluster c_i′ (c_i′ ∈ C′), generate condition codes for the request packets and for the response packets separately, computing the token condition codes automatically one session after another; each fine-grained cluster c_i′ finally yields one request packet condition code and one response packet condition code, which together form the HTTP condition code set W;
Condition codes are generated for all session connections in each fine-grained cluster c_i′. Following the question-response structure, generation is divided into request packet condition code generation and response packet condition code generation. The longest common subsequence (LCS) algorithm is used as the generation algorithm and produces token condition codes of the form t1.*t2.*t3.*t4, where each t_i is a common string and .* is a separator indicating that non-matching characters lie between the common strings before and after it. The comparison proceeds as follows. Suppose there are four session connections a, b, c and d. The request packets of a and b are first compared with LCS to obtain the token condition code t. All .* separators are removed from t, the result is converted to text form and compared with the request packet of c to obtain the token condition code s. Then s is converted to text form in the same way and compared with the request packet content of d to obtain the final request packet condition code w. The response packet condition code is computed in the same way. Each fine-grained cluster c_i′ thus produces one request packet condition code and one response packet condition code. These condition codes are collected and arranged, the sample ids involved are marked, and each coarse-grained cluster c_i obtains a condition code set W.
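The pairwise LCS comparison and the a, b, c, d folding order described above can be sketched as follows. This is a minimal character-level illustration under stated assumptions: payloads are plain strings, a token is a run of matched characters that is contiguous in both inputs, and the names `lcs_tokens`, `token_signature` and `cluster_signature` are hypothetical.

```python
from functools import reduce

GAP = ".*"

def lcs_tokens(a, b):
    """Longest common subsequence of a and b, split into runs contiguous in both."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    tokens, i, j, prev = [], 0, 0, None
    while i < n and j < m:
        if a[i] == b[j]:
            if prev == (i - 1, j - 1):
                tokens[-1] += a[i]   # extends the current common string
            else:
                tokens.append(a[i])  # a gap was skipped: start a new token
            prev = (i, j)
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return tokens

def token_signature(a, b):
    """Token condition code of two payloads, e.g. 't1.*t2'."""
    return GAP.join(lcs_tokens(a, b))

def cluster_signature(payloads):
    """Fold over the cluster: strip separators from the running code, compare with the next payload."""
    return reduce(lambda sig, nxt: token_signature(sig.replace(GAP, ""), nxt),
                  payloads[1:], payloads[0])

pair = token_signature("abcXdef", "abcYYdef")
folded = cluster_signature(["abcXdef", "abcYYdef", "abcZdeQf"])
```

Two caveats follow from taking the text literally. Removing the separators before each comparison discards the positions of earlier gaps, so a later payload can re-join tokens that an earlier pair had separated. A payload that itself contains the two characters `.*` would also be altered by the stripping, which a production version would need to escape.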
Condition code filtering step S142: filter and screen the HTTP condition code set W, removing unqualified condition codes and merging duplicate condition codes, to obtain the question-response packet condition code set.
The generated question-response packet condition code set W is filtered and screened as follows. First, common strings t_i in a token condition code that are too short (for example, shorter than 4 characters) are deleted. Next, the common strings contained in the token condition codes are filtered: common HTTP header fields and partial contents that frequently appear in legitimate packets (such as HTTP/1.1 or Cache-Control: no-cache) are removed. Finally, duplicate token condition codes are merged, giving the final botnet question-response packet condition code set. During filtering, all condition codes of a sample may be deleted because they do not meet the requirements (too short or too common). For such a sample, question-response condition code generation is considered to have failed, and URI condition code generation step S2 is performed on it as well.
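The three filtering passes can be sketched as follows. The minimum length of 4 and the two header strings come from the examples in the text; the function name, the constant names and the choice to report the number of emptied condition codes are assumptions.

```python
GAP = ".*"
MIN_TOKEN_LEN = 4  # example threshold from the text
COMMON_TOKENS = {"HTTP/1.1", "Cache-Control: no-cache"}  # illustrative, not exhaustive

def filter_signatures(signatures):
    """Return (sorted, de-duplicated condition codes; number of codes emptied by filtering)."""
    kept, dropped = set(), 0
    for sig in signatures:
        tokens = [t for t in sig.split(GAP)
                  if len(t) >= MIN_TOKEN_LEN and t.strip() not in COMMON_TOKENS]
        if tokens:
            kept.add(GAP.join(tokens))  # the set merges duplicates
        else:
            dropped += 1                # nothing usable is left in this code
    return sorted(kept), dropped

sigs = [
    "GET /gate.php?id=.*&v=.* HTTP/1.1",
    "GET /gate.php?id=.*&v=.* HTTP/1.1",
    "HTTP/1.1.* OK",
]
kept, dropped = filter_signatures(sigs)
```

A sample whose condition codes are all emptied this way corresponds to the failure case in the text and would be passed on to URI condition code generation.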
The present invention generates condition codes automatically. The generated condition codes are of high quality and can be combined with intrusion detection systems such as Snort to detect the corresponding botnets at scale.
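To illustrate how a token condition code could be matched against traffic, the sketch below converts one into a compiled regular expression. It only uses Python's standard `re` module and does not reproduce any Snort rule syntax; the function name and the sample condition code are hypothetical.

```python
import re

def signature_to_regex(signature, gap=".*"):
    """Escape each common string literally and join the strings with a wildcard."""
    tokens = signature.split(gap)
    return re.compile(".*".join(re.escape(t) for t in tokens), re.DOTALL)

rx = signature_to_regex("GET /gate.php?id=.*&ver=")
hit = rx.search("GET /gate.php?id=42&ver=3 HTTP/1.1") is not None
miss = rx.search("GET /index.html HTTP/1.1") is not None
```

Escaping matters because common strings such as `/gate.php?id=` contain characters (`.` and `?`) that would otherwise be read as regular expression operators.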
The present invention also provides an HTTP network condition code automatic generation system. The system can be deployed on its own on a server or host (for example, a honeypot host) to obtain all HTTP data produced by the botnet samples. Alternatively, it can be deployed at the gateway of a designated network and linked with a botnet detection system at the network boundary, reading the botnet HTTP data stored in the back-end database of the detection system.
An HTTP network condition code automatic generation system, as shown in Figure 3, comprises a packet condition code generation module 1, a URI condition code generation module 2 and an HTTP network condition code total set generation module 3;
Packet condition code generation module 1: from the statistics and packet content extracted from the question-response packets of multiple network samples, generates a coarse-grained cluster set by two-stage clustering, then generates a fine-grained cluster set on the basis of the coarse-grained cluster set, and generates the question-response packet condition code set of the network samples from the fine-grained cluster set;
URI condition code generation module 2: for traffic in the network samples that is assigned to a class of its own, performs a supplementary extraction of URI path and parameter condition codes and generates the URI condition code set;
HTTP network condition code total set generation module 3: merges the question-response packet condition code set and the URI condition code set to generate the condition code total set T_all.
The packet condition code generation module 1 comprises:
White list filtering module 11: filters out traffic that accesses legitimate sites;
Data extraction module 12: extracts the data flow statistics and the question-response packet content of the network samples;
Secondary clustering module 13: performs two-stage clustering on the data flow statistics and the question-response packet content of the network samples, generating a coarse-grained cluster set and then, on that basis, a fine-grained cluster set;
Question-response packet condition code generation module 14: from the fine-grained cluster set, generates the condition code sets of the request packets and of the response packets respectively.
The data extraction module 12 further comprises:
Data content extraction module: extracts the content of the question-response packets of each HTTP session connection;
Coarse-grained cluster attribute extraction module: taking the network sample as the unit, extracts the four-dimensional statistics used for coarse-grained clustering, namely the total number of HTTP data flows, the number of bytes transmitted per second, the average HTTP packet size and the total number of HTTP packets, to obtain the coarse-grained cluster attributes;
Fine-grained cluster attribute extraction module: taking each HTTP session as the unit, extracts the four-dimensional statistics used for fine-grained clustering, namely the number of request packets in the session, the number of response packets in the session, the size of the first request packet and the size of the first response packet, to obtain the fine-grained cluster attributes;
Data set collection module: collects the question-response packet content, the coarse-grained cluster attributes and the fine-grained cluster attributes into a five-tuple data set. The form of the five-tuple is: <sample id, session id, question-response packet content, coarse-grained cluster attributes, fine-grained cluster attributes>.
The secondary clustering module 13 further comprises:
Coarse-grained clustering module: automatically clusters the five-tuple data set on the coarse-grained cluster attributes to obtain the coarse-grained cluster set C; if the coarse-grained cluster set C belongs to only one network sample, the URI condition codes are generated by the URI condition code generation module;
Fine-grained clustering module: based on the coarse-grained cluster set C, automatically clusters all sessions in each cluster c_i (c_i ∈ C) on the fine-grained cluster attributes to obtain the fine-grained cluster set C′ (C′ ∈ C_i);
Sample coverage judgment module: if there is a fine-grained cluster c_i′ (c_i′ ∈ C′) whose sessions all originate from k samples, where k is greater than 1 and less than or equal to the number of network samples, the fine-grained clustering is considered successful; otherwise the URI condition codes are generated by the URI condition code generation module.
The question-response packet condition code generation module 14 further comprises:
HTTP condition code set generation module: for all session connections in each fine-grained cluster c_i′ (c_i′ ∈ C′), generates condition codes for the request packets and for the response packets separately, computing the token condition codes automatically one session after another; each fine-grained cluster c_i′ finally yields one request packet condition code and one response packet condition code, which together form the HTTP condition code set W;
Condition code filtering module: filters and screens the HTTP condition code set W, removing unqualified condition codes and merging duplicate condition codes, to obtain the question-response packet condition code set.
The present invention may of course have various other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make corresponding changes and variations according to the invention, and all such changes and variations shall fall within the scope of protection of the appended claims.

Claims (11)

1. An HTTP network condition code automatic generation method, characterized in that the method comprises:
A packet condition code generation step: extracting the data flow statistics and the question-response packet content of network samples, generating a coarse-grained cluster set by two-stage clustering, then generating a fine-grained cluster set on the basis of the coarse-grained cluster set, and generating the question-response packet condition code set of the network samples from the fine-grained cluster set, wherein the question-response packets are the first request and first response HTTP packets, and the packet condition code generation step comprises a data extraction step of extracting the data flow statistics and the question-response packet content of the network samples, the data extraction step comprising:
A coarse-grained cluster attribute extraction step: taking the network sample as the unit, extracting the four-dimensional statistics used for coarse-grained clustering, namely the total number of HTTP data flows, the number of bytes transmitted per second, the average HTTP packet size and the total number of HTTP packets, to obtain the coarse-grained cluster attributes;
A fine-grained cluster attribute extraction step: taking each HTTP session as the unit, extracting the four-dimensional statistics used for fine-grained clustering, namely the number of request packets in the session, the number of response packets in the session, the size of the first request packet and the size of the first response packet, to obtain the fine-grained cluster attributes;
A URI condition code generation step: for traffic in the network samples that is assigned to a class of its own, performing a supplementary extraction of URI path and parameter condition codes and generating the URI condition code set;
An HTTP network condition code total set generation step: merging the question-response packet condition code set and the URI condition code set to generate the condition code total set T_all.
2. The HTTP network condition code automatic generation method according to claim 1, characterized in that the packet condition code generation step comprises:
A secondary clustering step: performing two-stage clustering on the data flow statistics and the question-response packet content of the network samples, generating the fine-grained cluster set on the basis of the generated coarse-grained cluster set;
A question-response packet condition code generation step: from the fine-grained cluster set, generating the condition code sets of the request packets and of the response packets respectively.
3. The HTTP network condition code automatic generation method according to claim 2, characterized in that the method further comprises, before the data extraction step:
A white list filtering step: filtering out the traffic in the network samples that accesses legitimate sites.
4. The HTTP network condition code automatic generation method according to claim 2, characterized in that the data extraction step further comprises:
A data content extraction step: extracting the question-response packet content of each HTTP session connection;
A data set collection step: collecting the question-response packet content, the coarse-grained cluster attributes and the fine-grained cluster attributes into a five-tuple data set, the form of the five-tuple being: <sample id, session id, question-response packet content, coarse-grained cluster attributes, fine-grained cluster attributes>.
5. The HTTP network condition code automatic generation method according to claim 4, characterized in that the secondary clustering step further comprises:
A coarse-grained clustering step: automatically clustering the five-tuple data set on the coarse-grained cluster attributes to obtain the coarse-grained cluster set C, and, if the coarse-grained cluster set C belongs to only one network sample, performing the URI condition code generation step;
A fine-grained clustering step: based on the coarse-grained cluster set C, automatically clustering all sessions in each coarse-grained cluster c_i on the fine-grained cluster attributes to obtain the fine-grained cluster set C′, wherein C′ ∈ C_i and c_i ∈ C;
A sample coverage judgment step: if there is a fine-grained cluster c′_i whose sessions all originate from k samples, where k is greater than 1 and less than or equal to the number of network samples, considering the fine-grained clustering successful, and otherwise performing the URI condition code generation step, wherein c′_i ∈ C′.
6. The HTTP network condition code automatic generation method according to claim 5, characterized in that the question-response packet condition code generation step further comprises:
An HTTP condition code set generation step: for all session connections in each fine-grained cluster c′_i, generating condition codes for the request packets and for the response packets separately, computing the token condition codes automatically one session after another, so that each fine-grained cluster c′_i finally yields one request packet condition code and one response packet condition code, which form the HTTP condition code set W, wherein c′_i ∈ C′;
A condition code filtering step: filtering and screening the HTTP condition code set W, removing unqualified condition codes and merging duplicate condition codes, to obtain the question-response packet condition code set.
7. An HTTP network condition code automatic generation system using the HTTP network condition code automatic generation method according to any one of claims 1-6, characterized in that the system comprises:
A packet condition code generation module: extracts the data flow statistics and the question-response packet content of network samples, generates a coarse-grained cluster set by two-stage clustering, then generates a fine-grained cluster set on the basis of the coarse-grained cluster set, and generates the question-response packet condition code set of the network samples from the fine-grained cluster set, wherein the question-response packets are the first request and first response HTTP packets, and the packet condition code generation module comprises a data extraction module that extracts the data flow statistics and the question-response packet content of the network samples, the data extraction module comprising:
A coarse-grained cluster attribute extraction module: taking the network sample as the unit, extracts the four-dimensional statistics used for coarse-grained clustering, namely the total number of HTTP data flows, the number of bytes transmitted per second, the average HTTP packet size and the total number of HTTP packets, to obtain the coarse-grained cluster attributes;
A fine-grained cluster attribute extraction module: taking each HTTP session as the unit, extracts the four-dimensional statistics used for fine-grained clustering, namely the number of request packets in the session, the number of response packets in the session, the size of the first request packet and the size of the first response packet, to obtain the fine-grained cluster attributes;
A URI condition code generation module: for traffic in the network samples that is assigned to a class of its own, performs a supplementary extraction of URI path and parameter condition codes and generates the URI condition code set;
An HTTP network condition code total set generation module: merges the question-response packet condition code set and the URI condition code set to generate the condition code total set T_all.
8. The HTTP network condition code automatic generation system according to claim 7, characterized in that the packet condition code generation module comprises:
A white list filtering module: filters out traffic that accesses legitimate sites;
A secondary clustering module: performs two-stage clustering on the data flow statistics and the question-response packet content of the network samples, generating the fine-grained cluster set on the basis of the generated coarse-grained cluster set;
A question-response packet condition code generation module: from the fine-grained cluster set, generates the condition code sets of the request packets and of the response packets respectively.
9. The HTTP network condition code automatic generation system according to claim 8, characterized in that the data extraction module further comprises:
A data content extraction module: extracts the question-response packet content of each HTTP session connection;
A data set collection module: collects the question-response packet content, the coarse-grained cluster attributes and the fine-grained cluster attributes into a five-tuple data set, the form of the five-tuple being: <sample id, session id, question-response packet content, coarse-grained cluster attributes, fine-grained cluster attributes>.
10. The HTTP network condition code automatic generation system according to claim 9, characterized in that the secondary clustering module further comprises:
A coarse-grained clustering module: automatically clusters the five-tuple data set on the coarse-grained cluster attributes to obtain the coarse-grained cluster set C, and, if the coarse-grained cluster set C belongs to only one network sample, has the URI condition code generation module generate the URI condition codes;
A fine-grained clustering module: based on the coarse-grained cluster set C, automatically clusters all sessions in each coarse-grained cluster c_i on the fine-grained cluster attributes to obtain the fine-grained cluster set C′, wherein C′ ∈ C_i and c_i ∈ C;
A sample coverage judgment module: if there is a fine-grained cluster c′_i whose sessions all originate from k samples, where k is greater than 1 and less than or equal to the number of network samples, considers the fine-grained clustering successful, and otherwise the URI condition code generation step is performed, wherein c′_i ∈ C′.
11. The HTTP network condition code automatic generation system according to claim 10, characterized in that the question-response packet condition code generation module further comprises:
An HTTP condition code set generation module: for all session connections in each fine-grained cluster c′_i, generates condition codes for the request packets and for the response packets separately, computing the token condition codes automatically one session after another, so that each fine-grained cluster c′_i finally yields one request packet condition code and one response packet condition code, which form the HTTP condition code set W, wherein c′_i ∈ C′;
A condition code filtering module: filters and screens the HTTP condition code set W, removing unqualified condition codes and merging duplicate condition codes, to obtain the question-response packet condition code set.
CN201310745102.1A 2013-12-30 2013-12-30 A kind of http network condition code automatic generation method and its system Expired - Fee Related CN103746982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310745102.1A CN103746982B (en) 2013-12-30 2013-12-30 A kind of http network condition code automatic generation method and its system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310745102.1A CN103746982B (en) 2013-12-30 2013-12-30 A kind of http network condition code automatic generation method and its system

Publications (2)

Publication Number Publication Date
CN103746982A CN103746982A (en) 2014-04-23
CN103746982B true CN103746982B (en) 2017-05-31

Family

ID=50503969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310745102.1A Expired - Fee Related CN103746982B (en) 2013-12-30 2013-12-30 A kind of http network condition code automatic generation method and its system

Country Status (1)

Country Link
CN (1) CN103746982B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104580216B (en) 2015-01-09 2017-10-03 北京京东尚科信息技术有限公司 A kind of system and method limited access request
CN105099834B (en) * 2015-09-30 2018-11-13 北京华青融天技术有限责任公司 A kind of method and apparatus of user-defined feature code
CN105978897B (en) * 2016-06-28 2019-05-07 南京南瑞继保电气有限公司 A kind of detection method of electric power secondary system Botnet
CN107222511B (en) * 2017-07-25 2021-08-13 深信服科技股份有限公司 Malicious software detection method and device, computer device and readable storage medium
CN107592312B (en) * 2017-09-18 2021-04-30 济南互信软件有限公司 Malicious software detection method based on network flow
CN109474452B (en) * 2017-12-25 2021-09-28 北京安天网络安全技术有限公司 Method, system and storage medium for automatically identifying B/S botnet background
CN108287905B (en) * 2018-01-26 2020-04-21 华南理工大学 Method for extracting and storing network flow characteristics
CN108897990B (en) * 2018-06-06 2021-10-29 东北大学 Interactive feature parallel selection method for large-scale high-dimensional sequence data
CN110472031A (en) * 2019-08-13 2019-11-19 北京知道创宇信息技术股份有限公司 A kind of regular expression preparation method, device, electronic equipment and storage medium
CN111182002A (en) * 2020-02-19 2020-05-19 北京亚鸿世纪科技发展有限公司 Zombie network detection device based on HTTP (hyper text transport protocol) first question-answer packet clustering analysis
CN113381996B (en) * 2021-06-08 2023-04-28 中电福富信息科技有限公司 C & C communication attack detection method based on machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102333313A (en) * 2011-10-18 2012-01-25 中国科学院计算技术研究所 Feature code generation method and detection method of mobile botnet
CN103297433A (en) * 2013-05-29 2013-09-11 中国科学院计算技术研究所 HTTP botnet detection method and system based on net data stream
US8555388B1 (en) * 2011-05-24 2013-10-08 Palo Alto Networks, Inc. Heuristic botnet detection
US8561188B1 (en) * 2011-09-30 2013-10-15 Trend Micro, Inc. Command and control channel detection with query string signature
CN103457909A (en) * 2012-05-29 2013-12-18 中国移动通信集团湖南有限公司 Botnet detection method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101010302B1 (en) * 2008-12-24 2011-01-25 한국인터넷진흥원 Security management system and method of irc and http botnet

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8555388B1 (en) * 2011-05-24 2013-10-08 Palo Alto Networks, Inc. Heuristic botnet detection
US8561188B1 (en) * 2011-09-30 2013-10-15 Trend Micro, Inc. Command and control channel detection with query string signature
CN102333313A (en) * 2011-10-18 2012-01-25 中国科学院计算技术研究所 Feature code generation method and detection method of mobile botnet
CN103457909A (en) * 2012-05-29 2013-12-18 中国移动通信集团湖南有限公司 Botnet detection method and device
CN103297433A (en) * 2013-05-29 2013-09-11 中国科学院计算技术研究所 HTTP botnet detection method and system based on net data stream

Also Published As

Publication number Publication date
CN103746982A (en) 2014-04-23

Similar Documents

Publication Publication Date Title
CN103746982B (en) A kind of http network condition code automatic generation method and its system
CN108616534B (en) Method and system for preventing DDoS (distributed denial of service) attack of Internet of things equipment based on block chain
CN111817982B (en) Encrypted flow identification method for category imbalance
CN105681250B (en) A kind of Botnet distribution real-time detection method and system
CN103297433B (en) The HTTP Botnet detection method of data flow Network Based and system
CN103457909B (en) A kind of Botnet detection method and device
CN111988285A (en) Network attack tracing method based on behavior portrait
CN107370752B (en) Efficient remote control Trojan detection method
Jiang et al. ALDD: a hybrid traffic-user behavior detection method for application layer DDoS
CN114866485B (en) Network traffic classification method and classification system based on aggregation entropy
CN112788064B (en) Encryption network abnormal flow detection method based on knowledge graph
CN107276978A (en) A kind of Anonymizing networks of Intrusion Detection based on host fingerprint hide service source tracing method
CN105207997B (en) A kind of message forwarding method and system of attack protection
CN115134250A (en) Network attack source tracing evidence obtaining method
Songma et al. Classification via k-means clustering and distance-based outlier detection
CN104021348B (en) Real-time detection method and system of dormant P2P (Peer to Peer) programs
CN113221113B (en) Distributed machine learning and block chain-based internet of things DDoS detection and defense method, detection device and storage medium
CN114598499A (en) Network risk behavior analysis method combined with business application
TWI596498B (en) FedMR-based botnet reconnaissance method
Niu et al. Using XGBoost to discover infected hosts based on HTTP traffic
CN109190408B (en) Data information security processing method and system
Qin et al. MUCM: multilevel user cluster mining based on behavior profiles for network monitoring
Yang et al. [Retracted] Computer User Behavior Anomaly Detection Based on K‐Means Algorithm
CN113032787B (en) System vulnerability detection method and device
Yang et al. Botnet detection based on machine learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170531

Termination date: 20191230