CN107220540A

CN107220540A - Intrusion detection method based on reinforcement learning

Info

Publication number: CN107220540A
Application number: CN201710256845.0A
Authority: CN
Inventors: 张迎周; 尹秀; 陈星昊; 王星; 赵莲
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University; Nanjing University of Posts and Telecommunications
Priority date: 2017-04-19
Filing date: 2017-04-19
Publication date: 2017-09-29

Abstract

The invention discloses an intrusion detection method based on reinforcement learning. The intrusion detection environment is modeled and a Markov process is simulated within it. A reinforcement learning model is applied to the classification and detection environment of intrusion detection, and the optimal classification policy is learned through reinforcement learning, with classification accuracy serving as the reward function of the model. Bellman equations are established on the basis of the intrusion detection Markov process, and the optimal solution is computed with a policy evaluation algorithm based on the γ-discounted cumulative reward. The method makes it possible to determine the threshold between normal and abnormal behavior in intrusion detection and allows the set of normal behavior sequences to be supplemented continuously. This improves the detection rate of intrusion detection, reduces the false alarm rate and the missed report rate, and ultimately improves the performance of the whole intrusion detection system.

Description

Intrusion detection method based on reinforcement learning

Technical field

The invention belongs to the field of intrusion detection technology, and in particular relates to an intrusion detection method based on reinforcement learning.

Background art

With the spread of networks and smartphones, network security problems have become increasingly prominent, and ensuring the security of network systems has become an urgent problem. Intrusion detection technology collects information from operating systems, system programs, applications and network traffic packets in order to discover behavior in the monitored system or network that violates the security policy or endangers system security. It is an effective means of safeguarding system and network security. Machine learning, whose research goal is the automatic acquisition and generation of knowledge, is one of the core problems of artificial intelligence. It intersects with many other disciplines such as statistics, psychology and robotics. In particular, the combination of the psychology of learning with machine learning directly promoted the emergence and development of the theory and algorithms of reinforcement learning. Reinforcement learning is a machine learning method that takes environmental feedback as its input and adapts to the environment. Its main idea is to optimize decisions by interacting with the environment through trial and error and by using evaluative feedback signals. This is also the basic way in which humans and animals learn in nature.

An intrusion detection system is a network security system that actively defends against attacks. By detection principle, such systems are divided into misuse detection and anomaly detection. Misuse detection is built on known attacks; it lacks adaptivity to unknown attacks, so its missed report rate is high. Anomaly detection is built on a model of the normal behavior of the network and system and aims to reduce both false alarms and missed reports. Many anomaly detection methods exist, including statistical analysis, Bayesian networks, neural networks, data mining and genetic algorithms. Their difficulty lies in determining the threshold between normal and abnormal behavior, and their false alarm and missed report rates remain relatively high.

Summary of the invention

The present invention proposes an intrusion detection method based on reinforcement learning. Its purpose is to improve the detection rate of intrusion detection, reduce the false alarm rate and the missed report rate, and ultimately improve the performance of the whole intrusion detection system.

An intrusion detection method based on reinforcement learning comprises the following steps:

Step 1: Establish the four elements of reinforcement learning for intrusion detection: environment, learner, reward function and policy. The environment is the inside of the intrusion detection system. The learner is the intrusion detection classification agent, which analyzes intrusion-detection-related information and whose states comprise a normal state and an abnormal state. The reward function is the intrusion detection rate, that is, the classification accuracy. The policy is the optimal detection method of intrusion detection.

Step 2: Establish the Markov process of intrusion detection. The classification agent of intrusion detection is abstracted as a learner whose state is either normal or abnormal. The logical relation between the normal state and the abnormal state forms a Markov process, that is, the state at the next moment depends only on the state at the current moment and is independent of the state at any earlier moment.

Step 3: Learn the optimal classification policy of intrusion detection. The environment is modeled so that a situation identical or approximate to the environment can be simulated. On this basis, the Bellman equations of reinforcement learning for intrusion detection are obtained in recursive form. The equations involve two functions, the "state value function" and the "state-action value function", which give the cumulative reward for a specified state and for a specified state-action pair, respectively. The cumulative reward function uses the γ-discounted cumulative reward.

Step 4: Solve for the optimal solution of the value function of the intrusion detection policy. Determine the Markov decision process four-tuple of intrusion detection and the discount parameter γ of the cumulative reward, and set a convergence threshold. Then apply the value iteration algorithm based on the γ-discounted cumulative reward to obtain the optimal detection policy of intrusion detection.

The specific sub-steps of step 1 are:

Step 1.1: According to the learning elements of reinforcement learning, set the learning environment of reinforcement learning to be inside the intrusion detection system, and set the learner of reinforcement learning to be the intrusion detection classification agent.

Step 1.2: Define the reward function R of reinforcement learning as the intrusion detection rate, that is, the classification accuracy. Define the policy S as the optimal detection method of intrusion detection.

Step 1.3: Define the action a and action space A of the reinforcement learning learner as analyzing intrusion-detection-related information such as network logs and host information. Define the state x and state space X as two states: normal and abnormal.

The Markov decision process four-tuple described in step 4 is <X, A, P, R>, where X is the state space, A is the action space, P: X × A × X → ℝ specifies the state transition probabilities and R: X × A × X → ℝ specifies the rewards. P is the underlying transition function by which the environment moves from the current state to another state with a certain probability.

By modeling the intrusion detection environment, simulating a Markov process within it, and learning the optimal classification policy through reinforcement learning, the present invention can determine the threshold between normal and abnormal behavior in intrusion detection. This allows the set of normal behavior sequences to be supplemented continuously, improves the detection rate of intrusion detection, reduces the false alarm rate and the missed report rate, and ultimately improves the performance of the whole intrusion detection system.

Brief description of the drawings

Fig. 1 is a flowchart of constructing the reinforcement learning environment for intrusion detection;

Fig. 2 is a flowchart of defining the Markov process of the intrusion detection system;

Fig. 3 is a flowchart of solving the Bellman equations to obtain the optimal detection policy of intrusion detection.

Detailed description of the embodiments

To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.

Step 1): Establish the four elements of reinforcement learning for intrusion detection: environment, learner, reward function and policy. First, the learning environment is the inside of the intrusion detection system, the learner is the intrusion detection classification agent, the reward function is the intrusion detection rate (that is, the classification accuracy), and the policy is the optimal detection method of intrusion detection. Second, the action performed by the learner is to analyze intrusion-detection-related information such as network logs and host information, and the state is either normal or abnormal.
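The reward in step 1 is the classification accuracy, while the stated goals are a lower false alarm rate and a lower missed report rate. As a minimal sketch of how these three quantities relate, the following helper computes them from confusion counts. The function name `detection_metrics` and the sample counts are illustrative assumptions and do not come from the patent.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, false alarm rate and missed report rate.

    tp: attacks classified as abnormal   fn: attacks classified as normal (missed)
    tn: normal traffic classified normal fp: normal traffic classified as abnormal
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return {
        # Classification accuracy: the quantity used as reward R in this sketch.
        "accuracy": (tp + tn) / total,
        # Share of normal traffic that triggered an alarm.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of attacks that were not reported.
        "missed_report_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical counts, chosen only for illustration.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

With these assumed counts the accuracy is 185/200, the false alarm rate is 5/100 and the missed report rate is 10/100.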

Step 1.1: Reinforcement learning emphasizes how to act on the basis of the environment so as to maximize the expected return. In essence it addresses the "decision" problem, that is, learning to make decisions automatically, and it is a learning mode of continual trial and error. To apply reinforcement learning to intrusion detection, the first task is therefore to establish its learning elements: set the learning environment of reinforcement learning to be inside the intrusion detection system, and define the learner of reinforcement learning as the intrusion detection classification agent.

Step 1.2: What the machine has to do is learn a "policy" through repeated attempts in the environment; according to this policy, it knows which action to perform in a given state. The policy S is defined as the optimal detection method of intrusion detection. A policy can be represented in two ways. One represents it as a function π: X → A, which is commonly used for deterministic policies. The other is a probabilistic representation π: X × A → ℝ, commonly used for stochastic policies, where π(x, a) is the probability of selecting action a in state x and ∑_a π(x, a) = 1 must hold. The quality of a policy depends on the cumulative reward obtained after executing it over the long term. In a reinforcement learning task, the goal of learning is to find a policy that maximizes the long-term cumulative reward. The long-term cumulative reward can be calculated in several ways; the common ones are the "T-step cumulative reward" and the "γ-discounted cumulative reward". The reward function R of reinforcement learning is defined as the intrusion detection rate, that is, the classification accuracy.
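The two policy representations in step 1.2 can be sketched as plain mappings. The action names `classify_normal` and `classify_abnormal` and all probabilities below are assumptions made for illustration; the patent only fixes the two states and the constraint ∑_a π(x, a) = 1.

```python
import random

STATES = ("normal", "abnormal")
ACTIONS = ("classify_normal", "classify_abnormal")  # hypothetical action names

# Deterministic policy pi: X -> A, one action per state.
deterministic_policy = {"normal": "classify_normal", "abnormal": "classify_abnormal"}

# Stochastic policy pi: X x A -> R, one probability per (state, action) pair.
stochastic_policy = {
    "normal": {"classify_normal": 0.9, "classify_abnormal": 0.1},
    "abnormal": {"classify_normal": 0.2, "classify_abnormal": 0.8},
}


def is_valid_policy(policy: dict, tol: float = 1e-9) -> bool:
    """Check that pi(x, .) is a probability distribution for every state x."""
    for state in STATES:
        probs = policy[state].values()
        if any(p < 0.0 for p in probs) or abs(sum(probs) - 1.0) > tol:
            return False
    return True


def sample_action(policy: dict, state: str, rng: random.Random) -> str:
    """Draw an action a with probability pi(state, a)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

A deterministic policy is the special case of a stochastic policy that puts probability 1 on a single action in each state.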

Step 1.3: The final reward of a reinforcement learning task can be observed only after multi-step actions, so the machine must discover the result of each action by trying it; there is no training data that tells the machine which action to take. The action a and action space A of the reinforcement learning learner are defined as analyzing intrusion-detection-related information such as network logs and host information. First, the capture module collects and filters a series of packets from key nodes on different hosts and network segments of the data network system. Second, these packets are sent to the parsing and preprocessing module, where header feature extraction and encoding produce classification code sequences in a specific format (for example, based on TCP port number range and flag bits, UDP port number range, ICMP message type, and so on). The reinforcement-learning-based intrusion detection system then performs classification training and learning on the sequences of these different protocol types, so that the normal behavior profile becomes progressively more robust. The detection module then analyzes traffic in real time; if an anomaly is found, the response processing module raises an alarm and performs the related handling. Accordingly, the state x and state space X of the reinforcement learning model are defined as two states: normal and abnormal.
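The header encoding in step 1.3 is described only by example (TCP port range and flag bits, UDP port range, ICMP message type). The following is one possible sketch of such an encoder. The `PacketHeader` type, the code format and the choice of the IANA port ranges as buckets are assumptions, not the encoding used in the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PacketHeader:
    """Hypothetical, already-parsed header fields of one captured packet."""
    protocol: str                    # "tcp", "udp" or "icmp"
    dst_port: Optional[int] = None   # destination port for TCP/UDP
    tcp_flags: str = ""              # e.g. "S", "SA", "F"
    icmp_type: Optional[int] = None  # ICMP message type


def port_bucket(port: Optional[int]) -> str:
    """Map a port number to a coarse range (IANA ranges, assumed here)."""
    if port is None:
        return "unknown"
    if port < 1024:
        return "wellknown"
    if port < 49152:
        return "registered"
    return "dynamic"


def encode(header: PacketHeader) -> str:
    """Turn one header into a classification code string."""
    if header.protocol == "tcp":
        flags = "".join(sorted(header.tcp_flags)) or "-"
        return "tcp:" + port_bucket(header.dst_port) + ":" + flags
    if header.protocol == "udp":
        return "udp:" + port_bucket(header.dst_port)
    if header.protocol == "icmp":
        return "icmp:" + str(header.icmp_type)
    return "other"


# A short code sequence built from hypothetical packets.
sequence = [encode(h) for h in (
    PacketHeader("tcp", 80, "SA"),
    PacketHeader("udp", 53),
    PacketHeader("icmp", icmp_type=8),
)]
```

Sequences of such codes, grouped by protocol type, would then be the input that the classification agent labels as normal or abnormal.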

Step 2): Establish a Markov process of intrusion detection. First, the classification agent of intrusion detection is abstracted as a learner (Agent), which has only two possible states: normal and abnormal. The logical relation between the normal state and the abnormal state can form a Markov process, that is, the state at the next moment depends only on the state at the current moment and is independent of the state at any earlier moment:

Step 2.1: A reinforcement learning task is usually described as a standard Markov process. A Markov process is one in which the state at the next moment depends only on the state at the current moment and is not affected by any earlier state. A Markov process of intrusion detection must therefore be simulated in the machine's internal environment. The classification agent of intrusion detection is abstracted as a learner, which has only two possible states: normal and abnormal. The logical relation between the normal state and the abnormal state can form a Markov process.
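The Markov property of step 2.1 can be illustrated with a two-state chain in which the next state is sampled from the current state alone. The transition probabilities in `TRANSITIONS` are invented for this sketch; the patent does not give numerical values.

```python
import random

# Hypothetical transition probabilities between the two states.
TRANSITIONS = {
    "normal": {"normal": 0.95, "abnormal": 0.05},
    "abnormal": {"normal": 0.30, "abnormal": 0.70},
}


def next_state(current: str, rng: random.Random) -> str:
    """Sample the next state; only the current state is consulted."""
    targets = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


def simulate(start: str, steps: int, seed: int = 0) -> list:
    """Generate a trajectory of steps + 1 states starting from start."""
    rng = random.Random(seed)
    trajectory = [start]
    for _ in range(steps):
        trajectory.append(next_state(trajectory[-1], rng))
    return trajectory
```

Because `next_state` receives only the current state, the earlier history of the trajectory cannot influence the next sample, which is exactly the Markov property.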

Step 2.2: Build the Markov model of intrusion detection according to the characteristics of reinforcement learning, where P is the underlying transition function by which the environment moves from the current state to another state with a certain probability.

Step 2.3: Establish the Markov four-tuple <X, A, P, R> of the intrusion detection system, where P: X × A × X → ℝ specifies the state transition probabilities and R: X × A × X → ℝ specifies the rewards.
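One way to hold the four-tuple <X, A, P, R> in code is a small record whose P and R are keyed by (state, action, next state), mirroring the signatures X × A × X → ℝ. The class name `MDP`, the action names and all numbers are assumptions for illustration. In particular, the sketch assumes that the agent's classification does not change how the traffic state evolves, so P is the same for both actions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (state x, action a, next state x')


@dataclass(frozen=True)
class MDP:
    states: Tuple[str, ...]   # X
    actions: Tuple[str, ...]  # A
    P: Dict[Key, float]       # transition probabilities
    R: Dict[Key, float]       # rewards

    def validate(self, tol: float = 1e-9) -> bool:
        """Check that P(x, a, .) sums to 1 for every state-action pair."""
        for x in self.states:
            for a in self.actions:
                total = sum(self.P.get((x, a, x2), 0.0) for x2 in self.states)
                if abs(total - 1.0) > tol:
                    return False
        return True


STATES = ("normal", "abnormal")
ACTIONS = ("classify_normal", "classify_abnormal")  # hypothetical
DRIFT = {"normal": {"normal": 0.95, "abnormal": 0.05},
         "abnormal": {"normal": 0.30, "abnormal": 0.70}}

keys = [(x, a, x2) for x in STATES for a in ACTIONS for x2 in STATES]
P = {(x, a, x2): DRIFT[x][x2] for (x, a, x2) in keys}
# Reward 1 for a correct classification of the current state, else 0.
R = {(x, a, x2): 1.0 if a == "classify_" + x else 0.0 for (x, a, x2) in keys}

ids_mdp = MDP(STATES, ACTIONS, P, R)
```

With two states and two actions, P and R each have 2 × 2 × 2 = 8 entries.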

Step 3): Learn the optimal classification policy of intrusion detection. Reinforcement learning can be divided into "model-based learning" and "model-free learning". In model-based learning, all elements of the Markov four-tuple are known, which helps the optimal policy to be learned more efficiently. The reinforcement learning process for intrusion detection is therefore defined as model-based learning: the machine has modeled the environment and can simulate, internally, a situation identical or approximate to the environment. On this basis, the Bellman equations of reinforcement learning for intrusion detection can be obtained in recursive form. The equations involve two functions, the "state value function" and the "state-action value function", which give the cumulative reward for a specified state and for a specified state-action pair, respectively. The cumulative reward function uses the γ-discounted cumulative reward:

Step 3.1: The Markov decision process four-tuple E = <X, A, P, R> corresponding to the intrusion detection task is known. This situation is called "model known": the machine has modeled the environment and can simulate, internally, a situation identical or approximate to the environment. Learning in an environment with a known model is called "model-based learning". In this case, for any states x, x′ and action a, the probability P(x, a, x′) of moving to state x′ when action a is performed in state x is known, and the reward R(x, a, x′) brought by that transition is also known. For this, the state space X and the action space A are assumed to be finite.

Step 3.2: When the model is known, the expected cumulative reward of any policy π can be estimated. Let V^π(x) denote the cumulative reward obtained by following policy π starting from state x, and let Q^π(x, a) denote the cumulative reward obtained by starting from state x, performing action a, and then following policy π. V(·) is called the "state value function" and Q(·) the "state-action value function"; they give the cumulative reward for a specified state and for a specified state-action pair, respectively. This method uses the γ-discounted cumulative reward as the cumulative reward function.
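For a single observed reward sequence, the γ-discounted cumulative reward is the sum of γ^t times the reward received at step t + 1. A minimal sketch of that arithmetic follows; the function name and the sample rewards are illustrative assumptions.

```python
def discounted_return(rewards, gamma: float) -> float:
    """Return sum over t of gamma**t * rewards[t], where rewards[t] is r_{t+1}.

    Evaluated from the last reward backwards, so no explicit powers of
    gamma are needed (Horner's scheme).
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical rewards: correct, wrong, correct classification.
example = discounted_return([1.0, 0.0, 1.0], gamma=0.5)
```

Here the result is 1 + 0.5 · 0 + 0.25 · 1 = 1.25. The value functions V^π and Q^π are the expectations of this quantity over the trajectories that policy π generates.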

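Step 3.3: By the definition of the γ-discounted cumulative reward, the state value function is, in its standard form:

V^π(x) = E_π[ ∑_{t=0}^{∞} γ^t r_{t+1} | x₀ = x ]

where x₀ denotes the initial state, r_{t+1} denotes the reward received at step t + 1, and γ (0 ≤ γ ≤ 1) is the discount factor, which makes near-term rewards count more than future rewards. Likewise, with a₀ denoting the first action taken in the initial state, the state-action value function is:

Q^π(x, a) = E_π[ ∑_{t=0}^{∞} γ^t r_{t+1} | x₀ = x, a₀ = a ]

Because of the Markov property of a Markov process, that is, the state of the system at the next moment is determined only by the state at the current moment and does not depend on any earlier state, the value function has a very simple recursive form. For the γ-discounted cumulative reward:

Q^π(x, a) = ∑_{x′∈X} P(x, a, x′) ( R(x, a, x′) + γ V^π(x′) )

Therefore, under policy π, the state value function satisfies the Bellman equation:

V^π(x) = ∑_{a∈A} π(x, a) ∑_{x′∈X} P(x, a, x′) ( R(x, a, x′) + γ V^π(x′) )

For a fixed policy π and γ < 1, the unique solution of this equation is V^π. Replacing the weighted sum over actions with a maximum over actions gives the Bellman optimality equation, whose unique solution is the optimal value function.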
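The Bellman equation for a fixed policy can be solved by repeatedly applying its right-hand side until the values stop changing. The following is a minimal sketch of such iterative policy evaluation on a two-state model. The action names, the transition probabilities in `DRIFT`, the 0/1 reward and γ = 0.9 are all assumptions, and the loop is guaranteed to converge only for γ < 1.

```python
STATES = ("normal", "abnormal")
ACTIONS = ("classify_normal", "classify_abnormal")  # hypothetical
DRIFT = {"normal": {"normal": 0.95, "abnormal": 0.05},
         "abnormal": {"normal": 0.30, "abnormal": 0.70}}
KEYS = [(x, a, x2) for x in STATES for a in ACTIONS for x2 in STATES]
P = {(x, a, x2): DRIFT[x][x2] for (x, a, x2) in KEYS}
R = {(x, a, x2): 1.0 if a == "classify_" + x else 0.0 for (x, a, x2) in KEYS}


def evaluate_policy(states, actions, P, R, policy, gamma,
                    theta=1e-8, max_sweeps=10_000):
    """Iterate V(x) <- sum_a pi(x,a) sum_x' P (R + gamma V(x')) until the
    largest change in one sweep falls below the convergence threshold theta."""
    V = {x: 0.0 for x in states}
    for _ in range(max_sweeps):
        new_V = {
            x: sum(
                policy[x][a] * sum(
                    P[(x, a, x2)] * (R[(x, a, x2)] + gamma * V[x2])
                    for x2 in states)
                for a in actions)
            for x in states
        }
        delta = max(abs(new_V[x] - V[x]) for x in states)
        V = new_V
        if delta < theta:
            break
    return V


# A policy that always classifies correctly, and one that guesses uniformly.
always_correct = {x: {a: 1.0 if a == "classify_" + x else 0.0 for a in ACTIONS}
                  for x in STATES}
uniform = {x: {a: 0.5 for a in ACTIONS} for x in STATES}

V_correct = evaluate_policy(STATES, ACTIONS, P, R, always_correct, gamma=0.9)
V_uniform = evaluate_policy(STATES, ACTIONS, P, R, uniform, gamma=0.9)
```

Under these assumptions the always-correct policy earns reward 1 at every step, so its value approaches 1 / (1 − 0.9) = 10 in both states, while uniform guessing earns 0.5 per step in expectation and approaches 5.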

Step 4): Solve for the optimal solution of the value function of the intrusion detection policy. First, the "value iteration algorithm" is chosen; it is an iterative computation method based on policy improvement that adjusts the policy effectively without being time-consuming. Second, the Markov decision process four-tuple of intrusion detection and the discount parameter γ of the cumulative reward are determined, and a convergence threshold is set. Finally, the value iteration algorithm based on the γ-discounted cumulative reward yields the optimal detection policy of intrusion detection:

Step 4.1: Perform policy evaluation. In model-based learning, the expected cumulative reward of a given policy can be estimated. The state value function V^π(x) denotes the cumulative reward obtained by following policy π starting from state x. The state-action value function Q^π(x, a) denotes the cumulative reward obtained by starting from state x, performing action a, and then following policy π.

Step 4.2: Solving the Bellman equations directly is usually complicated, so the "value iteration algorithm" is chosen to compute their optimal solution. Determine the discount parameter γ of the cumulative reward and set the convergence threshold.

Step 4.3: Finally, based on the value iteration algorithm with the γ-discounted cumulative reward, determine the threshold between normal and abnormal behavior in intrusion detection, and thereby obtain the optimal detection policy π^* of intrusion detection.
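Step 4 can be sketched as textbook value iteration with a convergence threshold, followed by greedy extraction of the policy. This is an illustration under assumed numbers (two hypothetical actions, invented transition probabilities, reward 1 for a correct classification, γ = 0.9), not the patent's exact procedure, and it does not model how a behavior threshold is derived from the result.

```python
STATES = ("normal", "abnormal")
ACTIONS = ("classify_normal", "classify_abnormal")  # hypothetical
DRIFT = {"normal": {"normal": 0.95, "abnormal": 0.05},
         "abnormal": {"normal": 0.30, "abnormal": 0.70}}
KEYS = [(x, a, x2) for x in STATES for a in ACTIONS for x2 in STATES]
P = {(x, a, x2): DRIFT[x][x2] for (x, a, x2) in KEYS}
R = {(x, a, x2): 1.0 if a == "classify_" + x else 0.0 for (x, a, x2) in KEYS}


def value_iteration(states, actions, P, R, gamma, theta=1e-8, max_sweeps=10_000):
    """Return (V, policy): approximate optimal values and a greedy policy.

    theta is the convergence threshold; gamma must be below 1 for the
    iteration to be guaranteed to converge.
    """
    def q_value(x, a, V):
        # Expected reward of taking a in x, then continuing with values V.
        return sum(P[(x, a, x2)] * (R[(x, a, x2)] + gamma * V[x2])
                   for x2 in states)

    V = {x: 0.0 for x in states}
    for _ in range(max_sweeps):
        new_V = {x: max(q_value(x, a, V) for a in actions) for x in states}
        delta = max(abs(new_V[x] - V[x]) for x in states)
        V = new_V
        if delta < theta:
            break
    # Deterministic policy: in each state pick the action with the best value.
    policy = {x: max(actions, key=lambda a: q_value(x, a, V)) for x in states}
    return V, policy


V_star, pi_star = value_iteration(STATES, ACTIONS, P, R, gamma=0.9)
```

With these assumptions the greedy policy classifies each state correctly and both optimal values approach 1 / (1 − 0.9) = 10. A smaller `theta` gives a tighter approximation at the cost of more sweeps.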

Claims

1. An intrusion detection method based on reinforcement learning, characterized by comprising the following steps:

Step 1: Establish the four elements of reinforcement learning for intrusion detection: environment, learner, reward function and policy. The environment is the inside of the intrusion detection system. The learner is the intrusion detection classification agent, which analyzes intrusion-detection-related information and whose states comprise a normal state and an abnormal state. The reward function is the intrusion detection rate, that is, the classification accuracy. The policy is the optimal detection method of intrusion detection.

Step 2: Establish the Markov process of intrusion detection. The classification agent of intrusion detection is abstracted as a learner whose state is either normal or abnormal. The logical relation between the normal state and the abnormal state forms a Markov process, that is, the state at the next moment depends only on the state at the current moment and is independent of the state at any earlier moment.

Step 3: Learn the optimal classification policy of intrusion detection. The environment is modeled so that a situation identical or approximate to the environment can be simulated. On this basis, the Bellman equations of reinforcement learning for intrusion detection are obtained in recursive form. The equations involve two functions, the "state value function" and the "state-action value function", which give the cumulative reward for a specified state and for a specified state-action pair, respectively. The cumulative reward function uses the γ-discounted cumulative reward.

Step 4: Solve for the optimal solution of the value function of the intrusion detection policy. Determine the Markov decision process four-tuple of intrusion detection and the discount parameter γ of the cumulative reward, and set a convergence threshold. Then apply the value iteration algorithm based on the γ-discounted cumulative reward to obtain the optimal detection policy of intrusion detection.

2. The intrusion detection method based on reinforcement learning according to claim 1, characterized in that the specific sub-steps of step 1 are:

Step 1.1: According to the learning elements of reinforcement learning, set the learning environment of reinforcement learning to be inside the intrusion detection system, and set the learner of reinforcement learning to be the intrusion detection classification agent.

Step 1.2: Define the reward function R of reinforcement learning as the intrusion detection rate, that is, the classification accuracy. Define the policy S as the optimal detection method of intrusion detection.

Step 1.3: Define the action a and action space A of the reinforcement learning learner as analyzing intrusion-detection-related information such as network logs and host information. Define the state x and state space X as two states: normal and abnormal.

3. The intrusion detection method based on reinforcement learning according to claim 1, characterized in that the Markov decision process four-tuple described in step 4 is <X, A, P, R>, where P: X × A × X → ℝ specifies the state transition probabilities and R: X × A × X → ℝ specifies the rewards. P is the underlying transition function by which the environment moves from the current state to another state with a certain probability.