CN1333552C - Detecting system and method for user behaviour abnormal based on machine study - Google Patents


Info

Publication number
CN1333552C
Authority
CN
China
Legal status
Active
Application number
CNB2005100569348A
Other languages
Chinese (zh)
Other versions
CN1649311A
Inventor
田新广
隋进国
李学春
王辉柏
邹涛
Current Assignee
Beijing Capitek Co, Ltd.
Original Assignee
BEIJING SHOUXIN SCIENCE AND TECHNOLOGY Co Ltd
Application filed by BEIJING SHOUXIN SCIENCE AND TECHNOLOGY Co Ltd
Priority to CNB2005100569348A
Publication of CN1649311A
Application granted
Publication of CN1333552C
Current legal status: Active

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to a system and method, based on machine learning, for detecting abnormal user behavior. The system consists of six modules: a control module, a data acquisition and preprocessing module, a learning module, a sequence storage module, a detection module and a detection result output module. It is deployed on the server to be monitored and uses shell commands on a Unix platform as both training data and audit data. After the data are preprocessed, a machine learning module builds a normal-behavior profile for a key legitimate user of the computer network system. During detection, abnormal behavior (that is, intrusion) is identified by comparing the user's current behavior with this profile. If the current behavior deviates substantially from the user's historical normal-behavior profile, an anomaly has occurred. Either the key legitimate user may be performing unauthorized operations, or an outside intruder may be illegally using that user's account. The network administrator is thereby alerted and can take measures to ensure security.

Description

Method for detecting anomalies in user behavior based on machine learning
Technical field
The present invention relates to a machine-learning-based method for detecting anomalies in user behavior for computer network security, and belongs to the field of network information security technology.
Background
In recent years, as computer networks have been applied ever more widely, attacks on and sabotage of networks have increased day by day, and the damage they cause keeps growing. Worldwide, breaches of computer network security systems cause economic losses of tens of billions of dollars every year. Network security has become key to the development of the national information industry and an important component of national and defense security. Detecting and preventing attacks, and so safeguarding computer systems, network systems and the information infrastructure as a whole, has become an urgent and important task.

Intrusion detection is a network information security technology for detecting intrusion behavior in computer network systems. It mainly monitors the state and behavior of a computer network system and how the system is used. From this it detects privilege abuse and misuse by system users, as well as attacks in which intruders from outside the system exploit its security flaws. An intrusion detection system (IDS) extends the administrator's security management capabilities, including security auditing, monitoring, attack recognition and response. It is regarded as the second line of defense behind the firewall and occupies a critical position in network information security systems.
According to the source of the audit data and the object monitored, intrusion detection systems can be divided into host-based, network-based and hybrid systems:

- A host-based IDS draws its audit data mainly from the operating system's audit records, system logs and application logs, and usually protects a single server.
- A network-based IDS uses raw packets on the network as its information source and usually protects a network segment.
- A hybrid IDS analyzes both server audit data and network packets. It is built from multiple components and generally uses a distributed architecture.
Host-based and network-based IDSs each have advantages in different detection domains, and the two are complementary.

The main advantages of a host-based IDS are:
(1) It is insensitive to network traffic, so increases in traffic generally do not affect its monitoring of system behavior.
(2) Its detection is targeted and fine-grained, and it can easily monitor specific activities, for example activities involving sensitive files, directories, programs or ports.
(3) It is flexible to configure and needs no extra hardware. It can be customized to the actual circumstances of the protected system, and can combine operating-system functions with anomaly analysis to detect attacks more accurately.
(4) It is effective against attacks that exploit operating-system vulnerabilities or application software flaws.
(5) It can be used in networks that use encryption or switching.

A network-based IDS is generally placed on an important network segment. Its main advantages are:
(1) It is suited to detecting attacks based on network protocols.
(2) It is independent of the servers' operating systems, widely applicable and easily extensible.
(3) It generally obtains data by network monitoring, so it has very little effect on the performance of the protected network and needs no change to the network configuration. Moreover, the monitoring device is transparent to users on the network, which reduces the chance that the detection system itself is attacked by intruders.
Intrusion detection techniques currently fall into three classes: misuse detection, anomaly detection and hybrid detection.

Misuse detection detects intrusions by analyzing and representing intrusion (attack) behavior. In the present invention, "intrusion" and "attack" are used as synonyms. Intrusion behavior is generally expressed as a pattern or signature, and an intrusion pattern (signature) library is built from known intrusion behaviors and system flaws. During detection, the actual behavior pattern of the monitored system or user is matched against the intrusion patterns, and the result of the match determines whether an intrusion is present. Misuse detection is very effective against known intrusions. Its drawbacks are that the pattern library must be updated continually and that unknown intrusions are hard to detect.

Anomaly detection analyzes and represents the normal behavior (profile) of a system or user. When the actual behavior of the monitored system or user differs from its normal behavior to some degree, an intrusion is considered present. Anomaly detection requires little knowledge of system flaws, adapts well, and can detect unknown intrusions or newly emerging intrusion patterns.

Hybrid detection combines misuse detection and anomaly detection and usually performs better than either alone.
As research on computer network vulnerabilities and attacks deepens, misuse detection is being applied more and more widely, and most commercial network-based IDSs now use it. The keys to misuse detection are how to represent and update intrusion behavior, and how to speed up packet capture and pattern matching. New attack types and network vulnerabilities keep appearing, so the intrusion pattern (signature) library of a deployed misuse detection system often cannot be supplemented and updated in time. This is the main cause of missed detections.

Anomaly detection is applied more in host-based IDSs. In network-based IDSs it usually serves as a supplement to misuse detection, for example through anomaly analysis of network traffic. The central problems of anomaly detection are how to represent the normal behavior (profile) of a system or user, and how to compare actual behavior with that normal behavior. To represent normal behavior comprehensively and accurately, and so reduce the false-alarm rate, the detection model usually has to be trained with a large amount of fairly complete training data. Nevertheless, anomaly detection has advantages over misuse detection in many respects, above all the ability to detect unknown attacks. As an intrusion detection technique with good prospects, it is increasingly being studied and applied.
The user behavior anomaly detection system of the present invention is a host-based IDS that uses anomaly detection based on machine learning. Machine learning means using machines (computers) to acquire knowledge and solve problems, and it is an interdisciplinary subject. Applied research in machine learning mainly develops learning models and learning methods. On that basis, it builds task-oriented learning systems for specific applications.
Figure 1 shows the general model of a machine learning system, which consists mainly of a learning unit, a knowledge base and an execution unit.

The learning unit is the core of the system. It uses information from external sources to acquire knowledge and to improve it, for example by reorganizing existing knowledge. It has two kinds of input: information from the external environment, and feedback after a task has been executed. Different learning systems represent experience examples differently:

- The simplest is the binary feature representation, which only records whether certain attributes of an object are present. Connectionist learning and genetic learning methods generally use binary feature input.
- Another is the attribute-value representation, in which each attribute has a set of mutually exclusive values; for example, a color attribute may take the value red, blue or yellow. A typical use is in inductive learning methods.
- A more complex one is the relational or structural representation, which describes relations between two or more objects, usually in forms such as predicate logic or semantic networks. It is more expressive than the other two, but it also makes matching during learning considerably more complex.

The knowledge base stores knowledge. This includes domain knowledge, which is generally relatively stable, and new knowledge acquired through learning, which in some cases changes over time. Which kind of knowledge to store is a key design decision for a learning system. Some systems store only concrete individual experience examples, while others store abstract generalizations obtained from these examples. The latter again come in two forms: knowledge represented in logical, discrete form, or in numerical, continuous form. Inductive learning and analytic learning often use logical, discrete representations, while connectionist learning mainly uses numerical, continuous ones.

The execution unit uses the knowledge in the knowledge base to perform tasks, and information from task execution is fed back to the learning unit as further input. This unit is what gives a learning system practical use. It is also the key component for evaluating the quality of a learning method.
Machine learning techniques can be classified from different perspectives. By the overall nature of the learning, they can be divided into inductive learning, analytic learning, instance-based learning, connectionist learning and so on.

- Inductive learning obtains a general description of a concept by induction, given a set of known positive and negative examples of that concept. Decision-tree learning is currently the most widely used inductive learning approach; typical decision-tree algorithms include CLS and ID3.
- Analytic learning uses prior knowledge and deduction to extend the information provided by the training examples. Besides training examples and a hypothesis space, its input also includes a domain theory, made up of prior knowledge that can explain the training examples.
- Instance-based learning stores the training examples and postpones generalization until a new instance must be analyzed. When such a system encounters a new instance, it generally analyzes the relation between the new instance and the stored examples and assigns a target-function value to the new instance accordingly. The advantage of this technique is that it does not estimate the target function once over the entire instance space; instead it makes a local estimate for each new instance. A drawback is that analyzing a new instance may require considerable computation. It is therefore important to reduce the number of training examples suitably and to index them efficiently, so that less computation is needed when analyzing new instances.

The anomaly detection system provided by the present invention uses instance-based learning.
A real computer network system generally has several legitimate users, and these users usually have different operating privileges. For example, a programmer's main activity on the system is programming, and the programmer is not allowed to perform certain operations that require system administrator privileges. Different legitimate users also have different behavioral characteristics and habits. For the security of the computer network system, it is often necessary to monitor the behavior of certain key legitimate users. The aim is to prevent these users from performing unauthorized operations, and to prevent an outside intruder (an illegitimate user) from using their accounts to perform illegal operations.
Summary of the invention
In view of this, the purpose of the present invention is to provide a detection method for a machine-learning-based user behavior anomaly detection system. The method uses a machine learning model to build the normal-behavior profile of one key legitimate user (or a group of them) in a computer network system. During detection, it identifies abnormal behavior by comparing the key legitimate user's current behavior with that profile. If the user's current behavior deviates substantially from its historical normal-behavior profile, an anomaly is considered to have occurred. Either the key legitimate user may have performed unauthorized operations, or an outside intruder may have used the key legitimate user's account to perform illegal operations. An anomaly does not necessarily mean an attack, but it should at least draw the close attention of the security administrator.
To achieve the above purpose, the invention provides a machine-learning-based method for detecting anomalies in user behavior. The method is implemented by a machine-learning-based user behavior anomaly detection system deployed on the server to be monitored. The system uses the shell commands of the Unix user interface as audit data, and analyzes user behavior at the user-interface layer to detect whether an intrusion has occurred on the server. It comprises a control module, a data acquisition and preprocessing module, a learning module, a sequence storage module, a detection module and a detection result output module. The detection method is characterized by the following steps:
(1) Start the system.
(2) While the system waits for instruction input, set the working state and detection parameters of the system through the control module. When a "start working" instruction is subsequently entered, the control module automatically checks the settings, and the system enters one of two working states. If the system is set to the learning state, it proceeds with the following steps; if it is set to the detection state, it jumps to step (6).
(3) Driven by the control module, the data acquisition and preprocessing module loads the raw training data from a predefined data interface. It preprocesses the data into a shell command stream and outputs that stream to the learning module.
(4) The learning module learns from the preprocessed shell command stream of training data and builds the shell command sequence libraries. It stores the libraries in the sequence storage module and then sends a "learning finished" message to the control module.
(5) After receiving the "learning finished" message from the learning module, the control module either returns the system to step (2) to wait for new setting instructions, or switches the system directly to the detection state and proceeds with the following steps.
(6) Driven by the control module, the data acquisition and preprocessing module loads raw shell command-line audit data in real time from the predefined data interface and preprocesses it in real time. It outputs the preprocessed shell command stream of audit data to the detection module in real time.
(7) The detection module analyzes the audit data in real time and generates detection results.
(8) The detection result output module displays the detection results (the decision value curve) and raises real-time alarms for abnormal behavior.
In step (2), if the system is set to the learning state, the working parameters to be set are:
the number W of shell command sequence libraries to be built to represent the legitimate user's normal-behavior profile, and the W sequence lengths $l(1), l(2), \ldots, l(W)$. Here $l(i)$ is the length of the sequences in the i-th shell command sequence library, and $l(1) < l(2) < \cdots < l(W)$.
If the system is set to the detection state, there are two alternative sets of working parameters:
first decision method: the window length $w$ and the decision threshold $\lambda$; or
second decision method: the number V of window lengths, the V window lengths $w(1), w(2), \ldots, w(V)$, the V decision upper bounds $u(1), u(2), \ldots, u(V)$ and the V decision lower bounds $d(1), d(2), \ldots, d(V)$. Here $w(1) < w(2) < \cdots < w(V)$, and $u(k)$ and $d(k)$ are the decision upper and lower bounds for the k-th window length $w(k)$, where k is a natural number in $[1, V]$. The bounds satisfy $u(1) > u(2) > \cdots > u(V-1) > u(V) = d(V) > d(V-1) > \cdots > d(2) > d(1)$.
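The ordering constraints on these parameters are easy to get wrong when configuring the control module. The Python sketch below checks them. The function names and the plain-list parameter format are assumptions made for illustration; they are not part of the patent.

```python
def strictly_increasing(xs):
    """True if every element is smaller than the next one."""
    return all(a < b for a, b in zip(xs, xs[1:]))


def check_learning_params(lengths):
    """Learning state (hypothetical helper): W >= 1 and l(1) < ... < l(W)."""
    return len(lengths) >= 1 and strictly_increasing(lengths)


def check_detection_params(windows, upper, lower):
    """Second decision method (hypothetical helper): checks
    w(1) < ... < w(V) and u(1) > ... > u(V) = d(V) > ... > d(1)."""
    V = len(windows)
    if V == 0 or len(upper) != V or len(lower) != V:
        return False
    return (strictly_increasing(windows)
            and strictly_increasing(upper[::-1])  # u(k) decreases with k
            and strictly_increasing(lower)        # d(k) increases with k
            and upper[-1] == lower[-1])           # u(V) = d(V)


print(check_detection_params([2, 4], [0.8, 0.5], [0.2, 0.5]))  # True
```

Together, a decreasing u, an increasing d and u(V) = d(V) imply u(k) > d(k) for every k < V, so no separate check for that is needed.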
In step (3) or step (6), the data acquisition and preprocessing module preprocesses the raw training data or raw audit data in the following steps:
(31) Extract the names, flags and metacharacters of the commands in each shell command line.
(32) Replace information such as file names, host names, directories and network addresses (the list is not exhaustive) with an identifier of uniform format <n>. Here n is the number of file names, host names, directories or network addresses replaced.
(33) Insert the identifiers SOF and EOF at the points where each shell session begins and ends, respectively.
(34) Within each shell session, arrange the shell command tokens in the order in which they appear. These tokens are the command names, flags and metacharacters, plus the identifiers that stand for file names, host names, directories and network addresses. Concatenate the tokens of different shell sessions in chronological order, without adding timestamps. After this preprocessing, the raw input data takes the form of a shell command stream: a string of shell command tokens in chronological order, spanning several shell sessions.
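Steps (31) to (34) can be sketched in Python as follows. The sketch rests on several assumptions, none of which the patent specifies:

- Each session is given as a list of command lines in chronological order.
- Words are separated by whitespace, with metacharacters written as separate words.
- The first word of a command, and the first word after a separator such as `|` or `;`, is the command name.
- Words starting with `-` are flags.
- Every other word counts as a file name, host name, directory or network address, and a run of consecutive such words becomes one `<n>` token.

The metacharacter sets and function names are hypothetical.

```python
# Hypothetical metacharacter sets; a real deployment would need a fuller list.
SEPARATORS = {"|", ";", "&", "&&", "||"}   # a new command name follows these
REDIRECTS = {">", ">>", "<"}               # a file name follows these


def tokenize_line(line):
    """Steps (31)-(32) for one command line."""
    tokens = []
    expect_command = True
    pending = 0  # length of the current run of file/host/dir/address words

    def flush():
        # Emit the uniform identifier <n> for the pending run, if any.
        nonlocal pending
        if pending:
            tokens.append(f"<{pending}>")
            pending = 0

    for word in line.split():
        if word in SEPARATORS or word in REDIRECTS:
            flush()
            tokens.append(word)                  # metacharacter kept verbatim
            expect_command = word in SEPARATORS
        elif expect_command:
            tokens.append(word)                  # command name kept verbatim
            expect_command = False
        elif word.startswith("-"):
            flush()
            tokens.append(word)                  # flag kept verbatim
        else:
            pending += 1                         # replaced by <n>
    flush()
    return tokens


def preprocess(sessions):
    """Steps (33)-(34): bracket each session with SOF/EOF and join the
    sessions in chronological order, without timestamps."""
    stream = []
    for session in sessions:
        stream.append("SOF")
        for line in session:
            stream.extend(tokenize_line(line))
        stream.append("EOF")
    return stream


print(preprocess([["ls -l /tmp", "cp a.txt b.txt"],
                  ["cat notes.txt | sort > out.txt"]]))
# ['SOF', 'ls', '-l', '<1>', 'cp', '<2>', 'EOF',
#  'SOF', 'cat', '<1>', '|', 'sort', '>', '<1>', 'EOF']
```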
In step (4), the learning module learns from the preprocessed shell command stream of training data in the following steps:
(41) Obtain, from the data acquisition and preprocessing module, the preprocessed training data that represents the legitimate user's normal behavior: $R = (s_1, s_2, \ldots, s_r)$. This is a shell command stream of length r, where $s_j$ is the j-th shell command token in chronological order.
(42) Read the learning parameters from the control module: the number W of shell command sequence libraries to be built to represent the legitimate user's normal-behavior profile, and the W sequence lengths $l(1), l(2), \ldots, l(W)$. Here $l(i)$ is the length of the sequences in the i-th shell command sequence library, and $l(1) < l(2) < \cdots < l(W)$.
(43) From the shell command stream R, generate W shell command sequence streams $S_1, S_2, \ldots, S_W$ with sequence lengths $l(1), l(2), \ldots, l(W)$ respectively. $S_i$ is the shell command sequence stream of sequence length $l(i)$: $S_i = (Seq_1^i, Seq_2^i, \ldots, Seq_{r-l(i)+1}^i)$. In this formula, $Seq_j^i = (s_j, s_{j+1}, \ldots, s_{j+l(i)-1})$ is the j-th shell command sequence in $S_i$ in chronological order, and i is a natural number in $[1, W]$.
(44) For each natural number i in $[1, W]$, compute the frequency of occurrence in $S_i$ of each shell command sequence in $S_i$. This frequency is the number of occurrences of the sequence in $S_i$ divided by the sum of the occurrence counts of all sequences in $S_i$.
(45) Read the frequency threshold parameters from the control module: the thresholds $\eta_1, \eta_2, \ldots, \eta_W$ used to build the W shell command sequence libraries. $\eta_i$ is the threshold used when building the library of sequences of length $l(i)$, where i is a natural number in $[1, W]$.
(46) Based on the frequency of occurrence of the sequences, extract a number of shell command sequences from each of the W streams $S_1, S_2, \ldots, S_W$ as samples, and build the W sequence libraries. The procedure is as follows:
Let $L = \{L(1), L(2), \ldots, L(W)\}$ be the set of the W sequence libraries representing the user's normal-behavior profile, where $L(i)$ is the library of sequences of length $l(i)$.
For i from 1 to W in turn, extract as samples the shell command sequences in $S_i$ whose frequency of occurrence is greater than or equal to $\eta_i$. These are the sequences regarded as the legitimate user's normal behavior patterns, and together they form the sequence library $L(i)$.
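Steps (43) to (46) amount to counting overlapping n-grams of each length and keeping the frequent ones. The following is a minimal Python sketch under stated assumptions: tokens are strings, sequences are stored as tuples, and each library L(i) is a Python set. The function name `build_libraries` is hypothetical.

```python
from collections import Counter


def build_libraries(R, lengths, thresholds):
    """Sketch of steps (43)-(46).

    R: preprocessed training stream (list of tokens)
    lengths: l(1) < ... < l(W)
    thresholds: eta_1, ..., eta_W
    Returns [L(1), ..., L(W)], each a set of token tuples.
    """
    libraries = []
    for l, eta in zip(lengths, thresholds):
        # S_i: the r - l(i) + 1 overlapping sequences of length l(i)
        S_i = [tuple(R[j:j + l]) for j in range(len(R) - l + 1)]
        counts = Counter(S_i)
        total = len(S_i)  # equals the sum of all occurrence counts in S_i
        # keep the sequences whose frequency of occurrence is >= eta_i
        libraries.append({seq for seq, c in counts.items() if c / total >= eta})
    return libraries


L = build_libraries(list("abcabcabd"), [1, 2], [0.2, 0.2])
# L[0] == {('a',), ('b',), ('c',)}           ('d' occurs 1/9 < 0.2)
# L[1] == {('a','b'), ('b','c'), ('c','a')}  (('b','d') occurs 1/8 < 0.2)
```

A separate threshold for each length lets longer, rarer sequences still enter their library.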
In step (7), the detection module analyzes the audit data in real time and generates detection results in the following steps:
(71) Obtain audit data in real time from the data acquisition and preprocessing module. During detection, this module reads the shell command lines executed by the monitored legitimate user during the monitored period, in real time, from the shell history file. It preprocesses these command lines and transforms them into a shell command stream $\bar{R} = (\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_{\bar{r}})$. Here $\bar{s}_j$ is the j-th shell command token in chronological order, and $\bar{r}$ is the length of the command stream. The module outputs each token of $\bar{R}$ to the detection module in real time, one by one, in chronological order.
(72) The detection module uses the sequence matching method to mine the "behavior pattern sequences" in the shell command stream $\bar{R}$. From the length of each behavior pattern sequence, it computes that sequence's similarity to the set of sequence libraries $L = \{L(1), L(2), \ldots, L(W)\}$. The result is a stream of behavior pattern sequences in chronological order, $P = (Seq_1^*, Seq_2^*, \ldots, Seq_M^*)$, and the corresponding similarity stream $Z = (\mathrm{Sim}(Seq_1^*, L), \mathrm{Sim}(Seq_2^*, L), \ldots, \mathrm{Sim}(Seq_M^*, L))$. In these expressions, $Seq_n^*$ is the n-th behavior pattern sequence mined from $\bar{R}$, and $\mathrm{Sim}(Seq_n^*, L)$ is the similarity of $Seq_n^*$ to the library set L. M is the number of behavior pattern sequences mined from $\bar{R}$, with $\mathrm{int}(\bar{r}/l(W)) \le M \le \bar{r} - l(W) + 1$.
(73) Apply windowed averaging to the similarity stream $Z = (\mathrm{Sim}(Seq_1^*, L), \mathrm{Sim}(Seq_2^*, L), \ldots, \mathrm{Sim}(Seq_M^*, L))$. Compare the resulting mean similarity (the similarity decision value) with the decision threshold, and thereby decide on the behavior of the monitored legitimate user.
During real-time detection, the three steps (71), (72) and (73) run concurrently.
In step (72), the detection module uses the sequence matching method to mine the behavior pattern sequences in the user's shell command stream of audit data $\bar{R}$. It also computes the similarity of each behavior pattern sequence to the library set. This further comprises the following steps:
(721) Set three variables: $j := 1$, $i := W$, $n := 1$.
(722) If $j \le \bar{r} - l(W) + 1$, compare $\overline{Seq}_j^i = (\bar{s}_j, \bar{s}_{j+1}, \ldots, \bar{s}_{j+l(i)-1})$ with the sequence library $L(i)$, then go to step (723). If $j > \bar{r} - l(W) + 1$, stop; this ends the mining of behavior pattern sequences and the similarity computation.
(723) If $\overline{Seq}_j^i \in L(i)$, i.e. $\overline{Seq}_j^i$ is identical to some sequence in $L(i)$, then:
- the n-th behavior pattern sequence is $Seq_n^* := \overline{Seq}_j^i$;
- its similarity to the library set L is $\mathrm{Sim}(Seq_n^*, L) := 2^{l(i)}/2^{l(W)}$;
- set $j := j + l(i)$, $i := W$, $n := n + 1$, and return to step (722).
If $\overline{Seq}_j^i \notin L(i)$, i.e. $\overline{Seq}_j^i$ differs from every sequence in $L(i)$, set $i := i - 1$ and go to step (724).
(724) If $i \ne 0$, return to step (722). If $i = 0$, set $Seq_n^* := (\bar{s}_j)$, $\mathrm{Sim}(Seq_n^*, L) := 0$, $j := j + 1$, $i := W$, $n := n + 1$, and return to step (722).
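Steps (721) to (724) perform a greedy longest-match segmentation of the audit stream. At each position, the sketch tries the longest library first and backs off to shorter ones. A position that matches nothing yields a one-token pattern with similarity 0. Several points are assumptions: the similarity weight of step (723) is taken as $2^{l(i)}/2^{l(W)}$ (a reading of the published formula); indices are 0-based internally; and the libraries use the tuple/set layout of a plain Python implementation. The name `mine` is hypothetical.

```python
def mine(R_bar, lengths, libraries):
    """Sketch of steps (721)-(724).

    R_bar: preprocessed audit stream (list of tokens)
    lengths: l(1) < ... < l(W)
    libraries: [L(1), ..., L(W)], sets of token tuples
    Returns (P, Z): pattern stream and similarity stream.
    """
    W = len(lengths)
    l_W = lengths[-1]
    P, Z = [], []
    j = 0                            # 0-based version of j := 1
    while j <= len(R_bar) - l_W:     # 0-based form of j <= r_bar - l(W) + 1
        for i in range(W - 1, -1, -1):       # try L(W), L(W-1), ..., L(1)
            seq = tuple(R_bar[j:j + lengths[i]])
            if seq in libraries[i]:          # step (723): match found
                P.append(seq)
                Z.append(2 ** lengths[i] / 2 ** l_W)
                j += lengths[i]
                break
        else:                                # step (724): i reached 0
            P.append((R_bar[j],))
            Z.append(0.0)
            j += 1
    return P, Z


libs = [{("a",), ("b",), ("c",)}, {("a", "b"), ("b", "c"), ("c", "a")}]
P, Z = mine(list("bdab"), [1, 2], libs)
# P == [('b',), ('d',), ('a', 'b')], Z == [0.5, 0.0, 1.0]
```

The exponential weight makes a long match count far more than several short ones. In this batch sketch, up to l(W) - 1 trailing tokens are not mined, because the loop stops once fewer than l(W) tokens remain.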
In step (73), the detection module offers the user two decision methods for applying windowing to the similarity stream and deciding on the user's behavior.
The first decision method applies windowed averaging to the similarity stream with a fixed window length. The resulting mean similarity, i.e. the similarity decision value, is then compared with the decision threshold to decide on the user's behavior. It comprises the following steps:
(7301) Read the parameters set in the control module: the window length w and the decision threshold $\lambda$. Suppose the n-th behavior pattern sequence $Seq_n^*$ has been mined from the audit data $\bar{R}$ and $\mathrm{Sim}(Seq_n^*, L)$ has been computed, with $n \ge w$. Apply to the similarity stream Z a window that ends at $\mathrm{Sim}(Seq_n^*, L)$. Compute the mean of the w similarities in that window, $\mathrm{Sim}(Seq_{n-w+1}^*, L), \mathrm{Sim}(Seq_{n-w+2}^*, L), \ldots, \mathrm{Sim}(Seq_n^*, L)$, which gives the similarity decision value D(n) for $Seq_n^*$:
$D(n) = \frac{1}{w} \sum_{m=n-w+1}^{n} \mathrm{Sim}(Seq_m^*, L)$
(7302) Use the decision value D(n) and the decision threshold $\lambda$ to decide on the user's "current behavior". If $D(n) > \lambda$, the current behavior is judged normal; if $D(n) \le \lambda$, it is judged abnormal.
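The first decision method is a moving average compared against a threshold. The sketch below consumes similarities one at a time, as they would arrive during real-time detection, and reports a decision for every n >= w. The generator name and its (n, D(n), is_normal) output format are assumptions for illustration.

```python
from collections import deque


def decide_fixed(similarities, w, lam):
    """Sketch of steps (7301)-(7302): yields (n, D(n), is_normal) for n >= w."""
    window = deque(maxlen=w)  # keeps only the w most recent Sim values
    for n, sim in enumerate(similarities, start=1):
        window.append(sim)
        if n >= w:
            D = sum(window) / w          # mean over Sim(Seq_{n-w+1}*) .. Sim(Seq_n*)
            yield n, D, D > lam          # D(n) > lambda -> normal


for n, D, normal in decide_fixed([1.0, 0.0, 1.0, 0.0, 0.0], w=2, lam=0.4):
    print(n, D, "normal" if normal else "abnormal")
# 2 0.5 normal / 3 0.5 normal / 4 0.5 normal / 5 0.0 abnormal
```

Because `deque(maxlen=w)` silently drops the oldest value, each new behavior pattern can be scored as soon as it is mined. This fits the concurrent operation of steps (71) to (73).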
The second decision method applies windowed averaging to the similarity stream with variable window lengths. The resulting mean similarity, i.e. the similarity decision value, is then compared with the decision upper and lower bounds to decide on the monitored user's behavior. It comprises the following steps:
(7311) Read the parameters set in the control module: the V window lengths $w(1), w(2), \ldots, w(V)$, the V decision upper bounds $u(1), u(2), \ldots, u(V)$ and the V decision lower bounds $d(1), d(2), \ldots, d(V)$. Here $w(1) < w(2) < \cdots < w(V)$, and $u(k)$ and $d(k)$ are the decision upper and lower bounds for the k-th window length $w(k)$, where k is a natural number in $[1, V]$. The bounds satisfy $u(1) > u(2) > \cdots > u(V-1) > u(V) = d(V) > d(V-1) > \cdots > d(2) > d(1)$.
(7312) After the n-th behavior pattern sequence $Seq_n^*$ has been mined from $\bar{R}$ and $\mathrm{Sim}(Seq_n^*, L)$ has been computed, go on to compute the similarity decision value D(n) for $Seq_n^*$ and decide on the user's "current behavior".
The calculation and decision performed in step (7312) further comprise the following steps:
(Step 1) Set the variable k := 1;
(Step 2) Compare n with w(k): if $n \ge w(k)$, continue with the subsequent steps; if $n < w(k)$, do not calculate D(n) and do not make a decision on the user's "current behavior"; end this operation;
(Step 3) Calculate the similarity mean N(n, k). This value is obtained by applying a window to the similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$ that covers the w(k) similarities ending at $Sim(Seq_n^*, L)$, i.e., $Sim(Seq_{n-w(k)+1}^*, L), Sim(Seq_{n-w(k)+2}^*, L), \ldots, Sim(Seq_n^*, L)$, and averaging them: $N(n, k) = \frac{1}{w(k)}\sum_{m=n-w(k)+1}^{n} Sim(Seq_m^*, L)$, where w(k) is the window length and $w(k) \le n \le M$;
(Step 4) Check whether the condition $N(n, k) > u(k)$ holds. If it does, the similarity decision value corresponding to $Seq_n^*$ is set to D(n) := N(n, k), and the user's "current behavior" is judged normal; if it does not, continue with the subsequent steps;
(Step 5) Check whether the condition $N(n, k) \le d(k)$ holds. If it does, set D(n) := N(n, k), judge the user's "current behavior" abnormal, and end the decision on the user's "current behavior"; if it does not, continue with the subsequent steps;
(Step 6) Set k := k + 1, i.e., increase k by 1, and return to (Step 2); the subsequent operations are executed in a loop.
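The variable-window decision of (Step 1) to (Step 6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `variable_window_decision`, the 1-based index `n`, the plain-list similarity stream, and the parameter values in the example are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def variable_window_decision(
    z: Sequence[float],      # similarity stream Z; z[m - 1] holds Sim(Seq_m*, L)
    n: int,                  # 1-based index of the newest behavior pattern sequence
    windows: Sequence[int],  # w(1) < w(2) < ... < w(V)
    upper: Sequence[float],  # u(1) > u(2) > ... > u(V)
    lower: Sequence[float],  # d(1) < d(2) < ... < d(V), with d(V) == u(V)
) -> Tuple[Optional[float], Optional[str]]:
    """Return (D(n), verdict), or (None, None) when no decision is made."""
    for k in range(len(windows)):        # Step 1 and Step 6: k = 1, ..., V
        w = windows[k]
        if n < w:                        # Step 2: not enough history for w(k)
            return None, None
        mean = sum(z[n - w:n]) / w       # Step 3: N(n, k) over the last w(k) values
        if mean > upper[k]:              # Step 4: above the upper limit, normal
            return mean, "normal"
        if mean <= lower[k]:             # Step 5: at or below the lower limit, abnormal
            return mean, "abnormal"
    return None, None  # not reached when d(V) == u(V)


# Hypothetical parameters: V = 2 windows of lengths 2 and 4.
Z = [1.0, 1.0, 0.25, 0.0, 0.0]
for n in range(1, len(Z) + 1):
    print(n, variable_window_decision(Z, n, [2, 4], [0.9, 0.5], [0.1, 0.5]))
```

Short windows decide quickly when the mean is far from the limits; a longer window is consulted only when the short-window mean falls between u(k) and d(k). Because u(V) = d(V), the longest window always yields a decision once n ≥ w(V).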
The detection method can be used to detect anomalies in the behavior of a single legitimate user in a computer network system, or in the behavior of a group of legitimate users in the network system. For the latter, two different approaches can be adopted:
If the privileges and behavioral characteristics of the users in the group differ considerably, set up W sequence libraries separately from each legitimate user's normal-behavior training data, and then use each user's own W sequence libraries to detect anomalies in that user's behavior;
If the users in the group have the same privileges and similar behavioral characteristics, combine their training data, i.e., concatenate their shell-command streams into one overall training set, use it to set up W sequence libraries, and then use these W sequence libraries to detect anomalies in each user's behavior.
The present invention is a detection method for a machine-learning-based user behavior anomaly detection system. Its advantages are:
(1) The system of the present invention is highly practical and easy to operate. It consists of software and can be flexibly deployed on any network server that needs monitoring, without additional hardware; it detects abnormal user behavior on the server, enabling security administrators to identify various attacks against the network system. Compared with some existing commercial host-based intrusion detection systems, the system uses shell-command sequences of several different lengths to represent the various normal behavior patterns of a legitimate user, and sets up multiple sequence libraries to describe the user's normal behavior profile, which improves the flexibility and accuracy with which user behavior patterns and behavior profiles are represented. Detection results in practical applications show that the system achieves high detection accuracy.
(2) The detection method of the present invention is based on machine learning. In many host-based intrusion detection systems, the training examples used in the learning (training) phase include both positive and negative examples. The system of the present invention needs only positive examples for learning, not negative examples, which greatly reduces the difficulty of collecting training data and broadens the range of applications of the system. In addition, the present invention adopts a distinctive similarity calculation (assignment) method. The normal behavior patterns of a legitimate user are taken to be the shell-command sequences the user executes frequently during normal operation; the system therefore extracts sample sequences according to the frequency of occurrence of shell-command sequences in the training data. Experimental results and detection results in practical applications show that this method of extracting sample sequences is very robust.
(3) In the second decision method of the present invention, a "variable window length" is introduced when the similarity stream is filtered (denoised) by windowing, and multiple decision thresholds are used jointly to judge the monitored user's behavior, which improves the stability of detection performance and the timeliness of detection.
(4) The detection method of the present invention uses "complete sequence comparison" (exact matching) when mining "behavior pattern sequences". Therefore, for sequence storage and matching, each distinct shell-command sequence can be replaced by a different number (integer). Compared with some existing host-based intrusion detection systems and detection methods, the sequence matching and storage method of the present invention greatly reduces both the amount of stored data and the computation during detection, and thus reduces the resource consumption on, and the impact on, the server on which the system is deployed.
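Because matching is all-or-nothing, a stored sequence only ever needs to be compared for identity, so each distinct sequence can be replaced by a single integer. A minimal Python sketch of such an interning table follows; the class `SequenceIds` and its methods are hypothetical names, and a Python dict is only one possible way to assign the numbers.

```python
from typing import Dict, Optional, Sequence


class SequenceIds:
    """Hypothetical interning table mapping each distinct sequence to one integer."""

    def __init__(self) -> None:
        self._ids: Dict[tuple, int] = {}

    def id_of(self, seq: Sequence[str]) -> int:
        # Learning: assign the next unused integer the first time a sequence is seen.
        return self._ids.setdefault(tuple(seq), len(self._ids))

    def lookup(self, seq: Sequence[str]) -> Optional[int]:
        # Detection: return the stored id, or None without creating a new id.
        return self._ids.get(tuple(seq))


ids = SequenceIds()
library = {ids.id_of(s) for s in [("cd", "<1>"), ("ls", "-laF")]}
print(ids.lookup(("cd", "<1>")) in library)  # True
print(ids.lookup(("rm", "-rf")))             # None
```

During detection, `lookup` is used rather than `id_of`, so unseen audit sequences never enlarge the table.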
Description of drawings
Fig. 1 is a general model diagram of a machine learning system.
Fig. 2 is a schematic diagram of the structure of the machine-learning-based user behavior anomaly detection system of the present invention.
Fig. 3 is a workflow block diagram of the user behavior anomaly detection system of the present invention.
Fig. 4 is a schematic diagram of the detection method of the machine-learning-based user behavior anomaly detection system of the present invention.
Fig. 5 is a block diagram of the learning steps performed by the learning module of the present invention.
Fig. 6 is a block diagram of the detection steps performed by the detection module of the present invention.
Fig. 7 is a graph of the decision-value curve output by the detection result output module of the present invention.
Embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Referring to Fig. 2, the present invention is a detection method for a machine-learning-based user behavior anomaly detection system. The system is a software product deployed on the server to be monitored. It uses shell commands on the Unix platform as audit data and analyzes user behavior at the user-interface level to detect whether an intrusion has occurred on the server. The system consists of a control module, a data acquisition and preprocessing module, a learning module, a sequence storage module, a detection module, and a detection result output module.
The control module sets the system's operating state and the various detection parameters, and controls the operation of the data acquisition and preprocessing module, the learning module, the detection module, and the system as a whole. The data acquisition and preprocessing module obtains raw training data or audit data from the server, i.e., the shell command lines executed by users, processes them into the form of a shell-command stream, and sends them to the learning module or the detection module for learning or detection, respectively. The learning module uses machine learning techniques to acquire, from the training data, knowledge of the normal behavior of a key legitimate user (or group of users) in the network system, and on this basis sets up shell-command sequence libraries representing that user's normal behavior profile. The sequence storage module stores the shell-command sequence libraries set up by the learning module; during detection, the detection module retrieves them for comparison. The detection module analyzes and processes the shell commands executed by the legitimate user during the monitored period, performing "behavior pattern sequence" mining, similarity calculation (assignment), windowed noise filtering of the similarities, decision value calculation, and user behavior decisions. The detection result output module displays the decision-value curve generated by the detection module and, driven by the detection module, raises alarms on abnormal behavior.
Referring to Fig. 3, the workflow of the system of the present invention is as follows:
(1) The system starts;
(2) The system waits for instruction input. At this point, the system's operating state and detection parameters are set through the control module; when setting is complete, a "start working" instruction can be entered. The control module then checks the system settings and enters one of two operating states: if the system is set to the learning state, the subsequent steps are executed; if it is set to the detection state, execution jumps to step (6). Note that after start-up the system has a set of default operating state and detection parameters, namely those set during the last run; if these defaults need not be changed, the "start working" instruction can be entered directly to make the system execute the corresponding steps;
(3) Driven by the control module, the data acquisition and preprocessing module loads raw training data from a predefined data interface, preprocesses it into the form of a shell-command stream, and outputs it to the learning module;
(4) The learning module learns from the preprocessed shell-command stream training data, sets up the shell-command sequence libraries, stores them in the sequence storage module, and then sends a "learning finished" message to the control module;
(5) After receiving the "learning finished" message from the learning module, the control module returns the system to step (2) to wait for new setting instructions;
(6) Driven by the control module, the system's operating state is switched directly to the detection state; the data acquisition and preprocessing module loads raw shell-command-line audit data in real time from a predefined data interface, preprocesses it in real time, and outputs the preprocessed shell-command stream audit data to the detection module in real time;
(7) The detection module analyzes, in real time, the audit data supplied by the data acquisition and preprocessing module and generates detection results;
(8) The detection result output module displays the detection results (the decision-value curve) and raises real-time alarms on abnormal behavior.
As the above workflow shows, the machine-learning-based user behavior anomaly detection method of the present invention mainly comprises three parts (see Fig. 3): acquiring and preprocessing data; learning (training); and detecting user behavior and outputting detection results. These three parts are described in turn below.
(1) Acquiring and preprocessing data: the system of the present invention must acquire raw training data during learning and raw audit data during detection, and preprocess them; this work is done by the data acquisition and preprocessing module.
The present invention uses the shell command lines executed by users on the Unix platform as raw audit data, for three main reasons: (1) compared with other audit data (such as CPU usage or memory usage), shell command lines reflect user behavior more directly; (2) on the Unix platform, the shell is the main interface between end users and the operating system, and a significant proportion of user activity is carried out through the shell; (3) shell command lines are relatively easy to collect and convenient to analyze.
There are several types of shell on the Unix platform, such as tcsh, ksh, and bash. The system of the present invention is applied to tcsh, a command interpreter with C-like syntax whose history mechanism saves the shell command lines entered by the user into a history list. Because the command input methods and history mechanisms of different shells have much in common, the user behavior anomaly detection method of the present invention (including the data preprocessing, learning, and detection methods) is also applicable to shells other than tcsh.
The raw shell command lines needed for learning and detection can be obtained from the tcsh history file. However, these command lines cannot be used directly for learning or detection; they must first be preprocessed. Preprocessing has two main purposes: (1) to put the data into a form that is convenient to store, analyze, and process; (2) to reduce the number of distinct command tokens in the audit data. The data acquisition and preprocessing module uses the same preprocessing method for learning and for detection. The specific method is:
1. Extract the names, flags, and metacharacters of the commands in the shell command lines.
2. Replace information such as file names, server names, directories, and network addresses with an identifier of uniform form <n>, where n is the number of file names, server names, directories, or network addresses that the identifier replaces (for example, three file names in a row become <3>).
3. Insert the identifiers SOF and EOF, denoting start and end, at the points where each shell session begins and ends, respectively.
4. Arrange the shell command tokens (the tokens for command names, flags, and metacharacters, and the identifiers for file names, server names, and similar information) in the order in which they appear in the shell session; concatenate the tokens of different shell sessions in chronological order; no timestamps are added to the data.
After this preprocessing, the raw input data takes the form of a shell-command stream: a string of shell command tokens arranged in chronological order, which may contain the content of several shell sessions. For example, consider the command lines of two temporally adjacent shell sessions executed by a user on tcsh:
#Start session 1
cd ~/games/
xquake &
fg
vi scores.txt
mailx john_doe@somewhere.com
exit
#End session 1
#Start session 2
cd ~/private/docs
ls -laF | more
cat foo.txt bar.txt zorch.txt>somewhere
exit
#End session 2
After preprocessing, these become the following shell-command stream:
(SOF,cd,<1>,xquake,&,fg,vi,<1>,mailx,<1>,exit,EOF,SOF,cd,<1>,ls,-laF,|,more,cat,<3>,>,<1>,exit,EOF)
Here <1> and <3> are identifiers for contents such as file names, directories, and e-mail addresses, and SOF and EOF are the identifiers for the beginning and end of a session, respectively. Thus a shell-command stream is a string of shell command tokens arranged in chronological order, and one stream can contain the content of several shell sessions.
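The four preprocessing rules can be approximated with a small tokenizer. The following Python sketch reproduces the example stream above, but the token classes are assumptions: the patent does not specify how names, flags, metacharacters, and arguments are told apart, so `SEPARATORS`, `REDIRECTIONS`, and the rule "the first word after a separator is a command name" are hypothetical.

```python
import re
from typing import List

# Assumed token classes; the patent does not enumerate them.
SEPARATORS = {"|", "||", "&", "&&", ";"}  # the next word is a command name
REDIRECTIONS = {">", ">>", "<"}           # the next word is a file argument
TOKEN_RE = re.compile(r"[|&;<>]+|[^\s|&;<>]+")


def preprocess_line(line: str) -> List[str]:
    out: List[str] = []
    pending = 0            # consecutive file names, directories, addresses, ...
    expect_command = True  # the first word of a line is a command name

    def flush() -> None:
        nonlocal pending
        if pending:
            out.append(f"<{pending}>")  # rule 2: uniform identifier <n>
            pending = 0

    for tok in TOKEN_RE.findall(line):
        if tok in SEPARATORS or tok in REDIRECTIONS:
            flush()
            out.append(tok)                  # rule 1: metacharacter
            expect_command = tok in SEPARATORS
        elif expect_command:
            out.append(tok)                  # rule 1: command name
            expect_command = False
        elif tok.startswith("-"):
            flush()
            out.append(tok)                  # rule 1: flag
        else:
            pending += 1                     # argument to be replaced by <n>
    flush()
    return out


def preprocess_sessions(sessions: List[List[str]]) -> List[str]:
    stream: List[str] = []
    for session in sessions:                 # rule 4: chronological order, no timestamps
        stream.append("SOF")                 # rule 3: session start
        for line in session:
            stream.extend(preprocess_line(line))
        stream.append("EOF")                 # rule 3: session end
    return stream


sessions = [
    ["cd ~/games/", "xquake &", "fg", "vi scores.txt",
     "mailx john_doe@somewhere.com", "exit"],
    ["cd ~/private/docs", "ls -laF | more",
     "cat foo.txt bar.txt zorch.txt>somewhere", "exit"],
]
print(preprocess_sessions(sessions))
```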
(2) Learning or training: before detection, the system must first learn (be trained), i.e., acquire from the training data knowledge of the normal behavior of a key legitimate user in the network system, and on this basis set up the shell-command sequence libraries representing that user's normal behavior profile, which the detection module retrieves for comparison during detection. Learning is performed by the learning module.
Referring to Fig. 5, the learning steps of the learning module are as follows:
(1) Obtain the legitimate user's normal-behavior training data from the data acquisition and preprocessing module. The raw normal-behavior training data are the shell command lines the legitimate user executed during normal operation in the past; after preprocessing, these command lines become a shell-command stream representing the user's historical normal behavior: $R = (s_1, s_2, \ldots, s_r)$, a shell-command stream of length r, where $s_j$ denotes the j-th shell command token in chronological order.
(2) Read the learning parameters from the control module: the number W of shell-command sequence libraries to be set up to represent the legitimate user's normal behavior profile, and the W sequence lengths l(1), l(2), ..., l(W), where l(i) is the length of the sequences in the i-th library and l(1) < l(2) < ... < l(W).
During learning, the system represents the user's various behavior patterns with shell-command sequences of W different lengths, and collects the normal behavior patterns in the training data (i.e., the shell-command sequences with a high frequency of occurrence) to form the user's normal behavior profile. For a given W, l(1), l(2), ..., l(W) can be chosen in different ways. For example, when W = 3, l(1), l(2), l(3) can be 1, 2, 3 (i.e., the sequences in the three libraries have lengths 1, 2, and 3), or 3, 6, 9, or some other combination. W and l(i) directly affect detection performance and efficiency: the larger W and l(i) are, the larger the system's stored data volume and the computation during detection.
(3) From the shell-command stream R, generate W shell-command sequence flows $S_1, S_2, \ldots, S_W$ with sequence lengths l(1), l(2), ..., l(W), respectively, where $S_i$ is the sequence flow with sequence length l(i): $S_i = (Seq_1^i, Seq_2^i, \ldots, Seq_{r-l(i)+1}^i)$, in which $Seq_j^i = (s_j, s_{j+1}, \ldots, s_{j+l(i)-1})$ is the j-th shell-command sequence of $S_i$ in chronological order, and i is a natural number in the interval [1, W].
The generation of the shell-command sequence flows is illustrated below.
For example, let W = 3 with l(1), l(2), l(3) equal to 1, 2, 3, and let $R = (s_1, s_2, \ldots, s_{12})$ = (SOF, cd, <1>, xquake, &, fg, vi, <1>, mailx, <1>, exit, EOF). Then R yields three shell-command sequence flows $S_1, S_2, S_3$ with sequence lengths 1, 2, and 3, respectively, where
$S_1 = (Seq_1^1, Seq_2^1, \ldots, Seq_{12}^1)$ = ((SOF), (cd), (<1>), (xquake), (&), (fg), (vi), (<1>), (mailx), (<1>), (exit), (EOF))
$S_2 = (Seq_1^2, Seq_2^2, \ldots, Seq_{11}^2)$ = ((SOF, cd), (cd, <1>), (<1>, xquake), (xquake, &), (&, fg), (fg, vi), (vi, <1>), (<1>, mailx), (mailx, <1>), (<1>, exit), (exit, EOF))
$S_3 = (Seq_1^3, Seq_2^3, \ldots, Seq_{10}^3)$ = ((SOF, cd, <1>), (cd, <1>, xquake), (<1>, xquake, &), (xquake, &, fg), (&, fg, vi), (fg, vi, <1>), (vi, <1>, mailx), (<1>, mailx, <1>), (mailx, <1>, exit), (<1>, exit, EOF))
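Generating $S_i$ from R is a sliding window of width l(i) moved one token at a time. A minimal Python sketch (the helper name `sequence_flow` is hypothetical) that reproduces the sizes of $S_1$, $S_2$, $S_3$ in the example above:

```python
from typing import List, Sequence, Tuple


def sequence_flow(stream: Sequence[str], length: int) -> List[Tuple[str, ...]]:
    """S_i: Seq_j^i = (s_j, ..., s_{j+l(i)-1}) for j = 1, ..., r - l(i) + 1."""
    return [tuple(stream[j:j + length]) for j in range(len(stream) - length + 1)]


R = ["SOF", "cd", "<1>", "xquake", "&", "fg", "vi", "<1>", "mailx", "<1>", "exit", "EOF"]
S1, S2, S3 = (sequence_flow(R, length) for length in (1, 2, 3))
print(len(S1), len(S2), len(S3))  # 12 11 10
print(S3[0], S3[-1])              # ('SOF', 'cd', '<1>') ('<1>', 'exit', 'EOF')
```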
(4) For each natural number i in the interval [1, W], calculate the frequency of occurrence in $S_i$ of each shell-command sequence in $S_i$, i.e., the number of times the sequence occurs in $S_i$ divided by the total number of occurrences of all sequences in $S_i$.
The calculation is illustrated below. For example, in the sequence flow $S_1$ = ((SOF), (cd), (<1>), (xquake), (&), (fg), (vi), (<1>), (mailx), (<1>), (exit), (EOF)) shown above, the frequency of occurrence of the sequence (<1>) is 3/12 = 1/4, and that of the sequence (cd) is 1/12. In the sequence flow $S_2$ = ((SOF, cd), (cd, <1>), (<1>, xquake), (xquake, &), (&, fg), (fg, vi), (vi, <1>), (<1>, mailx), (mailx, <1>), (<1>, exit), (exit, EOF)), the frequency of occurrence of the sequence (<1>, xquake) is 1/11 (every other sequence in $S_2$ also has frequency 1/11).
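Step (4) can be sketched with a counter over the sequences of $S_i$. Exact fractions are used here only so that the values match the worked example; `sequence_frequencies` is a hypothetical name, and the sliding-window helper is repeated so the block runs on its own.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple


def sequence_flow(stream: Sequence[str], length: int) -> List[Tuple[str, ...]]:
    return [tuple(stream[j:j + length]) for j in range(len(stream) - length + 1)]


def sequence_frequencies(flow: List[Tuple[str, ...]]) -> Dict[Tuple[str, ...], Fraction]:
    """Occurrences of each sequence in S_i divided by the total number of sequences in S_i."""
    counts = Counter(flow)
    total = len(flow)  # equals the sum of all occurrence counts
    return {seq: Fraction(c, total) for seq, c in counts.items()}


R = ["SOF", "cd", "<1>", "xquake", "&", "fg", "vi", "<1>", "mailx", "<1>", "exit", "EOF"]
f1 = sequence_frequencies(sequence_flow(R, 1))
f2 = sequence_frequencies(sequence_flow(R, 2))
print(f1[("<1>",)], f1[("cd",)])  # 1/4 1/12
print(set(f2.values()))           # {Fraction(1, 11)}
```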
(5) Read the frequency threshold parameters from the control module: the frequency thresholds $\eta_1, \eta_2, \ldots, \eta_W$ used to set up the W shell-command sequence libraries, where $\eta_i$ is the threshold used when setting up the library of sequences of length l(i), and i is a natural number in the interval [1, W].
(6) According to the frequency of occurrence of the sequences, extract a number of shell-command sequences from each of the W shell-command sequence flows $S_1, S_2, \ldots, S_W$ as samples, and set up W sequence libraries. The specific steps are:
Let $L = \{L(1), L(2), \ldots, L(W)\}$ be the set of W sequence libraries representing the user's normal behavior profile, where L(i) is the library composed of sequences of length l(i);
For i from 1 to W in turn, extract as samples the shell-command sequences in $S_i$ whose frequency of occurrence is greater than or equal to the frequency threshold $\eta_i$, i.e., the sequences regarded as normal behavior patterns of the legitimate user, and collect them to form the sequence library L(i).
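Steps (5) and (6) together keep, for each length l(i), the sequences whose frequency reaches $\eta_i$. A minimal sketch, assuming the libraries are held as Python sets of tuples (the patent does not prescribe a data structure) and using hypothetical threshold values on the example stream R; `build_libraries` is a hypothetical name.

```python
from collections import Counter
from fractions import Fraction
from typing import List, Sequence, Set, Tuple


def build_libraries(stream: Sequence[str], lengths: Sequence[int],
                    thresholds: Sequence[Fraction]) -> List[Set[Tuple[str, ...]]]:
    """L(i) = sequences of length l(i) whose frequency in S_i is >= eta_i, for i = 1..W."""
    libraries = []
    for length, eta in zip(lengths, thresholds):
        flow = [tuple(stream[j:j + length]) for j in range(len(stream) - length + 1)]
        counts = Counter(flow)
        libraries.append({seq for seq, c in counts.items()
                          if Fraction(c, len(flow)) >= eta})
    return libraries


R = ["SOF", "cd", "<1>", "xquake", "&", "fg", "vi", "<1>", "mailx", "<1>", "exit", "EOF"]
# Hypothetical thresholds eta_1, eta_2, eta_3.
L = build_libraries(R, (1, 2, 3), (Fraction(1, 6), Fraction(1, 11), Fraction(1, 5)))
print(L[0])                  # {('<1>',)}
print(len(L[1]), len(L[2]))  # 11 0
```

With these thresholds only (<1>) survives at length 1, every length-2 sequence (frequency 1/11) is kept, and no length-3 sequence (frequency 1/10) reaches 1/5.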
Of particular note: when new training data becomes available, the learning module can, building on the learning already performed on the original training data, automatically recalculate the frequency of occurrence of each sequence according to the new data and adjust the sequence libraries accordingly; that is, the system can adapt automatically to changes in the legitimate user's normal behavior.
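One way to realize this adaptation, sketched here under the assumption that the learning module keeps the per-length occurrence counts from earlier training, is to add the counts from the new data and re-apply the thresholds. The class name `SequenceLibraryLearner` is hypothetical, and taking windows only within each new stream (never across the boundary with older data) is also an assumption.

```python
from collections import Counter
from fractions import Fraction
from typing import List, Sequence, Set, Tuple


class SequenceLibraryLearner:
    """Hypothetical helper that keeps per-length counts so L can be rebuilt on new data."""

    def __init__(self, lengths: Sequence[int], thresholds: Sequence[Fraction]) -> None:
        self.lengths = list(lengths)
        self.thresholds = list(thresholds)
        self.counts = [Counter() for _ in self.lengths]

    def learn(self, stream: Sequence[str]) -> List[Set[Tuple[str, ...]]]:
        # Assumption: windows are taken within each stream only, not across
        # the boundary between previously learned data and the new data.
        for counter, length in zip(self.counts, self.lengths):
            counter.update(tuple(stream[j:j + length])
                           for j in range(len(stream) - length + 1))
        return self.libraries()

    def libraries(self) -> List[Set[Tuple[str, ...]]]:
        libs = []
        for counter, eta in zip(self.counts, self.thresholds):
            total = sum(counter.values())
            libs.append({s for s, c in counter.items() if Fraction(c, total) >= eta})
        return libs


learner = SequenceLibraryLearner([1], [Fraction(1, 2)])
print(learner.learn(["ls", "ls", "cd"]))  # [{('ls',)}]
print(learner.learn(["cd", "cd", "cd"]))  # [{('cd',)}]: 'cd' is now 4/6, 'ls' only 2/6
```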
(3) Detecting user behavior and outputting detection results: using the legitimate user's normal behavior profile set up by the learning module, the user's current behavior is monitored in real time. If the user's current behavior deviates greatly from the historical normal behavior profile, an anomaly is considered to have occurred and is handled accordingly; such an anomaly results either from the user performing unauthorized operations or from an outside intruder misusing the user's account to perform illegal operations. Detection is performed mainly by the detection module.
Referring to Fig. 6, the real-time detection steps of the detection module are as follows:
(1) Obtain audit data in real time from the data acquisition and preprocessing module. During detection, the data acquisition and preprocessing module reads, in real time from the shell history file, the shell command lines the monitored legitimate user executes during the monitored period, preprocesses them, and transforms them into a shell-command stream $\bar{R} = (\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_{\bar{r}})$, where $\bar{s}_j$ denotes the j-th shell command token in chronological order and $\bar{r}$ is the length of the stream. The data acquisition and preprocessing module outputs each token of $\bar{R}$ to the detection module in turn, in real time and in chronological order.
(2) The detection module uses a sequence matching method to mine the "behavior pattern sequences" in the shell-command stream $\bar{R}$, and calculates the similarity of each "behavior pattern sequence" to the sequence library set $L = \{L(1), L(2), \ldots, L(W)\}$ according to its length. This yields the chronologically ordered "behavior pattern sequence" stream $P = (Seq_1^*, Seq_2^*, \ldots, Seq_M^*)$ and the corresponding similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$, where $Seq_n^*$ is the n-th "behavior pattern sequence" mined from $\bar{R}$, $Sim(Seq_n^*, L)$ is the similarity of $Seq_n^*$ to the sequence library set, and M is the number of "behavior pattern sequences" mined from $\bar{R}$, with $\mathrm{int}(\bar{r}/l(W)) \le M \le \bar{r} - l(W) + 1$.
The specific steps of "behavior pattern sequence" mining and similarity calculation are as follows:
Step 1: Set three variables: j := 1, i := W, n := 1;
Step 2: If $j \le \bar{r} - l(W) + 1$, compare $\overline{Seq}_j^i = (\bar{s}_j, \bar{s}_{j+1}, \ldots, \bar{s}_{j+l(i)-1})$ with the sequence library L(i), and then execute step 3; if $j > \bar{r} - l(W) + 1$, stop (the "behavior pattern sequence" mining and similarity calculation are complete);
Step 3: If $\overline{Seq}_j^i \in L(i)$ (i.e., $\overline{Seq}_j^i$ is identical to some sequence in L(i)), then the n-th "behavior pattern sequence" is $Seq_n^* := \overline{Seq}_j^i$, and its similarity to the sequence library set L is $Sim(Seq_n^*, L) := 2^{l(i)}/2^{l(W)}$; set j := j + l(i), i := W, n := n + 1, and return to step 2. If $\overline{Seq}_j^i \notin L(i)$ (i.e., $\overline{Seq}_j^i$ differs from every sequence in L(i)), set i := i - 1 and then execute step 4;
Step 4: If $i \ne 0$, return to step 2; if i = 0, then set $Seq_n^* := (\bar{s}_j)$, $Sim(Seq_n^*, L) := 0$, j := j + 1, i := W, n := n + 1, and return to step 2.
The above "behavior pattern sequence" mining and similarity calculation method can be understood as follows. Starting from the first shell command (token), form W sequences of lengths l(W), l(W-1), ..., l(1), and compare (match) them in turn, from longest to shortest, with the corresponding sequence libraries. If one of these sequences is identical to some sequence in the corresponding library, it is regarded as a normal behavior pattern of the legitimate user and defined as a "behavior pattern sequence", and its similarity to the sequence library set is calculated from its length: the longer the sequence, the greater the similarity. If none of the sequences is identical to any sequence in the corresponding library, the current shell command (token) is defined as a "behavior pattern sequence" of length 1 and assigned similarity 0. Then, starting from the next shell command (token) after this sequence, W sequences of different lengths are formed again, and sequence comparison and similarity calculation continue in the same way as long as the starting point does not pass the $(\bar{r} - l(W) + 1)$-th shell command (token). The method is mainly concerned with whether the monitored user's current command sequence exactly matches some historical normal behavior pattern (normal sequence) of that user.
In step 3 above, the formula $Sim(Seq_n^*, L) := 2^{l(i)}/2^{l(W)}$ assigns the similarity $2^{l(i)}/2^{l(W)}$ between a "behavior pattern sequence" $Seq_n^*$ of length l(i) and the sequence library set L. The similarity of a "behavior pattern sequence" is thus an increasing function of its length, with maximum value 1. For example, with W = 3 and (l(1), l(2), l(3)) = (1, 2, 3), the similarities of "behavior pattern sequences" of lengths 1, 2, and 3 are 2/8, 4/8, and 8/8, respectively.
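Steps 1 to 4 amount to a greedy longest-match scan of $\bar{R}$ against the libraries. The Python sketch below follows them literally, assuming `libraries[i - 1]` is the set L(i) of tuples and `lengths` holds (l(1), ..., l(W)) in ascending order; the function name `mine_patterns` and the toy libraries are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Seq = Tuple[str, ...]


def mine_patterns(stream: Sequence[str], lengths: Sequence[int],
                  libraries: Sequence[Set[Seq]]) -> Tuple[List[Seq], List[float]]:
    """Return (P, Z); lengths = (l(1), ..., l(W)) ascending, libraries[i - 1] = L(i)."""
    W = len(lengths)
    lw = lengths[-1]
    last_start = len(stream) - lw + 1  # r_bar - l(W) + 1 (1-based)
    P: List[Seq] = []
    Z: List[float] = []
    j, i = 1, W                        # Step 1 (n is implicit: n = len(P) + 1)
    while j <= last_start:             # Step 2
        li = lengths[i - 1]
        candidate = tuple(stream[j - 1:j - 1 + li])
        if candidate in libraries[i - 1]:  # Step 3: exact match in L(i)
            P.append(candidate)
            Z.append(2 ** li / 2 ** lw)    # Sim = 2^l(i) / 2^l(W)
            j, i = j + li, W
        else:
            i -= 1                         # Step 3: try the next shorter length
            if i == 0:                     # Step 4: no match at any length
                P.append((stream[j - 1],))
                Z.append(0.0)
                j, i = j + 1, W
    return P, Z


lib = [{("a",)}, {("a", "b")}, {("a", "b", "c")}]  # toy L(1), L(2), L(3)
P, Z = mine_patterns(list("abcxabayy"), (1, 2, 3), lib)
print(Z)  # [1.0, 0.0, 0.5, 0.25]
```

Because step 2 stops once j exceeds $\bar{r} - l(W) + 1$, the last few tokens of $\bar{R}$ may never be examined; in the example call the two trailing "y" tokens are skipped.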
Performing "behavior pattern sequence" mining and similarity calculation by the above method yields the chronologically ordered "behavior pattern sequence" stream $P = (Seq_1^*, Seq_2^*, \ldots, Seq_M^*)$ and the corresponding similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$, where M is the number of "behavior pattern sequences" mined from $\bar{R}$ and $\mathrm{int}(\bar{r}/l(W)) \le M \le \bar{r} - l(W) + 1$ (int denotes taking the integer part).
Note that the lengths of the "behavior pattern sequences" mined from $\bar{R}$ are not fixed: they can be 1, l(1), l(2), ..., l(W-1), or l(W). When all of the mined "behavior pattern sequences" have length l(W) (in which case the legitimate user's behavior during the monitored period agrees best with the historical normal behavior profile), M takes its minimum value $\mathrm{int}(\bar{r}/l(W))$. When all of them have length 1 (in which case the user's behavior during the monitored period agrees worst with the historical normal behavior profile), M takes its maximum value $\bar{r} - l(W) + 1$.
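The two extreme values of M follow from steps 1 to 4: each iteration that produces a behavior pattern sequence advances j by that sequence's length, which lies between 1 and l(W), and mining stops once $j > \bar{r} - l(W) + 1$. Writing $j_k$ (k = 0, 1, 2, ...) for the starting position of the (k+1)-th mined sequence, and assuming $\bar{r} \ge l(W)$:

```latex
\begin{aligned}
&\text{all lengths } l(W): && j_k = 1 + k\,l(W) \le \bar{r} - l(W) + 1
  \iff k \le \tfrac{\bar{r}}{l(W)} - 1
  \;\Rightarrow\; M = \left\lfloor \tfrac{\bar{r}}{l(W)} \right\rfloor,\\
&\text{all lengths } 1: && j_k = 1 + k \le \bar{r} - l(W) + 1
  \iff k \le \bar{r} - l(W)
  \;\Rightarrow\; M = \bar{r} - l(W) + 1.
\end{aligned}
```

Any mix of lengths advances j by between 1 and l(W) per step, so M always lies between these two values, which is the bound $\mathrm{int}(\bar{r}/l(W)) \le M \le \bar{r} - l(W) + 1$.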
(3) Apply windowing to the similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$ and take the mean; the resulting similarity mean, i.e., the similarity decision value, is compared with the decision threshold to judge the behavior of the monitored legitimate user. Two alternative decision methods are available.
The first decision method applies windowing with a fixed window length to the similarity stream and takes the mean; the resulting similarity mean, i.e., the similarity decision value, is then compared with the decision threshold to judge the user's behavior. It comprises the following steps:
(101) First read the parameters set in the control module: the window length w and the decision threshold λ. After the n-th behavior pattern sequence $Seq_n^*$ has been mined from the audit data $\bar{R}$ and $Sim(Seq_n^*, L)$ has been calculated, where $n \ge w$, apply a window ending at $Sim(Seq_n^*, L)$ to the similarity stream Z and calculate the mean of the w similarities ending at $Sim(Seq_n^*, L)$, i.e., $Sim(Seq_{n-w+1}^*, L), Sim(Seq_{n-w+2}^*, L), \ldots, Sim(Seq_n^*, L)$. This mean is the similarity decision value D(n) corresponding to $Seq_n^*$: $D(n) = \frac{1}{w}\sum_{m=n-w+1}^{n} Sim(Seq_m^*, L)$;
(102) Use the decision value D(n) and the decision threshold λ to judge the user's "current behavior": if $D(n) > \lambda$, the user's "current behavior" is judged normal; if $D(n) \le \lambda$, it is judged abnormal. Here the user's "current behavior" relative to $Seq_n^*$ means the w "behavior pattern sequences" executed by the user that end at $Seq_n^*$, i.e., $Seq_{n-w+1}^*, Seq_{n-w+2}^*, \ldots, Seq_n^*$. In D(n), the initial value of n is w (i.e., $n \ge w$) and n increases in steps of 1: after the first w "behavior pattern sequences" have been mined, one decision on the user's behavior can be made each time another "behavior pattern sequence" is mined. When n < w, D(n) is not calculated and no decision is made.
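The fixed-window rule in (101) and (102) is a moving average over the similarity stream followed by a threshold test. A minimal Python sketch, assuming the whole stream Z is available as a list (a real-time detector would instead update the window as each new similarity arrives); `fixed_window_decisions` and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fixed_window_decisions(z: Sequence[float], w: int,
                           lam: float) -> List[Tuple[int, float, str]]:
    """Return (n, D(n), verdict) for n = w, ..., M; no decision is made for n < w."""
    results = []
    for n in range(w, len(z) + 1):
        d = sum(z[n - w:n]) / w  # mean of Sim(Seq_{n-w+1}*, L), ..., Sim(Seq_n*, L)
        results.append((n, d, "normal" if d > lam else "abnormal"))
    return results


# Hypothetical stream and parameters: w = 2, lambda = 0.5.
print(fixed_window_decisions([1.0, 1.0, 0.25, 0.0, 0.0], 2, 0.5))
```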
The second decision method applies windowing with a variable window length to the similarity stream and takes the mean; the resulting similarity mean, i.e., the similarity decision value, is then compared with the decision thresholds to judge the behavior of the monitored user. It comprises the following steps:
(201) Read the parameters set in the control module: V window lengths w(1), w(2), ..., w(V); V upper decision limits u(1), u(2), ..., u(V); and V lower decision limits d(1), d(2), ..., d(V). Here w(1) < w(2) < ... < w(V); u(k) and d(k) are the upper and lower decision limits for the k-th window length w(k), where k is a natural number in [1, V]; and u(1) > u(2) > ... > u(V-1) > u(V) = d(V) > d(V-1) > ... > d(2) > d(1).
(202) After the n-th behavior pattern sequence $Seq_n^*$ has been mined from $\bar{R}$ and $\mathrm{Sim}(Seq_n^*, L)$ has been computed, compute the similarity decision value D(n) corresponding to $Seq_n^*$ and judge the user's "current behavior". This comprises the following steps:
Step 1: set the variable k := 1.
Step 2: compare n with w(k). If n ≥ w(k), go to step 3. If n < w(k), do not compute D(n), make no judgment of the user's "current behavior", and end this operation.
Step 3: compute the similarity mean

$$N(n,k) = \frac{1}{w(k)}\sum_{m=n-w(k)+1}^{n} \mathrm{Sim}(Seq_m^*, L).$$

This value is obtained by windowing the similarity stream $Z = (\mathrm{Sim}(Seq_1^*, L), \mathrm{Sim}(Seq_2^*, L), \ldots, \mathrm{Sim}(Seq_M^*, L))$ over the w(k) similarities ending at $\mathrm{Sim}(Seq_n^*, L)$, namely $\mathrm{Sim}(Seq_{n-w(k)+1}^*, L), \mathrm{Sim}(Seq_{n-w(k)+2}^*, L), \ldots, \mathrm{Sim}(Seq_n^*, L)$, and averaging them. Here w(k) is the window length, and w(k) ≤ n ≤ M.
Step 4: check whether N(n,k) > u(k). If so, set the similarity decision value for $Seq_n^*$ to D(n) := N(n,k) and judge the user's "current behavior" normal. (Here, relative to $Seq_n^*$, the user's "current behavior" means the w(k) behavior pattern sequences executed by the user that end at $Seq_n^*$, i.e., $Seq_{n-w(k)+1}^*, Seq_{n-w(k)+2}^*, \ldots, Seq_n^*$.) The judgment of the user's current behavior is then complete and the remaining steps are skipped. If N(n,k) > u(k) does not hold, go to step 5.
Step 5: check whether N(n,k) ≤ d(k). If so, set D(n) := N(n,k), judge the user's "current behavior" abnormal, and end the judgment without executing the remaining steps. If N(n,k) ≤ d(k) does not hold, go to step 6.
Step 6: set k := k + 1 and return to step 2, repeating the loop.
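The loop of steps 1 to 6 can be sketched in Python as follows. For clarity rather than efficiency it recomputes N(n, k) from a stored list of similarities. The function name and the toy parameters (w = (2, 4), u = (0.8, 0.5), d = (0.2, 0.5), chosen to satisfy the ordering required in step (201)) are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def variable_window_decision(
    z: Sequence[float],
    n: int,
    w: Sequence[int],
    u: Sequence[float],
    d: Sequence[float],
) -> Optional[Tuple[float, str]]:
    """Illustrative helper for the second decision method; n is 1-based.

    z holds Sim(Seq_1*, L), ..., Sim(Seq_M*, L) as a 0-based list;
    w, u, d hold w(1..V), u(1..V), d(1..V) as 0-based sequences.
    """
    for k in range(len(w)):           # steps 1 and 6: k = 1, 2, ..., V
        if n < w[k]:                  # step 2: not enough similarities yet
            return None               # no D(n), no judgment
        window = z[n - w[k]:n]        # Sim(Seq_{n-w(k)+1}*, L) ... Sim(Seq_n*, L)
        mean = sum(window) / w[k]     # step 3: N(n, k)
        if mean > u[k]:               # step 4
            return mean, "normal"
        if mean <= d[k]:              # step 5
            return mean, "abnormal"
    # Not reached when u(V) == d(V) and n >= w(V): the last window always decides.
    return None


params = dict(w=(2, 4), u=(0.8, 0.5), d=(0.2, 0.5))
print(variable_window_decision([1.0, 1.0, 1.0, 0.0], 4, **params))
print(variable_window_decision([0.0, 0.0, 1.0], 3, **params))
```

In the first call the shorter window gives N(4, 1) = 0.5, which lies between d(1) and u(1), so the loop falls through to w(2) and decides "normal" with N(4, 2) = 0.75. In the second call the same undecided mean cannot be resolved because n = 3 < w(2), so no judgment is made.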
In the second decision method, the computation of the similarity decision value and the judgment of the monitored user's current behavior can be understood as follows. After the n-th similarity $\mathrm{Sim}(Seq_n^*, L)$ in the similarity stream has been computed, the window length is first set to its minimum value w(1). If n is greater than or equal to the window length, the similarity stream is windowed so that the window ends at $\mathrm{Sim}(Seq_n^*, L)$ (the number of similarities in the window equals the window length), and the similarities are averaged. This mean is then compared with the upper and lower decision limits for that window length. If a decision condition is met (the mean is greater than the upper limit or less than or equal to the lower limit for that window length), the mean is taken as the similarity decision value D(n) and the monitored user's current behavior is judged: normal if the mean exceeds the upper limit, abnormal if it is less than or equal to the lower limit. If no decision condition is met (the mean is greater than the lower limit and less than or equal to the upper limit), the window length is increased in the order w(2), w(3), ..., w(V), and, as long as n is not less than the window length, the windowing, averaging, and comparison are repeated until a decision condition is met, yielding D(n) and a judgment of the monitored user's "current behavior" according to the condition that was met. If n becomes smaller than the window length, the operation stops: D(n) is not computed and the monitored user's "current behavior" is not judged.
Under the second decision method, when n < w(1), the similarity decision value D(n) is not computed and the user's "current behavior" is not judged. When w(1) ≤ n < w(V), D(n) may or may not be obtained, so a judgment of the user's "current behavior" is not guaranteed. When w(V) ≤ n ≤ M, D(n) can always be obtained and a judgment is always made (because u(V) = d(V)); in this range n advances in steps of 1, that is, every newly mined behavior pattern sequence allows one more judgment of the user's behavior. Moreover, in practice the number M of behavior pattern sequences mined from $\bar{R}$ is usually much larger than the longest window length w(V), so, compared with the case w(V) ≤ n ≤ M, the cases n < w(1) and w(1) ≤ n < w(V) are rare.
Note that during detection, the three stages applied to the monitored user's activity (acquiring and preprocessing the shell command lines; mining behavior pattern sequences and computing their similarities; and windowing the similarity stream and judging the user's behavior) are all carried out synchronously. Once the monitored user has executed a certain number of behavior pattern sequences (the number depends on the window length), then each time the user executes one more behavior pattern sequence, the detection system of the invention can mine that sequence, compute its similarity, window the similarity stream so that the window ends at that similarity (obtaining the sequence's similarity decision value), and then make one judgment of the monitored user's "current behavior".
In detection step (3) above, the two decision methods differ in window length and in how the decision is made. In the first method the window length w is an important parameter: it determines the time from the appearance of the monitored user's behavior to the system's first judgment of that behavior (the detection time). Because each behavior pattern sequence contains at least one shell-command token, the shortest detection time of this scheme is the time the user needs to enter w shell-command tokens (ignoring the time spent on sequence comparison and decision-value computation). Hence the smaller the window length w, the better the real-time performance of detection. In practical use of the system, however, detection accuracy tends to decrease as w decreases. The second method balances detection time against detection accuracy: by using variable window lengths it can improve real-time performance at the same detection accuracy compared with the first method, at the cost of relatively higher complexity.
Note that the detection method above performs anomaly detection on the behavior of a single legitimate user (rather than a group) in a computer network system. In fact, the detection system and method of the invention can also perform anomaly detection on the behavior of a group of (multiple) legitimate users in a network system, in one of two ways: (1) if the privileges and behavioral characteristics of these users differ considerably, W sequence libraries are built separately from each user's normal-behavior training data, and each user's behavior is checked against that user's own W sequence libraries; (2) if the users have the same privileges and similar behavioral characteristics, their training data can be combined (i.e., their shell-command streams concatenated) into one overall training set, from which a single set of W sequence libraries is built and used to check each user's behavior.
A test application example is given below to illustrate an embodiment of the invention. In this example, the user behavior anomaly detection system of the invention is deployed on a server in a company's local area network (LAN) to monitor the behavior of a key programmer on that LAN, both to prevent the programmer from performing unauthorized operations and to prevent an external intruder who has stolen the programmer's account from performing malicious operations. The example covers the system in both of its operating states: the learning (training) state and the detection state.
The operating steps in the learning state are as follows:
(1) Start the system.
(2) The security administrator of the LAN configures the system's operating state and parameters: the operating state is set to the learning state; the number W of shell-command sequence libraries is set to 5; the sequence lengths l(1), l(2), l(3), l(4), l(5) are set to 1, 2, 3, 4, 5 respectively; and the frequency thresholds $\eta_1, \eta_2, \ldots, \eta_W$ are all set to 0.0002. After configuration, the administrator enters the "start working" instruction, and the system begins running automatically on receiving it.
(3) The control module automatically checks the system settings, finds that the operating state is set to the learning state, and switches the system to the learning state.
(4) The control module drives the data acquisition and preprocessing module to load the original training data from the specified location. These data are the shell command lines executed by the programmer on this server during normal operation over a period of 8 months.
(5) The data acquisition and preprocessing module converts the original training data (shell command lines) into the form of a shell-command stream and passes it to the learning module. The shell-command stream obtained by preprocessing the original training data is R = (*SOF*, cd, <1>, cd, <1>, ..., vi, <1>, logout, *EOF*), which contains 9935 shell-command tokens in total.
(6) The learning module learns from this shell-command stream, builds the shell-command sequence libraries, and stores them in the sequence storage module. The sequence libraries L(1), L(2), L(3), L(4), L(5), with sequence lengths 1, 2, 3, 4, 5, contain 160, 508, 860, 1015, and 1526 shell-command sequences respectively.
(7) The learning module sends a "learning finished" message to the control module, which, on receiving it, returns the system to the state of waiting for instructions, i.e., the state just after system start-up.
(8) Shut down the system.
In the embodiment above, the core of the system's learning is building the shell-command sequence libraries from the training data, which the detection module uses (for lookup and comparison) during detection. The frequency thresholds $\eta_1, \eta_2, \ldots, \eta_W$ set in step (2) are very important parameters: they determine the number of shell sequences in the five sequence libraries L(1), L(2), L(3), L(4), L(5). The smaller the frequency threshold, the more sequences each library contains and the more data the system must store, so the threshold cannot be too small. If the threshold is too large, however, shell-command sequences (behavior patterns) that reflect the legitimate user's regular operation may be missed when the libraries are built, so that the libraries cannot represent the user's normal-behavior profile well, which lowers the system's detection accuracy. Setting the frequency thresholds appropriately is therefore the key issue in learning. Table 1 below gives the number of sequences in the five sequence libraries for different frequency thresholds.
Table 1. Number of sequences in each sequence library for different frequency thresholds

Frequency threshold     0.0001  0.0002  0.0003  0.0004  0.0005  0.0006
Library L(1)               220     160     141     123     112     102
Library L(2)               962     508     365     293     247     208
Library L(3)              2155     860     559     398     320     250
Library L(4)              3963    1015     893     604     542     461
Library L(5)              6892    1526    1232    1067     915     796
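To make the thresholds in Table 1 more concrete, the short Python calculation below converts each frequency threshold into the minimum number of occurrences a sequence needs in order to enter L(i). It assumes, following step (44) of claim 4, that the frequency denominator for $S_i$ is the number of sequences in $S_i$, i.e. r - l(i) + 1, and it uses r = 9935 from the example. The helper name `min_occurrences` is illustrative; exact fractions avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction


def min_occurrences(eta: str, r: int, length: int) -> int:
    """Smallest integer count c with c / (r - length + 1) >= eta."""
    total = r - length + 1                   # number of sequences in S_i
    return math.ceil(Fraction(eta) * total)  # exact decimal arithmetic


thresholds = ["0.0001", "0.0002", "0.0003", "0.0004", "0.0005", "0.0006"]
table = {eta: [min_occurrences(eta, 9935, l) for l in range(1, 6)] for eta in thresholds}
for eta, counts in table.items():
    print(eta, counts)
```

Under this reading, η = 0.0002 (the value used in the example) keeps every sequence of length 1 to 5 that occurs at least twice in the training stream, while η = 0.0001 keeps every sequence that occurs at all, which is consistent with the sharp growth of the library sizes in the 0.0001 column of Table 1.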
The operating steps in the detection state are as follows:
(1) Start the system.
(2) The security administrator of the LAN configures the system's operating state and parameters: the operating state is set to the detection state; the decision method is set to the second decision method; the number of window lengths V is set to 3; the three window lengths are w(1) = 30, w(2) = 60, w(3) = 90; the three upper decision limits are u(1) = 0.8, u(2) = 0.7, u(3) = 0.5; and the three lower decision limits are d(1) = 0.3, d(2) = 0.4, d(3) = 0.5. After configuration, the administrator enters the "start working" instruction, and the system begins running automatically on receiving it.
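The second decision method only behaves as described if its parameters respect the ordering given in step (201); in particular, u(V) = d(V) guarantees that the longest window always produces a decision. A small Python check, written as a hypothetical helper (`check_variable_window_params` is not named in the patent), confirms that the values chosen in this example satisfy those constraints.

```python
from typing import Sequence


def check_variable_window_params(
    w: Sequence[int], u: Sequence[float], d: Sequence[float]
) -> bool:
    """Hypothetical helper: check w(1)<...<w(V) and u(1)>...>u(V)=d(V)>...>d(1)."""
    v = len(w)
    if not (v >= 1 and len(u) == v and len(d) == v):
        return False
    windows_increasing = all(w[k] < w[k + 1] for k in range(v - 1))
    upper_decreasing = all(u[k] > u[k + 1] for k in range(v - 1))
    lower_increasing = all(d[k] < d[k + 1] for k in range(v - 1))
    # u(V) == d(V) closes the gap, so the last window always decides.
    return windows_increasing and upper_decreasing and lower_increasing and u[-1] == d[-1]


ok = check_variable_window_params(w=(30, 60, 90), u=(0.8, 0.7, 0.5), d=(0.3, 0.4, 0.5))
```

Note that the chain also implies u(k) > d(k) for every k < V, so each shorter window has a non-empty "undecided" band (d(k), u(k)] in which the method moves on to the next window length.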
(3) The control module automatically checks the system settings, finds that the operating state is set to the detection state, and switches the system to the detection state.
(4) The control module drives the data acquisition and preprocessing module to load the original audit data from the specified location; at the same time, this module preprocesses the original audit data (shell command lines) and passes the preprocessed audit data (a shell-command stream) to the detection module.
(5) The detection module analyzes the audit data supplied by the data acquisition and preprocessing module and generates the detection results.
(6) The detection result output module displays the detection results (the decision value curve) and raises alarms for abnormal behavior. Fig. 7 shows the decision value curves produced by the output module. In the figure, the upper solid line is the decision value curve output while the system monitors the programmer's normal behavior. If the shell command lines executed by the security administrator are loaded as the original audit data instead, the output is the lower dashed line. This dashed line can be regarded as the curve the system outputs when monitoring abnormal behavior, because most of the operations the security administrator performs on the server would be unauthorized (abnormal) for the programmer; in other words, the programmer is not authorized to perform many of the operations within the administrator's privileges. If the programmer exceeded his privileges and performed operations within the administrator's privileges, the decision values along the curve output by the system would be relatively small (as shown by the dashed line), and whenever a decision value falls below the corresponding decision threshold, the system detects these unauthorized operations (abnormal behavior) and raises an alarm.

Claims (9)

1. A user behavior anomaly detection method based on machine learning, implemented by a machine-learning-based user behavior anomaly detection system, the system being deployed on a server to be monitored, using user-interface shell commands on a Unix platform as audit data, and analyzing user behavior at the user-interface level to detect whether an intrusion has occurred on the server; the system comprising a control module, a data acquisition and preprocessing module, a learning module, a sequence storage module, a detection module, and a detection result output module; characterized in that the detection method comprises the following steps:
(1) Start the system;
(2) While the system waits for instruction input, set the operating state and working parameters of the system through the control module, so that after a subsequent "start working" instruction is entered, the control module automatically checks the system settings and enters one of two operating states: if the system is set to the learning state, continue with the next step; if the system is set to the detection state, jump to step (6);
(3) Driven by the control module, the data acquisition and preprocessing module loads the original training data from a predefined data interface, preprocesses it into the form of a shell-command stream, and passes it to the learning module;
(4) The learning module learns from the preprocessed shell-command-stream training data, builds the shell-command sequence libraries, stores them in the sequence storage module, and then sends a "learning finished" message to the control module;
(5) On receiving the "learning finished" message from the learning module, the control module either returns the system to step (2) to wait for new setting instructions, or switches the system directly to the detection state and continues with the next step;
(6) Driven by the control module, the data acquisition and preprocessing module loads the original audit data (shell command lines) in real time from a predefined data interface, preprocesses it in real time, and passes the preprocessed shell-command-stream audit data to the detection module in real time;
(7) The detection module analyzes the audit data in real time and generates detection results;
(8) The detection result output module displays the detection results (the decision value curve) and raises real-time alarms for abnormal behavior.
2. The user behavior anomaly detection method according to claim 1, characterized in that, in step (2), if the operating state of the system is set to the learning state, the working parameters to be set include:
the number W of shell-command sequence libraries to be built to represent the legitimate user's normal-behavior profile, and the W sequence lengths l(1), l(2), ..., l(W), where l(i) is the length of the sequences in the i-th shell-command sequence library and l(1) < l(2) < ... < l(W);
if the operating state of the system is set to the detection state, the working parameters to be set take one of two forms:
for the first decision method, the window length w and the decision threshold λ; or
for the second decision method, the number V of window lengths, the V window lengths w(1), w(2), ..., w(V), the V upper decision limits u(1), u(2), ..., u(V), and the V lower decision limits d(1), d(2), ..., d(V), where w(1) < w(2) < ... < w(V); u(k) and d(k) are the upper and lower decision limits for the k-th window length w(k), k being a natural number in [1, V]; and u(1) > u(2) > ... > u(V-1) > u(V) = d(V) > d(V-1) > ... > d(2) > d(1).
3. The user behavior anomaly detection method according to claim 1, characterized in that the preprocessing applied by the data acquisition and preprocessing module to the original training data or original audit data in step (3) or step (6) comprises the following steps:
(31) extract the command names, flags, and metacharacters from the shell command lines;
(32) replace information including but not limited to file names, server names, directories, and network addresses with an identifier <n> of uniform format, where n is the number of such file names, server names, directories, or network addresses;
(33) insert the identifiers SOF and EOF, marking the start and end respectively, at the points in time where each shell session begins and ends;
(34) arrange the shell-command tokens (the command names, flags, and metacharacters, together with the identifiers standing for file names, server names, directories, and network addresses) in their order of appearance within each shell session, and concatenate the tokens of different shell sessions in chronological order, without adding timestamps. After this preprocessing the original input data takes the form of a shell-command stream: a string of shell-command tokens arranged in chronological order that contains the contents of multiple shell sessions.
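A simplified Python sketch of steps (31) to (34). It assumes whitespace-separated command lines and a small, hypothetical set of metacharacters. It treats the first word of each (sub)command as the command name, words starting with "-" as flags, and every other word as a file name, server name, directory, or network address, collapsing each run of such arguments into <n>. The *SOF*/*EOF* spelling follows the example stream in the embodiment; quoting, globbing, and other real shell syntax are deliberately not handled.

```python
from typing import List

# Hypothetical metacharacter sets; a real deployment would need more.
CONTROL_OPS = {"|", "||", "&", "&&", ";"}  # start a new (sub)command
REDIRECTIONS = {">", ">>", "<"}            # followed by a file argument


def preprocess_line(line: str) -> List[str]:
    """Turn one shell command line into shell-command tokens (steps 31-32)."""
    tokens: List[str] = []
    expect_command = True  # the next plain word is a command name
    pending_args = 0       # length of the current run of arguments

    def flush() -> None:
        nonlocal pending_args
        if pending_args:
            tokens.append(f"<{pending_args}>")  # uniform identifier <n>
            pending_args = 0

    for word in line.split():
        if word in CONTROL_OPS or word in REDIRECTIONS:
            flush()
            tokens.append(word)                   # keep the metacharacter
            expect_command = word in CONTROL_OPS  # after ">" comes a file
        elif expect_command:
            tokens.append(word)                   # command name
            expect_command = False
        elif word.startswith("-"):
            flush()
            tokens.append(word)                   # flag
        else:
            pending_args += 1                     # file/host/dir/address
    flush()
    return tokens


def preprocess_sessions(sessions: List[List[str]]) -> List[str]:
    """Steps 33-34: wrap each session in *SOF*/*EOF*, concatenate in time order."""
    stream: List[str] = []
    for session in sessions:  # sessions are assumed to be time-ordered
        stream.append("*SOF*")
        for line in session:
            stream.extend(preprocess_line(line))
        stream.append("*EOF*")
    return stream             # no timestamps are kept


R = preprocess_sessions([["cd /tmp", "ls -l | grep foo"], ["vi a.txt", "logout"]])
```

For the two toy sessions this yields `*SOF*, cd, <1>, ls, -l, |, grep, <1>, *EOF*, *SOF*, vi, <1>, logout, *EOF*`, which has the same shape as the stream R shown in the learning-state example.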
4. The user behavior anomaly detection method according to claim 1, characterized in that the learning performed by the learning module in step (4) on the preprocessed shell-command-stream training data comprises the following steps:
(41) obtain from the data acquisition and preprocessing module the preprocessed training data representing the legitimate user's normal behavior, $R = (s_1, s_2, \ldots, s_r)$, i.e., a shell-command stream of length r, where $s_j$ is the j-th shell-command token in chronological order;
(42) read the learning parameters from the control module: the number W of shell-command sequence libraries to be built to represent the legitimate user's normal-behavior profile, and the W sequence lengths l(1), l(2), ..., l(W), where l(i) is the length of the sequences in the i-th shell-command sequence library and l(1) < l(2) < ... < l(W);
(43) from the shell-command stream R, generate W shell-command sequence streams $S_1, S_2, \ldots, S_W$ with sequence lengths l(1), l(2), ..., l(W) respectively, where $S_i$ is the sequence stream of length l(i): $S_i = (Seq_1^i, Seq_2^i, \ldots, Seq_{r-l(i)+1}^i)$, in which $Seq_j^i = (s_j, s_{j+1}, \ldots, s_{j+l(i)-1})$ is the j-th shell-command sequence of $S_i$ in chronological order, and i is a natural number in [1, W];
(44) for each natural number i in [1, W], compute the frequency of occurrence of each shell-command sequence of $S_i$ within $S_i$; that is, divide the number of occurrences of each sequence in $S_i$ by the total number of occurrences of all sequences in that stream;
(45) read the frequency threshold parameters from the control module: the frequency thresholds $\eta_1, \eta_2, \ldots, \eta_W$ used to build the W shell-command sequence libraries, where $\eta_i$ is the threshold used when building the library of sequences of length l(i), and i is a natural number in [1, W];
(46) according to sequence frequency, extract shell-command sequences as samples from each of the W sequence streams $S_1, S_2, \ldots, S_W$ to build W sequence libraries, as follows:
let $L = \{L(1), L(2), \ldots, L(W)\}$ be the set of W sequence libraries representing the user's normal-behavior profile, where L(i) is the library consisting of sequences of length l(i);
for i from 1 to W in order, extract as samples the shell-command sequences whose frequency of occurrence in $S_i$ is greater than or equal to $\eta_i$, i.e., the sequences regarded as normal behavior patterns of the legitimate user, and collect them to form sequence library L(i).
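Steps (43) to (46) amount to counting sliding-window n-grams and keeping the frequent ones. The Python sketch below is one straightforward reading of those steps; the function name `build_libraries` and the representation of each library as a set of token tuples are illustrative assumptions. Floating-point division is used for brevity, so exact fractions may be preferable when a count sits exactly on a threshold.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Library = Set[Tuple[str, ...]]


def build_libraries(
    r: Sequence[str], lengths: Sequence[int], etas: Sequence[float]
) -> List[Library]:
    """Illustrative helper for steps (43)-(46): one library L(i) per length l(i)."""
    libraries: List[Library] = []
    for length, eta in zip(lengths, etas):
        # S_i: the r - l(i) + 1 overlapping sequences of length l(i), in time order.
        s_i = [tuple(r[j:j + length]) for j in range(len(r) - length + 1)]
        counts = Counter(s_i)
        total = len(s_i)  # total occurrences of all sequences in S_i
        # Keep the sequences whose frequency of occurrence is >= eta_i.
        libraries.append({seq for seq, c in counts.items() if c / total >= eta})
    return libraries


L = build_libraries(["a", "b", "a", "b", "c"], lengths=(1, 2), etas=(0.25, 0.3))
```

Here `a` and `b` each have frequency 2/5 in $S_1$ and pass η = 0.25, while `c` (1/5) does not; in $S_2$ only `(a, b)` (2/4) passes η = 0.3.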
5. The user behavior anomaly detection method according to claim 1, characterized in that the real-time detection performed by the detection module in step (7), which analyzes the audit data in real time and generates detection results, comprises the following steps:
(71) obtain audit data in real time from the data acquisition and preprocessing module: during detection, the data acquisition and preprocessing module obtains in real time, from the shell history file, the shell command lines executed by the monitored legitimate user during the monitored period, preprocesses them, and transforms them into a shell-command stream $\bar{R} = (\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_{\bar{r}})$, where $\bar{s}_j$ is the j-th shell-command token in chronological order and $\bar{r}$ is the length of this command stream; the module outputs each shell-command token of $\bar{R}$ to the detection module in real time and in chronological order;
(72) the detection module uses a sequence-matching method to mine the behavior pattern sequences in the shell-command stream $\bar{R}$ and computes, according to the length of each behavior pattern sequence, its similarity to the sequence library set $L = \{L(1), L(2), \ldots, L(W)\}$, yielding the chronologically ordered behavior pattern sequence stream $P = (Seq_1^*, Seq_2^*, \ldots, Seq_M^*)$ and the corresponding similarity stream $Z = (\mathrm{Sim}(Seq_1^*, L), \mathrm{Sim}(Seq_2^*, L), \ldots, \mathrm{Sim}(Seq_M^*, L))$, where $Seq_n^*$ is the n-th behavior pattern sequence mined from $\bar{R}$, $\mathrm{Sim}(Seq_n^*, L)$ is the similarity of $Seq_n^*$ to the sequence library set, and M is the number of behavior pattern sequences mined from $\bar{R}$, with $\mathrm{int}(\bar{r}/l(W)) \le M \le \bar{r} - l(W) + 1$;
(73) window the similarity stream $Z = (\mathrm{Sim}(Seq_1^*, L), \mathrm{Sim}(Seq_2^*, L), \ldots, \mathrm{Sim}(Seq_M^*, L))$ and average the similarities in the window; compare the resulting mean, i.e., the similarity decision value, with the decision threshold, and judge the behavior of the monitored legitimate user accordingly;
during real-time detection, steps (71), (72), and (73) are performed synchronously.
6. The user behavior anomaly detection method according to claim 5, characterized in that, in step (72), the mining of the behavior pattern sequences in the user's shell-command-stream audit data $\bar{R}$ by the sequence-matching method, and the computation of the similarity of each behavior pattern sequence to the sequence library set, further comprise the following steps:
(721) set three variables: j := 1, i := W, n := 1;
(722) if $j \le \bar{r} - l(W) + 1$, compare $\overline{Seq}_j^i = (\bar{s}_j, \bar{s}_{j+1}, \ldots, \bar{s}_{j+l(i)-1})$ with sequence library L(i), then go to step (723); if $j > \bar{r} - l(W) + 1$, stop, which ends the mining of behavior pattern sequences and the similarity computation;
(723) if $\overline{Seq}_j^i \in L(i)$, i.e., $\overline{Seq}_j^i$ is identical to some sequence in L(i), then the n-th behavior pattern sequence is $Seq_n^* := \overline{Seq}_j^i$, and its similarity to the sequence library set L is $\mathrm{Sim}(Seq_n^*, L) := 2^{l(i)}/2^{l(W)}$; set j := j + l(i), i := W, n := n + 1, and return to step (722). If $\overline{Seq}_j^i \notin L(i)$, i.e., $\overline{Seq}_j^i$ differs from every sequence in L(i), set i := i - 1 and go to step (724);
(724) if i ≠ 0, return to step (722); if i = 0, set $Seq_n^* := (\bar{s}_j)$, $\mathrm{Sim}(Seq_n^*, L) := 0$, j := j + 1, i := W, n := n + 1, and return to step (722).
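The mining loop of steps (721) to (724) amounts to a greedy longest-match scan over $\bar{R}$. The Python sketch below assumes the similarity in step (723) reads $\mathrm{Sim}(Seq_n^*, L) = 2^{l(i)}/2^{l(W)}$, so a match in the longest library scores 1, and it represents each library L(i) as a set of token tuples; the function and variable names are illustrative. Like the claim, it stops once fewer than l(W) tokens remain from position j, so a short tail of the stream is not mined.

```python
from typing import List, Sequence, Set, Tuple

Library = Set[Tuple[str, ...]]


def mine_patterns(
    r_bar: Sequence[str], libraries: Sequence[Library], lengths: Sequence[int]
) -> Tuple[List[Tuple[str, ...]], List[float]]:
    """Illustrative helper for steps (721)-(724): greedy longest-match mining."""
    l_w = lengths[-1]
    patterns: List[Tuple[str, ...]] = []
    sims: List[float] = []
    j = 0  # 0-based index, so "j <= r - l(W) + 1" becomes "j <= r - l(W)"
    while j <= len(r_bar) - l_w:                    # step (722)
        for i in range(len(lengths) - 1, -1, -1):   # i = W, W-1, ..., 1
            candidate = tuple(r_bar[j:j + lengths[i]])
            if candidate in libraries[i]:           # step (723): found in L(i)
                patterns.append(candidate)
                sims.append(2.0 ** (lengths[i] - l_w))  # 2^l(i) / 2^l(W)
                j += lengths[i]
                break
        else:                                       # step (724), i = 0: no match
            patterns.append((r_bar[j],))
            sims.append(0.0)
            j += 1
    return patterns, sims


libs = [{("a",), ("b",)}, {("a", "b")}]  # toy L(1), L(2) for lengths (1, 2)
P, Z = mine_patterns(["b", "c", "a", "b"], libs, lengths=(1, 2))
```

In this toy run, `b` matches only L(1) (similarity $2^{1-2} = 0.5$), `c` matches nothing (similarity 0), and `a b` matches L(2) (similarity 1). The list `Z` plays the role of the similarity stream that the windowing in step (73) averages.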
7. The user behavior anomaly detection method according to claim 5, characterized in that, in step (73), when windowing the similarity stream and judging the user's behavior, the detection module offers two decision methods for the user to choose from;
the first decision method windows the similarity stream with a fixed window length and averages the similarities in the window; the resulting mean, the similarity decision value, is then compared with a decision threshold to judge the user's behavior; it comprises the following steps:
(7301) read the parameters set in the control module: the window length w and the decision threshold λ; after the n-th behavior pattern sequence $Seq_n^*$ has been mined from the audit data $\bar{R}$ and $\mathrm{Sim}(Seq_n^*, L)$ has been computed, where n ≥ w, window the similarity stream Z so that the window ends at $\mathrm{Sim}(Seq_n^*, L)$, and average the w similarities $\mathrm{Sim}(Seq_{n-w+1}^*, L), \mathrm{Sim}(Seq_{n-w+2}^*, L), \ldots, \mathrm{Sim}(Seq_n^*, L)$ to obtain the similarity decision value D(n) corresponding to $Seq_n^*$: $D(n) = \frac{1}{w}\sum_{m=n-w+1}^{n} \mathrm{Sim}(Seq_m^*, L)$;
(7302) use the decision value D(n) and the decision threshold λ to judge the user's "current behavior": if D(n) > λ, judge it normal; if D(n) ≤ λ, judge it abnormal;
the second decision method windows the similarity stream with a variable window length and averages the similarities in the window; the resulting mean, the similarity decision value, is then compared with decision limits to judge the monitored user's behavior; it comprises the following steps:
(7311) read the parameters set in the control module: the V window lengths w(1), w(2), ..., w(V), the V upper decision limits u(1), u(2), ..., u(V), and the V lower decision limits d(1), d(2), ..., d(V), where w(1) < w(2) < ... < w(V); u(k) and d(k) are the upper and lower decision limits for the k-th window length w(k), k being a natural number in [1, V]; and u(1) > u(2) > ... > u(V-1) > u(V) = d(V) > d(V-1) > ... > d(2) > d(1);
(7312) after the n-th behavior pattern sequence $Seq_n^*$ has been mined from $\bar{R}$ and $\mathrm{Sim}(Seq_n^*, L)$ has been computed, compute the similarity decision value D(n) corresponding to $Seq_n^*$ and judge the user's "current behavior".
8. The user behavior anomaly detection method according to claim 7, characterized in that the computation and decision performed in step (7312) further comprise the following steps:
(Step 1) set the variable k := 1;
(Step 2) compare n with w(k): if n ≥ w(k), proceed to the next step; if n < w(k), do not compute D(n), make no judgment of the user's "current behavior", and end this operation;
(Step 3) compute the similarity mean N(n,k) by windowing the similarity stream $Z = (\mathrm{Sim}(Seq_1^*, L), \mathrm{Sim}(Seq_2^*, L), \ldots, \mathrm{Sim}(Seq_M^*, L))$ over the w(k) similarities ending at $\mathrm{Sim}(Seq_n^*, L)$, namely $\mathrm{Sim}(Seq_{n-w(k)+1}^*, L), \mathrm{Sim}(Seq_{n-w(k)+2}^*, L), \ldots, \mathrm{Sim}(Seq_n^*, L)$, and averaging them: $N(n,k) = \frac{1}{w(k)}\sum_{m=n-w(k)+1}^{n} \mathrm{Sim}(Seq_m^*, L)$, where w(k) is the window length and w(k) ≤ n ≤ M;
(step 4) judge whether to satisfy judgment condition: N (n, k)>u (k), if satisfy this condition, then Seq n *Corresponding similarity decision value is defined as D (n) :=N, and (n k), and is judged to normal behaviour with this user's " current behavior "; If (n k)>u (k), continues to carry out subsequent operation not satisfy judgment condition N;
(step 5) judge whether to satisfy judgment condition: N (n, k)≤d (k), if satisfy this condition, then (n k), and is judged to abnormal behaviour with this user's " current behavior " to D (n) :=N, finishes the judgement to user's " current behavior "; If (n k)≤d (k), continues to carry out subsequent operation not satisfy judgment condition N;
(step 6) k:=k+1, promptly the value of k adds 1, and returns execution (step 2), and subsequent operation is carried out in circulation.
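Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation: `window_mean` and `decide` are hypothetical names, n is 1-based as in the claim while the Python lists are 0-based, and the similarity values in Z are assumed to have been computed already by the earlier steps of the method.

```python
from typing import Optional, Sequence, Tuple


def window_mean(Z: Sequence[float], n: int, width: int) -> float:
    """N(n, k): mean of the `width` similarities ending at Sim(Seq_n^*, L).

    Z is the similarity stream (Sim(Seq_1^*, L), ..., Sim(Seq_M^*, L));
    n is 1-based as in the claim and must satisfy width <= n <= M.
    """
    if not 1 <= width <= n <= len(Z):
        raise ValueError("require 1 <= w(k) <= n <= M")
    # Terms m = n - w(k) + 1 .. n (1-based) are Z[n - width : n] (0-based).
    return sum(Z[n - width:n]) / width


def decide(Z: Sequence[float], n: int, w: Sequence[int],
           u: Sequence[float], d: Sequence[float]) -> Optional[Tuple[float, str]]:
    """Multi-window decision for the n-th sequence (Steps 1 to 6).

    Returns (D(n), "normal" or "abnormal"), or None when n < w(k) for the
    window currently being tried, i.e. no decision is made for this n.
    """
    for k in range(len(w)):          # Step 1 and Step 6: try k = 1, 2, ..., V
        if n < w[k]:                 # Step 2: not enough history for w(k)
            return None
        N = window_mean(Z, n, w[k])  # Step 3: windowed similarity mean
        if N > u[k]:                 # Step 4: above upper limit -> normal
            return N, "normal"
        if N <= d[k]:                # Step 5: at or below lower limit -> abnormal
            return N, "abnormal"
    # Not reached when u(V) == d(V): the last window always decides.
    raise ValueError("parameters must satisfy u(V) == d(V)")
```

Returning None models Step 2 when n < w(k): no D(n) is produced, and the verdict is deferred until more behavior pattern sequences arrive. Because u(V) = d(V), Steps 4 and 5 together cover every possible value of N(n, V), so the loop never falls through once n ≥ w(V).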
9. The user behavior anomaly detection method according to claim 1, characterized in that the detection method is used to perform anomaly detection on the behavior of a single legitimate user in the computer network system, or on the behavior of a group of, or multiple, legitimate users in the network system. For the latter case, two different approaches can be adopted:
If the privileges and behavioral characteristics of the users in the group differ considerably, a separate W sequence library is built from each legitimate user's normal-behavior training data, and each user's behavior is then checked for anomalies against that user's own W sequence library;
If the users in the group have the same privileges and similar behavioral characteristics, their training data are combined: their shell-command streams are concatenated to form the total training data, a single W sequence library is built from it, and this W sequence library is then used to check each user's behavior for anomalies.
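The two grouping strategies of claim 9 differ only in which command streams feed the library builder. This section does not define how a W sequence library is built, so the sketch below substitutes a deliberately simple, hypothetical stand-in, `ngram_library`, which returns the set of contiguous length-3 command runs. It is not the patent's sequence-mining procedure, and all function names are illustrative.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Command = str
Library = FrozenSet[Tuple[Command, ...]]


def ngram_library(stream: List[Command], length: int = 3) -> Library:
    """Hypothetical stand-in for building a W sequence library:
    every contiguous run of `length` shell commands in the stream."""
    return frozenset(tuple(stream[i:i + length])
                     for i in range(len(stream) - length + 1))


def per_user_libraries(streams: Dict[str, List[Command]],
                       build: Callable[[List[Command]], Library] = ngram_library
                       ) -> Dict[str, Library]:
    """Approach 1: privileges or habits differ, so build one library per user."""
    return {user: build(cmds) for user, cmds in streams.items()}


def pooled_library(streams: Dict[str, List[Command]],
                   build: Callable[[List[Command]], Library] = ngram_library
                   ) -> Dict[str, Library]:
    """Approach 2: same privileges and similar habits, so concatenate the
    shell-command streams, build one shared library, and check every user
    against it."""
    combined: List[Command] = []
    for cmds in streams.values():
        combined.extend(cmds)
    shared = build(combined)
    return {user: shared for user in streams}
```

For example, with streams `{"alice": ["ls", "cd", "ls", "vi"], "bob": ["ps", "top", "ps"]}`, the pooled library contains the run `("ls", "vi", "ps")`, which straddles the boundary between the two users' data; neither per-user library contains it. This is a side effect of simple concatenation as the claim describes it.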
CNB2005100569348A 2005-03-23 2005-03-23 Detecting system and method for user behaviour abnormal based on machine study Active CN1333552C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100569348A CN1333552C (en) 2005-03-23 2005-03-23 Detecting system and method for user behaviour abnormal based on machine study

Publications (2)

Publication Number Publication Date
CN1649311A CN1649311A (en) 2005-08-03
CN1333552C true CN1333552C (en) 2007-08-22

Family

ID=34876795

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100569348A Active CN1333552C (en) 2005-03-23 2005-03-23 Detecting system and method for user behaviour abnormal based on machine study

Country Status (1)

Country Link
CN (1) CN1333552C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424391A (en) * 2013-09-06 2015-03-18 联想(北京)有限公司 Method and device for automatic supervision

Families Citing this family (37)

Publication number Priority date Publication date Assignee Title
US20100151817A1 (en) * 2007-02-26 2010-06-17 Lidstroem Mattias Method And Apparatus For Monitoring Client Behaviour
CN101136922B (en) * 2007-04-28 2011-04-13 华为技术有限公司 Service stream recognizing method, device and distributed refusal service attack defending method, system
CN101572691B (en) * 2008-04-30 2013-10-02 华为技术有限公司 Method, system and device for intrusion detection
US20100144440A1 (en) * 2008-12-04 2010-06-10 Nokia Corporation Methods, apparatuses, and computer program products in social services
CN101902366B (en) * 2009-05-27 2014-03-12 北京启明星辰信息技术股份有限公司 Method and system for detecting abnormal service behaviors
CN101702720B (en) * 2009-10-28 2012-09-05 中国科学院计算技术研究所 Model training method and detecting method in detection of impersonation attack
CN102402517A (en) * 2010-09-09 2012-04-04 北京启明星辰信息技术股份有限公司 Method and system for establishing normal database login model and method and system for detecting abnormal login behavior
CN102541899B (en) * 2010-12-23 2014-04-16 阿里巴巴集团控股有限公司 Information identification method and equipment
CN102413127A (en) * 2011-11-09 2012-04-11 中国电力科学研究院 Database generalization safety protection method
CN103581355A (en) * 2012-08-02 2014-02-12 北京千橡网景科技发展有限公司 Method and device for handling abnormal behaviors of user
CN103064870B (en) * 2012-09-24 2016-05-11 深圳市深信服电子科技有限公司 Method, device and the equipment of the anti-injection of Web
US9218729B2 (en) * 2013-02-20 2015-12-22 Honeywell International Inc. System and method of monitoring the video surveillance activities
US9286574B2 (en) * 2013-11-04 2016-03-15 Google Inc. Systems and methods for layered training in machine-learning architectures
CN103793484B (en) * 2014-01-17 2017-03-15 五八同城信息技术有限公司 The fraud identifying system based on machine learning in classification information website
CN103853841A (en) * 2014-03-19 2014-06-11 北京邮电大学 Method for analyzing abnormal behavior of user in social networking site
US10296843B2 (en) * 2014-09-24 2019-05-21 C3 Iot, Inc. Systems and methods for utilizing machine learning to identify non-technical loss
CN104883346A (en) * 2014-09-28 2015-09-02 北京匡恩网络科技有限责任公司 Network equipment behavior analysis method and system
CN104935600B (en) * 2015-06-19 2019-03-22 中国电子科技集团公司第五十四研究所 A kind of mobile ad-hoc network intrusion detection method and equipment based on deep learning
CN105959180A (en) * 2016-06-12 2016-09-21 乐视控股(北京)有限公司 Data detection method and device
CN106561026A (en) * 2016-07-29 2017-04-12 北京安天电子设备有限公司 Method and system for diagnosing invasion based on user account operation behavior
CN106230849B (en) * 2016-08-22 2019-04-19 中国科学院信息工程研究所 A kind of smart machine machine learning safety monitoring system based on user behavior
CN106789885B (en) * 2016-11-17 2021-11-16 国家电网公司 User abnormal behavior detection and analysis method under big data environment
CN106953766B (en) * 2017-03-31 2020-06-26 北京奇艺世纪科技有限公司 Alarm method and device
JP6716853B2 (en) * 2017-05-25 2020-07-01 日本電気株式会社 Information processing apparatus, control method, and program
US10419468B2 (en) * 2017-07-11 2019-09-17 The Boeing Company Cyber security system with adaptive machine learning features
CN108156146B (en) * 2017-12-19 2021-07-30 北京盖娅互娱网络科技股份有限公司 Method and device for identifying abnormal user operation
CN108234480B (en) * 2017-12-29 2021-06-22 北京奇虎科技有限公司 Intrusion detection method and device
CN108399700A (en) * 2018-01-31 2018-08-14 上海乐愚智能科技有限公司 Theft preventing method and smart machine
CN108509793A (en) * 2018-04-08 2018-09-07 北京明朝万达科技股份有限公司 A kind of user's anomaly detection method and device based on User action log data
CN108667818A (en) * 2018-04-20 2018-10-16 北京元心科技有限公司 The method of cloud device and cloud net end Collaborative Control access rights
CN108769026B (en) * 2018-05-31 2022-02-15 康键信息技术(深圳)有限公司 User account detection system and method
CN109639659A (en) * 2018-12-05 2019-04-16 四川长虹电器股份有限公司 A kind of implementation method of the WEB application firewall based on machine learning
WO2020225819A1 (en) * 2019-05-07 2020-11-12 B. G. Negev Technologies And Applications Ltd., At Ben-Gurion University Methods and devices for detecting improper clinical programming of implantable medical devices
CN110519241A (en) * 2019-08-12 2019-11-29 广州海颐信息安全技术有限公司 The method and device for actively discovering privilege and threatening abnormal behaviour based on machine learning
CN111310186A (en) * 2020-03-17 2020-06-19 优刻得科技股份有限公司 Method, device and system for detecting confusion command line
CN113556338B (en) * 2021-07-20 2022-08-30 福建银数信息技术有限公司 Computer network security abnormal operation interception method
CN114036520B (en) * 2021-11-26 2024-09-24 安天科技集团股份有限公司 Application information evidence obtaining method and device, electronic equipment, computer readable storage medium and program product

Citations (6)

Publication number Priority date Publication date Assignee Title
JP2003092603A (en) * 2001-09-17 2003-03-28 Toshiba Corp Network intrusion detecting system, apparatus and program
US20030083847A1 (en) * 2001-10-31 2003-05-01 Schertz Richard L. User interface for presenting data for an intrusion protection system
US20040167893A1 (en) * 2003-02-18 2004-08-26 Nec Corporation Detection of abnormal behavior using probabilistic distribution estimation
JP2004312083A (en) * 2003-04-02 2004-11-04 Kddi Corp Learning data generating apparatus, intrusion detection system, and its program
CN1555156A (en) * 2003-12-25 2004-12-15 上海交通大学 Self adaptive invasion detecting method based on self tissue mapping network
CN1588889A (en) * 2004-09-24 2005-03-02 清华大学 Abnormal detection method for user access activity in attached net storage device

Non-Patent Citations (3)

Title
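An improved IDS anomaly detection model (一种改进的IDS异常检测模型), Sun Hongwei et al., Chinese Journal of Computers, Vol. 26, No. 11, 2003 *
Experiments and analysis of machine-learning-based intrusion detection methods (基于机器学习的入侵检测方法实验与分析), Sun Hongwei et al., Computer Engineering and Design, Vol. 25, No. 5, 2004 *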

Similar Documents

Publication Publication Date Title
CN1333552C (en) Detecting system and method for user behaviour abnormal based on machine study
EP3803660B1 (en) Knowledge graph for real time industrial control system security event monitoring and management
Mirheidari et al. Alert correlation algorithms: A survey and taxonomy
Gao et al. Hmms (hidden markov models) based on anomaly intrusion detection method
Cuppens et al. Lambda: A language to model a database for detection of attacks
Ko Execution Monitoring of security-critical programs in a distributed system: a specification-based approach
US7530105B2 (en) Tactical and strategic attack detection and prediction
US5557742A (en) Method and system for detecting intrusion into and misuse of a data processing system
CN110321371A (en) Daily record data method for detecting abnormality, device, terminal and medium
CN111753303B (en) Multi-granularity code vulnerability detection method based on deep learning and reinforcement learning
CN1333553C (en) Program grade invasion detecting system and method based on sequency mode evacuation
Ficco et al. A generic intrusion detection and diagnoser system based on complex event processing
CN109101815A (en) A kind of malware detection method and relevant device
Botha et al. The utilization of artificial intelligence in a hybrid intrusion detection system
Alserhani Alert correlation and aggregation techniques for reduction of security alerts and detection of multistage attack
CN110222243A (en) Determine the method, apparatus and storage medium of abnormal behaviour
Lanoe et al. A scalable and efficient correlation engine to detect multi-step attacks in distributed systems
CN118381627A (en) LLM driven industrial network intrusion detection method and response system
Liu et al. HMMs based masquerade detection for network security on with parallel computing
Doroudian et al. A hybrid approach for database intrusion detection at transaction and inter-transaction levels
CN117118857A (en) Knowledge graph-based network security threat management system and method
Nalavade et al. Finding frequent itemsets using apriori algorithm to detect intrusions in large dataset
CN105844176B (en) Security strategy generation method and equipment
Doroudian et al. Database intrusion detection system for detecting malicious behaviors in transaction and inter-transaction levels
CN117978476B (en) Attack chain generation method and device based on ATT&CK knowledge graph

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: BEIJING CAPITEK CO, LTD.

Free format text: FORMER NAME: BEIJING SHOUXIN SCIENCE AND TECHNOLOGY CO., LTD.

CP03 Change of name, title or address

Address after: 100015 Beijing City, Chaoyang District Road No. 5

Patentee after: Beijing Capitek Co, Ltd.

Address before: 100016 Beijing city Chaoyang District Dongzhimen Road No. 5

Patentee before: Beijing Shouxin Science and Technology Co., Ltd.