CN1649311A - Detecting system and method for user behaviour abnormal based on machine study - Google Patents

Info

Publication number
CN1649311A
Authority
CN
China
Prior art keywords
shell
sequence
seq
command
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200510056934
Other languages
Chinese (zh)
Other versions
CN1333552C (en)
Inventor
田新广
隋进国
李学春
王辉柏
邹涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Capitek Co, Ltd.
Original Assignee
BEIJING SHOUXIN SCIENCE AND TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING SHOUXIN SCIENCE AND TECHNOLOGY Co Ltd filed Critical BEIJING SHOUXIN SCIENCE AND TECHNOLOGY Co Ltd
Priority to CNB2005100569348A priority Critical patent/CN1333552C/en
Publication of CN1649311A publication Critical patent/CN1649311A/en
Application granted granted Critical
Publication of CN1333552C publication Critical patent/CN1333552C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

A machine-learning-based system and method for detecting abnormal user behavior. The system consists of a control module, a data acquisition and preprocessing module, a learning module, a sequence storage module, a detection module, and a detection result output module. It is installed on the server to be monitored and uses shell commands on the Unix platform as audit data. After the data are preprocessed, the learning module builds normal behavior profiles of key legitimate users of the network system; during detection, a user's current behavior is compared with that user's normal behavior profile to identify abnormal behavior, that is, to determine whether an intrusion has occurred. If the current behavior deviates from the normal profile, an anomaly is deemed to have occurred: the key legitimate user may have performed an unauthorized operation, or an outsider may be illegally using the legitimate user's account.

Description

User behavior abnormality detection system and method based on machine learning
Technical field
The present invention relates to a machine-learning-based user behavior anomaly detection system and method for computer network security, and belongs to the field of network information security technology.
Background technology
In recent years, as computer networks have been applied ever more widely, attacks on and sabotage of networks have increased steadily, and the harm they cause has grown; worldwide, the economic losses caused each year by breaches of computer network security systems reach tens of billions of dollars. Network security has become key to the development of the national information industry and is also an important component of national and defense security. Detecting and guarding against attacks, and ensuring the security of computer systems, network systems, and the entire information infrastructure, has become an urgent and important task. Intrusion detection is a network information security technology for detecting intrusive behavior in computer network systems. By monitoring the state, behavior, and usage of a computer network system, it detects privilege abuse and misuse by system users, as well as attacks in which intruders from outside the system exploit its security flaws. An intrusion detection system (IDS) extends the system administrator's security management capabilities, including security auditing, monitoring, attack recognition, and response; it is regarded as the second line of defense behind the firewall and occupies an important position in a network information security architecture.
According to the source of the audit data and the object being monitored, intrusion detection systems can be divided into host-based, network-based, and hybrid systems. A host-based intrusion detection system draws its audit data mainly from the operating system's audit records, system logs, and application logs, and usually protects a single server. A network-based intrusion detection system takes raw network packets as its information source and is usually tasked with protecting a network segment. A hybrid intrusion detection system can analyze audit data from servers and packets on the network at the same time; it consists of multiple components and generally adopts a distributed architecture.
Host-based and network-based intrusion detection systems each have advantages in different detection scopes, and the two are complementary. The main advantages of a host-based intrusion detection system are: (1) it is insensitive to network traffic, so an increase in traffic generally does not impair its monitoring of system behavior; (2) its detection is highly targeted and fine-grained, and it can easily monitor particular system activities, such as activity involving sensitive files, directories, programs, or ports; (3) it is flexible to configure and needs no extra hardware, can be customized to the actual conditions of the protected system, and can use functions of the operating system itself together with anomaly analysis to detect attacks more accurately; (4) it guards well against attacks that exploit operating system vulnerabilities or application software defects; (5) it can be used in network environments that use encryption or switching. A network-based intrusion detection system is generally deployed in an important network segment. Its main advantages are: (1) it is suited to detecting network-protocol-based attacks; (2) it is independent of the server's operating system, so it is widely applicable and scales well; (3) it generally obtains data by network monitoring, so it has very little impact on the performance of the protected network and requires no change to the network configuration; moreover, the monitoring device is transparent to users on the network, which reduces the likelihood that the detection system itself will be attacked by an intruder.
At present, intrusion detection techniques fall into three main classes: misuse detection, anomaly detection, and hybrid detection. Misuse detection detects intrusions by analyzing and representing intrusive (attack) behavior (in the present invention, "intrusion" and "attack" are used as synonyms). This approach generally expresses intrusive behavior as a pattern or signature and builds an intrusion pattern (signature) library from known intrusions and system flaws; during detection, the actual behavior patterns of the monitored system or user are matched against the intrusion patterns, and the presence of an intrusion is judged from the matching result. Misuse detection is very effective against known intrusions; its drawbacks are that the pattern library needs constant updating and that unknown intrusions are difficult to detect. Anomaly detection analyzes and represents the normal behavior (profile) of a system or user; when the actual behavior of the monitored system or user differs from its normal behavior to a certain degree, an intrusion is deemed to exist. The advantage of anomaly detection is that it requires little knowledge of system flaws and is highly adaptable, so it can detect unknown intrusions or newly emerging intrusion patterns. Hybrid detection combines misuse detection with anomaly detection and usually achieves better detection performance.
As research on computer network weaknesses and attacks has deepened, misuse detection has been applied more and more widely; at present, most commercial network-based intrusion detection systems adopt this technique. The key issues in misuse detection are how to represent and update intrusive behavior, and how to improve the speed and efficiency of packet capture and pattern matching. Because new attack types and network vulnerabilities keep appearing, the intrusion pattern (signature) library in a practical misuse detection system often cannot be supplemented and updated in time, which is the main cause of missed detections. Anomaly detection is used more in host-based intrusion detection systems; in network-based systems it usually serves as a supplement to misuse detection (for example, for anomaly analysis of network traffic). The core problems of anomaly detection are how to represent the normal behavior (profile) of a system or user, and how to compare the actual behavior of the system or user with that normal behavior. To represent normal behavior comprehensively and accurately (and so reduce the false alarm probability), the detection model usually has to be trained with a large amount of fairly complete training data. Nevertheless, compared with misuse detection, anomaly detection has advantages in several respects, in particular the ability to detect unknown attacks. As an intrusion detection technique with good prospects, anomaly detection is being studied and applied more and more.
The user behavior anomaly detection system of the present invention is a host-based intrusion detection system that adopts an anomaly detection technique based on machine learning. Machine learning refers to using a machine (computer) to acquire knowledge and solve problems, and is an interdisciplinary field. Applied research in machine learning mainly develops various learning models and learning methods and, on that basis, builds task-oriented learning systems for specific applications.
Figure 1 shows the general model of a machine learning system. A machine learning system consists mainly of a learning unit, a knowledge base, and an execution unit. The learning unit is the core of the system: it uses information supplied by external sources to acquire knowledge and to improve it (for example, by reorganizing existing knowledge). The learning unit has two kinds of input: information from the external environment, and feedback obtained after tasks are executed. Different learning systems represent experience examples differently. The simplest representation is binary features, which describe only whether an object has certain attributes; connectionist learning and genetic learning methods generally take binary features as input. Another is the attribute-value representation, in which each attribute has a set of mutually exclusive values (a color attribute, for example, may take the values red, blue, yellow, and so on); it is typically used in inductive learning methods. A more complex one is the relational or structural representation, which describes relations between two or more objects; such relational or structural information is generally expressed in forms such as predicate logic or semantic networks. Compared with the first two representations, it is more expressive, but it also makes the matching process in learning considerably more complex. The knowledge base stores knowledge, including domain knowledge (which is generally relatively stable) and the new knowledge acquired through learning (which in some cases changes over time). The choice of which knowledge to store is crucial to the design of a learning system: some systems store only concrete individual experience examples, whereas others store abstract generalizations derived from those examples. The latter again take two forms: knowledge represented in logical, discrete form, or knowledge represented in numerical, continuous form. Inductive learning and analytic learning often use logical, discrete representations, whereas connectionist learning mainly uses numerical, continuous representations. The execution unit uses the knowledge in the knowledge base to carry out tasks, and information obtained after task execution is fed back to the learning unit as further input for learning; the execution unit is the key component that gives a learning system practical use and also makes it possible to evaluate the quality of a learning method.
Machine learning techniques can be classified from different angles. By the general nature of the learning, they can be divided into inductive learning, analytic learning, instance-based learning, connectionist learning, and so on. Inductive learning obtains a general description of a concept by induction from a given series of known positive and negative examples of that concept; decision tree learning is a widely used kind of inductive learning, typical algorithms being CLS and ID3. Analytic learning uses prior knowledge and deduction to extend the information supplied by the training examples; in analytic learning, the input to the learning system includes, besides the training examples and the hypothesis space, a domain theory composed of prior knowledge that can be used to explain the training examples. Instance-based learning stores the training examples and defers generalization until a new instance has to be analyzed. In general, when an instance-based learning system encounters a new instance, it analyzes the relation between the new instance and the stored examples and assigns a target function value to the new instance accordingly. The advantage of this technique is that it does not estimate the target function once over the entire instance space but makes a local estimate for each new instance to be analyzed. One shortcoming is that analyzing a new instance may require a large amount of computation; it is therefore important to reduce the number of training examples appropriately and to index them effectively, so as to lessen the computation needed when a new instance is analyzed. The anomaly detection system provided by the present invention adopts instance-based learning.
A practical computer network system generally has multiple legitimate users. These users usually have different operating privileges (for example, a programmer's main activity on the system is programming, and the programmer is not allowed to perform certain operations reserved for the system administrator), and different legitimate users have different behavioral characteristics and regularities. For the security of a computer network system, it is in many cases necessary to monitor the behavior of certain key legitimate users of the system, to prevent these key users from performing unauthorized operations and to prevent outside intruders (illegitimate users) from performing illegal operations under the key users' accounts.
Summary of the invention
In view of this, the object of the present invention is to provide a user behavior anomaly detection system and method based on machine learning. The system uses a machine learning model to build the normal behavior profile of one key legitimate user (or a group of such users) in a computer network system and, during detection, identifies abnormal behavior by comparing a key legitimate user's current behavior with that user's normal behavior profile. If the user's current behavior deviates substantially from the historical normal behavior profile, an anomaly is deemed to have occurred: the key legitimate user may have performed an unauthorized operation, or an outside intruder may have performed illegal operations under the key legitimate user's account. An anomaly does not necessarily mean an attack, but it should at least draw the security administrator's close attention.
To achieve the above object, the invention provides a user behavior anomaly detection system based on machine learning. The system is deployed on the server to be monitored and uses the user-interface shell commands on the Unix platform as audit data, analyzing user behavior at the user-interface layer to detect whether the server has been intruded. In this technical solution, the system comprises:
A control module, which sets the operating state and the various detection parameters of the system and controls the operation of the data acquisition and preprocessing module, the learning module, the detection module, and the system as a whole;
A data acquisition and preprocessing module, which obtains the original training data or audit data (the shell command lines executed by the user) from the server, converts them into the form of a shell-command stream, and sends the result to the learning module or the detection module for learning or detection, respectively;
A learning module, which uses machine learning techniques to obtain knowledge of the normal behavior of a given legitimate user of the network system from the training data and, on that basis, builds the shell-command sequence libraries that represent the user's normal behavior profile;
A sequence storage module, which stores the shell-command sequence libraries built by the learning module so that the detection module can search and compare against them during detection;
A detection module, which analyzes the shell-command audit data generated by the legitimate user during the monitored period; its work includes, but is not limited to, mining "behavior pattern sequences", computing or assigning similarities, windowed noise filtering of the similarities, computing decision values, and judging the user's behavior;
A detection result output module, which displays the decision value curve generated by the detection module and raises alarms on abnormal behavior under the control of the detection module.
To achieve the above object, the invention further provides a detection method for this user behavior anomaly detection system, comprising the following steps:
(1) The system starts up;
(2) While the system waits for instructions, the operating state and detection parameters are set through the control module. After a "start working" instruction is entered, the control module automatically checks the system settings and the system enters one of two operating states: if the system is set to the learning state, the subsequent steps are executed in order; if it is set to the detection state, execution jumps to step (6);
(3) Driven by the control module, the data acquisition and preprocessing module loads the original training data from a predefined data interface, preprocesses them into the form of a shell-command stream, and outputs the stream to the learning module;
(4) The learning module learns from the preprocessed shell-command stream training data, builds the shell-command sequence libraries, stores them in the sequence storage module, and then sends a "learning finished" message to the control module;
(5) On receiving the "learning finished" message, the control module either returns the system to step (2) to wait for new settings and instructions, or switches the system directly to the detection state and continues with the subsequent steps;
(6) Driven by the control module, the data acquisition and preprocessing module loads the original shell command line audit data in real time from the predefined data interface, preprocesses them in real time, and outputs the preprocessed shell-command stream audit data to the detection module in real time;
(7) The detection module analyzes the audit data in real time and generates the detection result;
(8) The detection result output module displays the detection result (the decision value curve) and raises real-time alarms on abnormal behavior.
In step (2), if the system is set to the learning state, the operating parameters to be set are:
The number W of shell-command sequence libraries to be built to represent the legitimate user's normal behavior profile, and the W sequence lengths l(1), l(2), ..., l(W), where l(i) is the length of the sequences in the i-th shell-command sequence library and l(1) < l(2) < ... < l(W);
If the system is set to the detection state, the operating parameters depend on which of two decision methods is used:
For the first decision method: the window length w and the decision threshold λ; or
For the second decision method: the number V of window lengths, the V window lengths w(1), w(2), ..., w(V), the V decision upper limits u(1), u(2), ..., u(V), and the V decision lower limits d(1), d(2), ..., d(V), where w(1) < w(2) < ... < w(V), u(k) and d(k) are the decision upper limit and decision lower limit corresponding to the k-th window length w(k), k is a natural number in the interval [1, V], and u(1) > u(2) > ... > u(V-1) > u(V) = d(V) > d(V-1) > ... > d(2) > d(1).
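The ordering constraints on these parameters can be checked before the system starts working. The following is a minimal Python sketch of such a check; the function names `validate_learning_params` and `validate_variable_window_params` are hypothetical and do not come from the patent, and the requirement that sequence and window lengths be at least 1 is an added assumption.

```python
from typing import Sequence


def strictly_increasing(values: Sequence[float]) -> bool:
    """True if every element is smaller than the one after it."""
    return all(a < b for a, b in zip(values, values[1:]))


def validate_learning_params(lengths: Sequence[int]) -> None:
    """Check l(1) < l(2) < ... < l(W); lengths >= 1 is an added assumption."""
    if not lengths or min(lengths) < 1:
        raise ValueError("at least one sequence length >= 1 is required")
    if not strictly_increasing(lengths):
        raise ValueError("sequence lengths must satisfy l(1) < ... < l(W)")


def validate_variable_window_params(
    windows: Sequence[int], upper: Sequence[float], lower: Sequence[float]
) -> None:
    """Check w(1) < ... < w(V) and u(1) > ... > u(V) = d(V) > ... > d(1)."""
    v = len(windows)
    if v == 0 or len(upper) != v or len(lower) != v:
        raise ValueError("need V window lengths, V upper and V lower limits")
    if min(windows) < 1 or not strictly_increasing(windows):
        raise ValueError("window lengths must satisfy w(1) < ... < w(V)")
    if not strictly_increasing(list(reversed(upper))):
        raise ValueError("upper limits must satisfy u(1) > ... > u(V)")
    if not strictly_increasing(lower):
        raise ValueError("lower limits must satisfy d(1) < ... < d(V)")
    if upper[-1] != lower[-1]:
        raise ValueError("u(V) must equal d(V)")
```

Because u(V) = d(V), the longest window always yields a decision, which is why the check on the last pair is worth enforcing.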
The preprocessing that the data acquisition and preprocessing module applies to the original training data or original audit data in step (3) or step (6) comprises the following steps:
31. Extract the command names, flags, and metacharacters from each shell command line;
32. Replace information including, but not limited to, file names, server names, directories, and network addresses with an identifier of the uniform form <n>, where n is the number of file names, server names, directories, or network addresses;
33. Insert the identifiers SOF and EOF at the points in time where each shell session begins and ends, respectively;
34. Arrange the shell-command symbols (the command names, flags, and metacharacters, together with the identifiers that replace file names, server names, directories, and network addresses) in their order of appearance within the shell session, and concatenate the command symbols of different shell sessions in chronological order, without adding timestamps. After this preprocessing, the original input data take the form of a shell-command stream: a string of shell-command symbols arranged in chronological order, which may span multiple shell sessions.
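The preprocessing steps can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the patented implementation: tokens are split on whitespace only, the metacharacter set `METACHARS` is an assumed subset, a token starting with `-` is treated as a flag, and every other non-command token is treated as a file name, directory, host, or address and counted into `<n>`. The function names are hypothetical.

```python
from typing import Iterable, List

# Assumed metacharacter subset; a real shell grammar has more.
METACHARS = {"|", ";", "&", "&&", "||", "<", ">", ">>"}
# Metacharacters after which the next token is taken to be a command name.
COMMAND_STARTERS = {"|", ";", "&", "&&", "||"}


def preprocess_session(lines: Iterable[str]) -> List[str]:
    """Turn the command lines of one shell session into shell-command symbols."""
    symbols = ["SOF"]  # step 33: mark the start of the session
    for line in lines:
        expect_command = True  # first token of a line is a command name
        pending_args = 0       # consecutive arguments not yet emitted
        for tok in line.split():
            is_arg = (tok not in METACHARS and not expect_command
                      and not tok.startswith("-"))
            if is_arg:
                pending_args += 1  # step 32: count, do not keep the value
                continue
            if pending_args:
                symbols.append(f"<{pending_args}>")
                pending_args = 0
            symbols.append(tok)    # step 31: name, flag, or metacharacter
            expect_command = tok in COMMAND_STARTERS
        if pending_args:
            symbols.append(f"<{pending_args}>")
    symbols.append("EOF")  # step 33: mark the end of the session
    return symbols


def preprocess_stream(sessions: Iterable[Iterable[str]]) -> List[str]:
    """Step 34: concatenate sessions (assumed already in time order)."""
    stream: List[str] = []
    for session in sessions:
        stream.extend(preprocess_session(session))
    return stream
```

For example, the session `["cd /tmp", "ls -l a.txt b.txt | grep foo > out.txt"]` becomes `SOF cd <1> ls -l <2> | grep <1> > <1> EOF` under these assumptions.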
The learning operation that the learning module performs on the preprocessed shell-command stream training data in step (4) comprises the following steps:
41. Obtain from the data acquisition and preprocessing module the preprocessed training data representing the legitimate user's normal behavior: $R = (s_1, s_2, \ldots, s_r)$, a shell-command stream of length $r$, where $s_j$ is the $j$-th shell-command symbol in chronological order;
42. Read the learning parameters from the control module: the number $W$ of shell-command sequence libraries to be built to represent the legitimate user's normal behavior profile, and the $W$ sequence lengths $l(1), l(2), \ldots, l(W)$, where $l(i)$ is the length of the sequences in the $i$-th shell-command sequence library and $l(1) < l(2) < \ldots < l(W)$;
43. From the shell-command stream $R$, generate $W$ shell-command sequence streams $S_1, S_2, \ldots, S_W$ with sequence lengths $l(1), l(2), \ldots, l(W)$, where $S_i$ is the sequence stream with sequence length $l(i)$: $S_i = (Seq_1^i, Seq_2^i, \ldots, Seq_{r-l(i)+1}^i)$. Here $Seq_j^i = (s_j, s_{j+1}, \ldots, s_{j+l(i)-1})$ is the $j$-th shell-command sequence of $S_i$ in chronological order, and $i$ is a natural number in the interval $[1, W]$;
44. Compute the frequency of occurrence in $S_i$ of each shell-command sequence of $S_i$, for each natural number $i$ in $[1, W]$; that is, divide the number of occurrences of each sequence in $S_i$ by the total number of occurrences of all sequences in that stream;
45. Read the frequency threshold parameters from the control module: the frequency thresholds $\eta_1, \eta_2, \ldots, \eta_W$ used to build the $W$ shell-command sequence libraries, where $\eta_i$ is the threshold used when building the library with sequence length $l(i)$ and $i$ is a natural number in $[1, W]$;
46. According to the frequencies of occurrence, extract a number of shell-command sequences from each of the $W$ sequence streams $S_1, S_2, \ldots, S_W$ as samples and build $W$ sequence libraries, as follows:
Let the set of $W$ sequence libraries representing the user's normal behavior profile be $L = \{L(1), L(2), \ldots, L(W)\}$, where $L(i)$ is the sequence library composed of sequences of length $l(i)$;
For $i$ from 1 to $W$ in order, extract as samples the shell-command sequences whose frequency of occurrence in $S_i$ is greater than or equal to the threshold $\eta_i$; these are regarded as the shell-command sequences of the legitimate user's normal behavior patterns, and together they constitute the sequence library $L(i)$.
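Steps 43 to 46 amount to sliding a window of each length over the training stream, counting how often each window content occurs, and keeping the sequences whose relative frequency reaches the threshold. A minimal Python sketch follows; the function names and the choice of a dictionary keyed by sequence length to hold the libraries are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Seq = Tuple[str, ...]


def sequence_stream(stream: Sequence[str], length: int) -> List[Seq]:
    """Step 43: all r - l + 1 overlapping sequences of the given length."""
    return [tuple(stream[j:j + length])
            for j in range(len(stream) - length + 1)]


def build_libraries(
    stream: Sequence[str],
    lengths: Sequence[int],
    thresholds: Sequence[float],
) -> Dict[int, Set[Seq]]:
    """Steps 44-46: keep sequences whose frequency is >= the threshold."""
    libraries: Dict[int, Set[Seq]] = {}
    for length, eta in zip(lengths, thresholds):
        seqs = sequence_stream(stream, length)
        counts = Counter(seqs)
        total = len(seqs)  # sum of the occurrence counts of all sequences
        libraries[length] = {
            seq for seq, count in counts.items() if count / total >= eta
        }
    return libraries
```

With the training stream `["ls", "cd", "ls", "cd", "vi"]`, lengths `[1, 2]`, and thresholds `[0.3, 0.5]`, the length-1 library contains `("ls",)` and `("cd",)` (frequency 0.4 each), and the length-2 library contains only `("ls", "cd")` (frequency 0.5).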
The real-time detection operation in step (7), in which the detection module analyzes the audit data in real time and generates the detection result, comprises the following steps:
71. Obtain the audit data in real time from the data acquisition and preprocessing module. During detection, that module obtains in real time, from the shell history file, the shell command lines executed by the monitored legitimate user during the monitored period, and preprocesses them into a shell-command stream $R = (s_1, s_2, \ldots, s_r)$, where $s_j$ is the $j$-th shell-command symbol in chronological order and $r$ is the length of the stream. The module outputs the symbols of $R$ to the detection module one by one, in real time and in chronological order;
72. The detection module uses a sequence matching method to mine the "behavior pattern sequences" in the shell-command stream $R$ and, according to the length of each behavior pattern sequence, computes its similarity to the set of sequence libraries $L = \{L(1), L(2), \ldots, L(W)\}$. This yields the chronologically ordered behavior pattern sequence stream $P = (Seq_1^*, Seq_2^*, \ldots, Seq_M^*)$ and the corresponding similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$, where $Seq_n^*$ is the $n$-th behavior pattern sequence mined from $R$, $Sim(Seq_n^*, L)$ is the similarity between $Seq_n^*$ and the set of sequence libraries, $M$ is the number of behavior pattern sequences mined from $R$, and $\mathrm{int}(r / l(W)) \le M \le r - l(W) + 1$;
73. Apply a window to the similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$ and take the mean; compare the resulting mean similarity, which is the similarity decision value, with the decision threshold, and on that basis judge the behavior of the monitored legitimate user;
During real-time detection, steps 71, 72, and 73 are carried out synchronously.
In step 72, mining the behavior pattern sequences in the user's shell-command stream audit data $R$ by sequence matching, and computing the similarity between each behavior pattern sequence and the set of sequence libraries, further comprises the following steps:
721. Set three variables: $j := 1$, $i := W$, $n := 1$;
722. If $j \le r - l(W) + 1$, compare $Seq_j^i$ with the sequence library $L(i)$ and then execute step 723; if $j > r - l(W) + 1$, stop, which completes the mining of the behavior pattern sequences and the similarity computation;
723. If $Seq_j^i \in L(i)$, that is, $Seq_j^i$ is identical to some sequence in the sequence library $L(i)$, then the $n$-th behavior pattern sequence is $Seq_n^* := Seq_j^i$ and its similarity to the set of sequence libraries $L$ is $Sim(Seq_n^*, L) := 2^{l(i)} / 2^{l(W)}$; set $j := j + l(i)$, $i := W$, $n := n + 1$, and return to step 722. If $Seq_j^i \notin L(i)$, that is, $Seq_j^i$ differs from every sequence in the sequence library $L(i)$, set $i := i - 1$ and execute step 724;
724. If $i \ne 0$, return to step 722; if $i = 0$, set $Seq_n^* := (s_j)$, $Sim(Seq_n^*, L) := 0$, $j := j + 1$, $i := W$, $n := n + 1$, and return to step 722.
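Steps 721 to 724 describe a greedy longest-match scan: at each position the longest library is tried first, then shorter ones, and a single symbol with similarity 0 is emitted when nothing matches. The Python sketch below follows that control flow with 0-based indices. It reads the similarity in step 723 as $2^{l(i)} / 2^{l(W)}$, which is an interpretation of the formula as extracted; the function name and the dictionary layout of the libraries (keyed by sequence length) are assumptions.

```python
from typing import Dict, List, Sequence, Set, Tuple

Seq = Tuple[str, ...]


def mine_patterns(
    stream: Sequence[str],
    libraries: Dict[int, Set[Seq]],
    lengths: Sequence[int],
) -> Tuple[List[Seq], List[float]]:
    """Return the behavior pattern stream P and the similarity stream Z."""
    longest = lengths[-1]  # l(W); lengths are assumed sorted ascending
    patterns: List[Seq] = []
    sims: List[float] = []
    j = 0  # 0-based position; step 722 stops once j > r - l(W)
    while j <= len(stream) - longest:
        for length in reversed(lengths):  # try l(W) first, then shorter
            candidate = tuple(stream[j:j + length])
            if candidate in libraries[length]:  # step 723: match found
                patterns.append(candidate)
                sims.append(2.0 ** length / 2.0 ** longest)
                j += length
                break
        else:  # step 724: no library matched at this position
            patterns.append((stream[j],))
            sims.append(0.0)
            j += 1
    return patterns, sims
```

Note that, as in the steps above, the scan stops once fewer than $l(W)$ symbols remain, so the last few symbols of the stream are left unprocessed until more data arrive.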
In step 73, two decision methods are available for the user to choose from when the detection module windows the similarity stream and judges the user's behavior;
The first decision method windows the similarity stream with a fixed window length and takes the mean; the resulting mean similarity is the similarity decision value, which is compared with the decision threshold to judge the user's behavior. It comprises the following steps:
7301. Read the parameters set in the control module: the window length $w$ and the decision threshold $\lambda$. Once the $n$-th behavior pattern sequence $Seq_n^*$ has been mined from the audit data $R$ and $Sim(Seq_n^*, L)$ has been computed, where $n \ge w$, window the similarity stream $Z$ with $Sim(Seq_n^*, L)$ as the end point: compute the mean of the $w$ similarities ending at $Sim(Seq_n^*, L)$, namely $Sim(Seq_{n-w+1}^*, L), Sim(Seq_{n-w+2}^*, L), \ldots, Sim(Seq_n^*, L)$, to obtain the similarity decision value $D(n)$ corresponding to $Seq_n^*$: $D(n) = \frac{1}{w} \sum_{m=n-w+1}^{n} Sim(Seq_m^*, L)$;
7302. Use the decision value $D(n)$ and the decision threshold $\lambda$ to judge the user's "current behavior": if $D(n) > \lambda$, the current behavior is judged normal; if $D(n) \le \lambda$, it is judged abnormal;
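The first decision method is a moving average over the similarity stream followed by a threshold test. A minimal Python sketch, with a hypothetical function name and a tuple-based result format chosen for illustration:

```python
from typing import List, Sequence, Tuple


def fixed_window_decisions(
    sims: Sequence[float], w: int, lam: float
) -> List[Tuple[int, float, str]]:
    """For every n >= w (1-based), return (n, D(n), verdict)."""
    results: List[Tuple[int, float, str]] = []
    for n in range(w, len(sims) + 1):
        # D(n): mean of the w similarities ending at position n (1-based)
        d_n = sum(sims[n - w:n]) / w
        verdict = "normal" if d_n > lam else "abnormal"
        results.append((n, d_n, verdict))
    return results
```

For the similarity stream `[1.0, 0.0, 0.5, 1.0]` with `w = 2` and `lam = 0.4`, the decision values are 0.5, 0.25, and 0.75 for n = 2, 3, 4, giving the verdicts normal, abnormal, normal. No decision is produced for n < w, since the window is not yet full.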
The second decision method windows the similarity stream with variable window lengths and takes the mean; the resulting mean similarity is the similarity decision value, which is compared with the decision thresholds to judge the monitored user's behavior. It comprises the following steps:
7311. Read the parameters set in the control module: the $V$ window lengths $w(1), w(2), \ldots, w(V)$, the $V$ decision upper limits $u(1), u(2), \ldots, u(V)$, and the $V$ decision lower limits $d(1), d(2), \ldots, d(V)$, where $w(1) < w(2) < \ldots < w(V)$, $u(k)$ and $d(k)$ are the decision upper limit and decision lower limit corresponding to the $k$-th window length $w(k)$, $k$ is a natural number in the interval $[1, V]$, and $u(1) > u(2) > \ldots > u(V-1) > u(V) = d(V) > d(V-1) > \ldots > d(2) > d(1)$;
7312. Once the $n$-th "behavior pattern sequence" $Seq_n^*$ has been mined from $R$ and $Sim(Seq_n^*, L)$ has been computed, go on to compute the similarity decision value $D(n)$ corresponding to $Seq_n^*$ and judge the user's "current behavior".
The calculation and decision method carried out in step 7312 further comprises the following operating steps:
Step 1: set the variable $k := 1$;
Step 2: compare $n$ with $w(k)$: if $n \ge w(k)$, carry out the subsequent steps; if $n < w(k)$, do not calculate $D(n)$ and do not judge this user's "current behavior"; end this operation;
Step 3: calculate the similarity mean $N(n, k)$, which is obtained by windowing and averaging, within the similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$, the $w(k)$ similarities ending at $Sim(Seq_n^*, L)$, namely $Sim(Seq_{n-w(k)+1}^*, L), Sim(Seq_{n-w(k)+2}^*, L), \ldots, Sim(Seq_n^*, L)$: $N(n, k) = \frac{1}{w(k)} \sum_{m=n-w(k)+1}^{n} Sim(Seq_m^*, L)$, where $w(k)$ is the window length and $w(k) \le n \le M$;
Step 4: check whether the decision condition $N(n, k) > u(k)$ is satisfied. If it is, the similarity decision value corresponding to $Seq_n^*$ is defined as $D(n) := N(n, k)$ and this user's "current behavior" is judged to be normal behavior; if it is not, continue with the subsequent operation;
Step 5: check whether the decision condition $N(n, k) \le d(k)$ is satisfied. If it is, set $D(n) := N(n, k)$, judge this user's "current behavior" to be abnormal behavior, and end the judgment of the user's "current behavior"; if it is not, continue with the subsequent operation;
Step 6: set $k := k + 1$, that is, increase $k$ by 1, return to step 2, and repeat the subsequent operations.
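The variable-window decision procedure of steps 1 to 6 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the invention: the function name `variable_window_decision`, the representation of the similarity stream as a list, and the example parameter values are assumptions introduced here. Because $u(V) = d(V)$, a decision is always reached at the last window, provided that $n \ge w(V)$.

```python
def variable_window_decision(sims, n, windows, upper, lower):
    """Judge the behavior ending at the n-th pattern sequence (n is 1-based).

    sims    : similarity stream Z as a list, sims[m-1] = Sim(Seq_m*, L)
    windows : window lengths w(1) < ... < w(V)
    upper   : decision upper limits u(1) > ... > u(V)
    lower   : decision lower limits d(1) < ... < d(V), with d(V) == u(V)
    Returns (D(n), verdict), or None when no judgment can be made yet.
    """
    for w, u, d in zip(windows, upper, lower):
        if n < w:                       # step 2: not enough similarities yet
            return None
        mean = sum(sims[n - w:n]) / w   # step 3: N(n, k)
        if mean > u:                    # step 4
            return mean, "normal"
        if mean <= d:                   # step 5
            return mean, "abnormal"
        # step 6: otherwise try the next, longer window
    return None  # unreachable when u(V) == d(V)


# Hypothetical parameters, chosen only for illustration.
Z = [1.0, 0.25, 0.0, 0.5, 0.0]
WINDOWS, UPPER, LOWER = (2, 4), (0.8, 0.5), (0.2, 0.5)
results = [variable_window_decision(Z, n, WINDOWS, UPPER, LOWER) for n in range(1, 5)]
```

With these values, no judgment is made for n = 1 or n = 2 (the short window is inconclusive and the long window is not yet full), n = 3 is judged abnormal by the short window, and n = 4 is judged by the long window.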
The detection method can be used to perform anomaly detection on the behavior of a single legitimate user in a computer network system, or on the behavior of a group of several legitimate users in the network system. For the latter, two different approaches can be adopted:
If the privileges and behavioral characteristics of the legitimate users in the group differ considerably, W sequence libraries are built separately from the normal-behavior training data of each legitimate user, and each user's behavior is then checked against that user's own W sequence libraries;
If the legitimate users in the group have the same privileges and similar behavioral characteristics, their training data are merged, that is, their shell command streams are concatenated to form the overall training data; W sequence libraries are built from this training data and then used to perform anomaly detection on each user's behavior.
The present invention is a machine-learning-based system and method for detecting abnormal user behavior. Its advantages are:
(1) The system of the present invention is highly practical and easy to operate. It is implemented in software and can be flexibly deployed on any network server that needs monitoring, without any additional hardware, to detect abnormal user behavior on that server and thereby help the security administrator identify various attacks on the network system. Compared with some existing commercial host-based intrusion detection systems, the system represents the various normal behavior patterns of a legitimate user with shell command sequences of several different lengths and builds multiple sequence libraries to describe the user's normal behavior profile, which improves the flexibility and accuracy with which user behavior patterns and behavior profiles are represented. Detection results in practical application show that the system has very high detection accuracy.
(2) The detection method of the present invention is based on machine learning. In many host-based intrusion detection systems, the training examples used in the learning (training) phase usually include both positive and negative examples. The system of the present invention needs only positive examples for learning and no negative examples, which greatly reduces the difficulty of collecting training data and widens the range of applications of the system. In addition, the present invention adopts a distinctive similarity calculation (assignment) method. The normal behavior patterns of a legitimate user are taken to be the shell command sequences that the user executes frequently during normal operation; the system therefore extracts sample sequences according to the frequency of occurrence of shell command sequences in the training data. Detection results from experimental application show that this way of extracting sample sequences is robust.
(3) In the second decision method of the present invention, a variable window length is introduced when the similarity stream is windowed for noise filtering, and multiple decision thresholds are used jointly to judge the monitored user's behavior, which improves both the stability of detection performance and the timeliness of detection.
(4) When mining "behavior pattern sequences", the detection method of the present invention matches by "complete sequence comparison". Consequently, for sequence storage and matching, each distinct shell command sequence can be replaced by a distinct number (integer). Compared with some existing host-based intrusion detection systems and detection methods, this greatly reduces the amount of stored data and the computation required during detection, and thus reduces the resource consumption on, and the impact on, the server where the system is installed.
Description of drawings
Fig. 1 is the universal model figure of machine learning system.
Fig. 2 is the structural representation that the present invention is based on the user behavior abnormality detection system of machine learning.
Fig. 3 is the workflow block diagram of user behavior abnormality detection system of the present invention.
Fig. 4 is the detection method schematic diagram that the present invention is based on the user behavior abnormality detection system of machine learning.
Fig. 5 is the step block diagram that the study module among the present invention is learnt.
Fig. 6 is the step block diagram that the detection module among the present invention detects.
Fig. 7 is the decision value curve chart of the testing result output module output among the present invention.
Embodiment
For making the purpose, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with accompanying drawing.
Referring to Fig. 2, the present invention is a machine-learning-based system for detecting abnormal user behavior. The system is a software product deployed on the server to be monitored; it uses shell commands on the Unix platform as audit data and analyzes user behavior at the user interface layer to detect whether the server has been intruded. The system consists of a control module, a data acquisition and preprocessing module, a learning module, a sequence storage module, a detection module, and a detection result output module.
The control module sets the operating state and the various detection parameters of the system, and controls the operation of the data acquisition and preprocessing module, the learning module, the detection module, and the system as a whole. The data acquisition and preprocessing module obtains the original training data or audit data from the server, namely the shell command lines executed by the user, converts them into the form of a shell command stream, and passes the stream to the learning module or the detection module for learning or detection respectively. The learning module uses machine learning to acquire, from the training data, knowledge of the normal behavior of a key legitimate user (or group of users) in the network system, and on this basis builds the shell command sequence libraries that represent the normal behavior profile of this legitimate user. The sequence storage module stores the shell command sequence libraries built by the learning module; during detection, the detection module retrieves these libraries for comparison. The detection module analyzes and processes the shell commands executed by the legitimate user during the monitored period, carrying out the mining of "behavior pattern sequences", similarity calculation (assignment), windowed noise filtering of the similarities, decision value calculation, and the judgment of user behavior. The detection result output module displays the decision value curve generated by the detection module and, driven by the detection module, raises alarms on abnormal behavior.
Referring to Fig. 3, the workflow of the system of the present invention is as follows:
(1) The system starts;
(2) The system waits for an instruction. At this point the operating state and detection parameters of the system can be set through the control module; once they are set, the "start working" instruction can be entered. The control module then automatically checks the system settings, and the system enters one of two operating states: if the system is set to the learning state, it carries out the subsequent steps; if it is set to the detection state, it jumps to step (6). Note that after start-up the system has a default operating state and default detection parameters, namely those set during the previous run; if these defaults need not be changed, the "start working" instruction can be entered directly and the system carries out the corresponding steps;
(3) Driven by the control module, the data acquisition and preprocessing module loads the original training data from the predefined data interface, preprocesses them into the form of a shell command stream, and outputs the stream to the learning module;
(4) The learning module learns from the preprocessed shell command stream training data, builds the shell command sequence libraries, stores them in the sequence storage module, and then sends a "learning finished" message to the control module;
(5) On receiving the "learning finished" message from the learning module, the control module returns the system to step (2) to wait for new settings and instructions;
(6) Driven by the control module, the system switches directly to the detection state. The data acquisition and preprocessing module loads the original audit data (shell command lines) in real time from the predefined data interface, preprocesses them in real time, and outputs the preprocessed shell command stream audit data to the detection module in real time;
(7) The detection module analyzes in real time the audit data supplied by the data acquisition and preprocessing module and generates the detection result;
(8) The detection result output module displays the detection result, namely the decision value curve, and raises real-time alarms on abnormal behavior.
As can be seen from the workflow described above, the present invention, as a machine-learning-based method for detecting abnormal user behavior, mainly comprises three steps (see Fig. 3): acquiring and preprocessing data; learning (training); and detecting user behavior and outputting the detection result. These three parts are described in turn below.
(1) Acquiring and preprocessing data: both in learning and in detection, the system of the present invention needs to obtain original training data or audit data and preprocess them; this work is done by the data acquisition and preprocessing module.
The present invention uses the shell command lines executed by the user on the Unix platform as the original audit data, for three main reasons: (1) compared with other audit data (such as CPU usage or memory usage), shell command lines reflect the user's behavior more directly; (2) on the Unix platform, the shell is the principal interface between end users and the operating system, and a large proportion of user activity is carried out through the shell; (3) shell command lines are relatively easy to collect and convenient to analyze.
There are several types of shell on the Unix platform, such as tcsh, ksh and bash. The system of the present invention is applied to tcsh. tcsh is a command interpreter with a C-like syntax, and its history mechanism can save the shell command lines entered by the user in a history list. Because the command input methods and history mechanisms of different types of shell have much in common, the user behavior anomaly detection method of the present invention (including the data preprocessing method, the learning method and the detection method) is also applicable to shells other than tcsh.
The original input data (shell command lines) needed for learning and detection can be obtained from the tcsh history file. Shell command lines cannot be used directly for learning or detection, however; they must first be preprocessed. Preprocessing has two main purposes: (1) to put the data in a form that is convenient to store, analyze and process; (2) to reduce the number of distinct command tokens in the audit data. The data acquisition and preprocessing module preprocesses the original input data in the same way for learning and for detection. The specific method is:
1. Extract the names, flags and metacharacters of the commands in each shell command line.
2. Replace information such as file names, server names, directories and network addresses with an identifier of the uniform format <n>, where n is the number of file names, server names, directories or network addresses.
3. Insert the identifiers SOF and EOF, marking start and end respectively, at the points in time where each shell session begins and ends.
4. Arrange the shell command tokens (the command names, flags and metacharacters, together with the identifiers standing for file names, server names and similar information) in the order in which they appear in the shell session; concatenate the command tokens of different shell sessions in chronological order; do not add timestamps to the data.
After this preprocessing, the original input data take the form of a shell command stream: a string of shell command tokens arranged in chronological order. A shell command stream can contain the content of multiple shell sessions. For example, suppose a user executes the following command lines on tcsh in two chronologically adjacent shell sessions:
# Start session 1
cd ~/games/
xquake &
fg
vi scores.txt
mailx john_doe@somewhere.com
exit
# End session 1
and
# Start session 2
cd ~/private/docs
ls -laF | more
cat foo.txt bar.txt zorch.txt > somewhere
exit
# End session 2
After preprocessing, these become the following shell command stream:
(SOF,cd,<1>,xquake,&,fg,vi,<1>,mailx,<1>,exit,EOF,SOF,cd,<1>,ls,-laF,|,more,cat,<3>,>,<1>,exit,EOF)
Here <1> and <3> are identifiers standing for content such as file names, directories and e-mail addresses, and SOF and EOF are the identifiers marking the start and end of a session. As can be seen, a shell command stream is a string of shell command tokens arranged in chronological order, and one shell command stream can contain the content of multiple shell sessions.
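The four preprocessing rules can be illustrated with a deliberately simplified Python sketch. It is not the preprocessing module of the invention: the names `tokenize_session` and `command_stream` are hypothetical, and the sketch assumes that words are separated by whitespace, that the first word of a command is its name, that words starting with "-" are flags, and that only the listed metacharacters occur. A real shell parser would need to handle quoting, substitutions and many more cases.

```python
# Assumption: metacharacters appear as separate, whitespace-delimited words.
METACHARACTERS = {"&", "|", ">", ">>", "<", ";"}
NEW_COMMAND_AFTER = {"&", "|", ";"}  # the word after these is a command name


def tokenize_session(lines):
    """Turn the command lines of one shell session into shell command tokens."""
    tokens = ["SOF"]                      # rule 3: mark the session start
    for line in lines:
        expect_command = True             # first word of a line is a command name
        pending_args = 0                  # consecutive file names, addresses, ...
        for word in line.split():
            if expect_command or word in METACHARACTERS or word.startswith("-"):
                if pending_args:          # rule 2: emit <n> for the n arguments seen
                    tokens.append(f"<{pending_args}>")
                    pending_args = 0
                tokens.append(word)       # rule 1: keep names, flags, metacharacters
                expect_command = word in NEW_COMMAND_AFTER
            else:
                pending_args += 1
        if pending_args:
            tokens.append(f"<{pending_args}>")
    tokens.append("EOF")                  # rule 3: mark the session end
    return tokens


def command_stream(sessions):
    """Rule 4: concatenate the tokens of sessions given in chronological order."""
    return [token for session in sessions for token in tokenize_session(session)]


session_1 = ["cd ~/games/", "xquake &", "fg", "vi scores.txt",
             "mailx john_doe@somewhere.com", "exit"]
session_2 = ["cd ~/private/docs", "ls -laF | more",
             "cat foo.txt bar.txt zorch.txt > somewhere", "exit"]
stream = command_stream([session_1, session_2])
```

Under these assumptions, `stream` reproduces the shell command stream of the example above, with SOF and EOF delimiting each of the two sessions.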
(2) Learning (training): before it can carry out detection, the system must first learn (train), that is, acquire from the training data knowledge of the normal behavior of a key legitimate user in the network system and, on this basis, build the shell command sequence libraries that represent this user's normal behavior profile, for the detection module to retrieve and compare against during detection. Learning is performed by the learning module.
Referring to Fig. 5, the learning steps of the learning module are as follows:
(1) Obtain the normal-behavior training data of the legitimate user from the data acquisition and preprocessing module. The original normal-behavior training data are the shell command lines that this legitimate user executed during normal operation in the past; after preprocessing, these command lines become a shell command stream (which represents the historical normal behavior of this legitimate user) $R = (s_1, s_2, \ldots, s_r)$, a shell command stream of length $r$, where $s_j$ denotes the j-th shell command token in chronological order.
(2) Read the learning parameters from the control module: the number W of shell command sequence libraries to be built to represent the normal behavior profile of this legitimate user, and the W sequence lengths $l(1), l(2), \ldots, l(W)$, where $l(i)$ is the length of the sequences in the i-th shell command sequence library and $l(1) < l(2) < \ldots < l(W)$.
During learning, the system represents the various behavior patterns of this legitimate user with shell command sequences of W different lengths, and the various normal behavior patterns in the training data (that is, the shell command sequences with a higher frequency of occurrence) are gathered together to constitute this user's normal behavior profile. For a given W, different choices of $l(1), l(2), \ldots, l(W)$ are possible. For example, when W = 3, $l(1), l(2), l(3)$ can be 1, 2, 3 respectively (that is, the sequences in the 3 sequence libraries have lengths 1, 2 and 3), or 3, 6, 9, or some other combination. W and $l(i)$ directly affect detection performance and detection efficiency: the larger W and $l(i)$ are, the larger the amount of data the system stores and the more computation detection requires.
(3) From the shell command stream $R$, generate W shell command sequence streams $S_1, S_2, \ldots, S_W$ with sequence lengths $l(1), l(2), \ldots, l(W)$ respectively, where $S_i$ is the shell command sequence stream with sequence length $l(i)$: $S_i = (Seq_1^i, Seq_2^i, \ldots, Seq_{r-l(i)+1}^i)$. Here $Seq_j^i = (s_j, s_{j+1}, \ldots, s_{j+l(i)-1})$ is the j-th shell command sequence of $S_i$ in chronological order, and $i$ is a natural number in the interval $[1, W]$.
The generation of the shell command sequence streams is illustrated below.
For example, with W = 3, with $l(1), l(2), l(3)$ equal to 1, 2, 3 respectively, and with $R = (s_1, s_2, \ldots, s_{12})$ = (SOF, cd, <1>, xquake, &, fg, vi, <1>, mailx, <1>, exit, EOF), three shell command sequence streams $S_1, S_2, S_3$ with sequence lengths 1, 2 and 3 can be generated from $R$, where
$S_1 = (Seq_1^1, Seq_2^1, \ldots, Seq_{12}^1)$ = ((SOF), (cd), (<1>), (xquake), (&), (fg), (vi), (<1>), (mailx), (<1>), (exit), (EOF))
$S_2 = (Seq_1^2, Seq_2^2, \ldots, Seq_{11}^2)$ = ((SOF, cd), (cd, <1>), (<1>, xquake), (xquake, &), (&, fg), (fg, vi), (vi, <1>), (<1>, mailx), (mailx, <1>), (<1>, exit), (exit, EOF))
$S_3 = (Seq_1^3, Seq_2^3, \ldots, Seq_{10}^3)$ = ((SOF, cd, <1>), (cd, <1>, xquake), (<1>, xquake, &), (xquake, &, fg), (&, fg, vi), (fg, vi, <1>), (vi, <1>, mailx), (<1>, mailx, <1>), (mailx, <1>, exit), (<1>, exit, EOF))
(4) Calculate the frequency of occurrence, within the shell command sequence stream $S_i$, of each shell command sequence of $S_i$, where $i$ is a natural number in the interval $[1, W]$; that is, divide the number of occurrences of each shell command sequence in $S_i$ by the total number of sequence occurrences in that stream.
The calculation is illustrated below. For example, in the sequence stream $S_1$ = ((SOF), (cd), (<1>), (xquake), (&), (fg), (vi), (<1>), (mailx), (<1>), (exit), (EOF)) shown above, the frequency of occurrence of the sequence (<1>) is 1/4 and that of the sequence (cd) is 1/12. In the sequence stream $S_2$ = ((SOF, cd), (cd, <1>), (<1>, xquake), (xquake, &), (&, fg), (fg, vi), (vi, <1>), (<1>, mailx), (mailx, <1>), (<1>, exit), (exit, EOF)), the frequency of occurrence of the sequence (<1>, xquake) is 1/11 (as is that of every other sequence).
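The sequence-stream generation of step (3) and the frequency calculation of step (4) can be checked with a short Python sketch. The function names `sequence_stream` and `occurrence_frequencies` are illustrative assumptions, not names used by the invention; exact fractions are used so that the results can be compared directly with the values quoted above.

```python
from collections import Counter
from fractions import Fraction


def sequence_stream(tokens, length):
    """All contiguous token sequences of the given length, in order (the stream S_i)."""
    return [tuple(tokens[j:j + length]) for j in range(len(tokens) - length + 1)]


def occurrence_frequencies(stream):
    """Occurrences of each distinct sequence divided by the number of sequences in the stream."""
    total = len(stream)
    return {seq: Fraction(count, total) for seq, count in Counter(stream).items()}


R = ["SOF", "cd", "<1>", "xquake", "&", "fg", "vi", "<1>", "mailx", "<1>", "exit", "EOF"]
S1, S2, S3 = (sequence_stream(R, length) for length in (1, 2, 3))
freq_1 = occurrence_frequencies(S1)
freq_2 = occurrence_frequencies(S2)

print(len(S1), len(S2), len(S3))   # 12 11 10, i.e. r - l(i) + 1 for each length
print(freq_1[("<1>",)])            # 1/4
print(freq_1[("cd",)])             # 1/12
print(freq_2[("<1>", "xquake")])   # 1/11
```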
(5) Read the frequency threshold parameters from the control module: the frequency thresholds $\eta_1, \eta_2, \ldots, \eta_W$ used to build the W shell command sequence libraries, where $\eta_i$ is the frequency threshold used when building the shell command sequence library of sequence length $l(i)$, and $i$ is a natural number in the interval $[1, W]$.
(6) According to the frequencies of occurrence of the sequences, extract a number of shell command sequences from each of the W shell command sequence streams $S_1, S_2, \ldots, S_W$ as samples and build W sequence libraries. The specific steps are:
Let the set of the W sequence libraries used to represent this user's normal behavior profile be $L = \{L(1), L(2), \ldots, L(W)\}$, where $L(i)$ denotes the sequence library made up of sequences of length $l(i)$;
For $i$ from 1 to W in turn, extract as samples the shell command sequences of $S_i$ whose frequency of occurrence is greater than or equal to the frequency threshold $\eta_i$, that is, the shell command sequences regarded as normal behavior patterns of this legitimate user, and gather these sequences together to form the sequence library $L(i)$.
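A minimal Python sketch of the library-building step is given below. The function name `build_sequence_libraries`, the use of Python sets to hold each library, and the threshold values are assumptions made for illustration; the source does not specify how thresholds should be chosen.

```python
from collections import Counter


def build_sequence_libraries(tokens, lengths, thresholds):
    """Return [L(1), ..., L(W)]: for each length l(i), the set of sequences whose
    frequency of occurrence in the stream S_i is at least the threshold eta_i."""
    libraries = []
    for length, eta in zip(lengths, thresholds):
        stream = [tuple(tokens[j:j + length]) for j in range(len(tokens) - length + 1)]
        total = len(stream)
        counts = Counter(stream)
        libraries.append({seq for seq, count in counts.items() if count / total >= eta})
    return libraries


R = ["SOF", "cd", "<1>", "xquake", "&", "fg", "vi", "<1>", "mailx", "<1>", "exit", "EOF"]
# Hypothetical thresholds eta_1, eta_2, eta_3, chosen only to show the filtering effect.
libraries = build_sequence_libraries(R, lengths=(1, 2, 3), thresholds=(0.2, 0.05, 0.2))
```

With these thresholds, L(1) keeps only the sequence (<1>) (frequency 1/4), L(2) keeps all eleven length-2 sequences (each 1/11), and L(3) is empty because every length-3 sequence has frequency 1/10. On realistic training data, which is far longer than this toy stream, the thresholds would instead separate frequent from rare sequences.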
Of particular note: if new training data become available while the learning module is learning from the originally input training data, the learning module can automatically recompute the frequency of occurrence of each sequence according to the new data and adjust the sequence libraries accordingly; that is, the system can automatically adapt to changes in the normal behavior of the legitimate user.
(3) Detecting user behavior and outputting the detection result: using the normal behavior profile of the legitimate user built by the learning module, the user's current behavior is monitored in real time. If the current behavior deviates substantially from the user's historical normal behavior profile, an anomaly is considered to have occurred and is handled accordingly; such an anomaly results either from this user carrying out an unauthorized operation or from an outside intruder misusing this user's account to carry out illegal operations. Detection is mainly performed by the detection module.
Referring to Fig. 6, the real-time detection steps of the detection module are as follows:
(1) Obtain audit data in real time from the data acquisition and preprocessing module. During detection, the data acquisition and preprocessing module obtains in real time, from the shell history file, the shell command lines executed by the monitored legitimate user during the monitored period and preprocesses them into a shell command stream $R = (s_1, s_2, \ldots, s_r)$, where $s_j$ denotes the j-th shell command token in chronological order and $r$ is the length of the command stream. The data acquisition and preprocessing module outputs the shell command tokens of $R$ to the detection module one by one, in chronological order and in real time.
(2) The detection module uses a sequence matching method to mine the "behavior pattern sequences" in the shell command stream $R$ and, from the length of each "behavior pattern sequence", calculates its similarity to the sequence library set $L = \{L(1), L(2), \ldots, L(W)\}$. This yields the stream of "behavior pattern sequences" in chronological order, $P = (Seq_1^*, Seq_2^*, \ldots, Seq_M^*)$, and the corresponding similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$, where $Seq_n^*$ denotes the n-th "behavior pattern sequence" mined from $R$, $Sim(Seq_n^*, L)$ denotes the similarity between $Seq_n^*$ and the sequence library set, and $M$ is the number of "behavior pattern sequences" mined from $R$, with $int(r/l(W)) \le M \le r - l(W) + 1$.
The specific steps of the "behavior pattern sequence" mining and similarity calculation method are as follows:
Step 1: set three variables: $j := 1$, $i := W$, $n := 1$;
Step 2: if $j \le r - l(W) + 1$, compare $Seq_j^i$ with the sequence library $L(i)$ and go to step 3; if $j > r - l(W) + 1$, stop (the "behavior pattern sequence" mining and similarity calculation process ends);
Step 3: if $Seq_j^i \in L(i)$ (that is, $Seq_j^i$ is identical to some sequence in $L(i)$), then the n-th "behavior pattern sequence" is $Seq_n^* := Seq_j^i$, and the similarity between $Seq_n^*$ and the sequence library set $L$ is $Sim(Seq_n^*, L) := 2^{l(i)}/2^{l(W)}$; set $j := j + l(i)$, $i := W$, $n := n + 1$, and return to step 2. If $Seq_j^i \notin L(i)$ (that is, $Seq_j^i$ differs from every sequence in $L(i)$), then set $i := i - 1$ and go to step 4;
Step 4: if $i \neq 0$, return to step 2; if $i = 0$, then $Seq_n^* := (s_j)$, $Sim(Seq_n^*, L) := 0$, $j := j + 1$, $i := W$, $n := n + 1$, and return to step 2.
The "behavior pattern sequence" mining and similarity calculation method above can be understood as follows. Starting from the first shell command (token), form W sequences of lengths $l(W), l(W-1), \ldots, l(1)$, and compare (match) them against the corresponding sequence libraries in order of decreasing length. If one of these sequences is identical to a sequence in the corresponding sequence library, it is regarded as a normal behavior pattern of the legitimate user and defined as a "behavior pattern sequence", and its similarity to the sequence library set is calculated from its length: the longer the sequence, the larger the similarity. If none of the sequences is identical to any sequence in the corresponding library, the current shell command (token) is defined as a "behavior pattern sequence" of length 1 and its similarity is set to 0. Then, starting from the shell command (token) that follows this sequence, W sequences of different lengths are formed again, and sequence comparison and similarity calculation continue in the same way up to the $(r - l(W) + 1)$-th shell command (token). The method is mainly concerned with whether the monitored user's current command sequence matches completely some historical normal behavior pattern (normal sequence) of that user. In step 3 above, the formula $Sim(Seq_n^*, L) := 2^{l(i)}/2^{l(W)}$ assigns the value $2^{l(i)}/2^{l(W)}$ to the similarity $Sim(Seq_n^*, L)$ between a "behavior pattern sequence" $Seq_n^*$ of length $l(i)$ and the sequence library set $L$. The similarity of a "behavior pattern sequence" is thus an increasing function of its length, with a maximum value of 1. For example, with W = 3 and $C = \{l(1), l(2), l(3)\} = \{1, 2, 3\}$, the similarities corresponding to "behavior pattern sequences" of lengths 1, 2 and 3 are 2/8, 4/8 and 8/8 respectively.
Carrying out "behavior pattern sequence" mining and similarity calculation by the method above yields the stream of "behavior pattern sequences" in chronological order, $P = (Seq_1^*, Seq_2^*, \ldots, Seq_M^*)$, and the corresponding similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$, where $M$ is the number of "behavior pattern sequences" mined from $R$ and $int(r/l(W)) \le M \le r - l(W) + 1$ (int denotes the rounding operation).
Note that the length of the "behavior pattern sequences" mined from $R$ is not fixed: it can be 1, $l(1)$, $l(2)$, ..., $l(W-1)$ or $l(W)$. When every "behavior pattern sequence" mined from $R$ has length $l(W)$ (the case in which the legitimate user's behavior during the monitored period agrees best with the user's historical normal behavior profile), $M$ takes its minimum value $int(r/l(W))$; when every "behavior pattern sequence" mined from $R$ has length 1 (the case in which the user's behavior during the monitored period agrees worst with the user's historical normal behavior profile), $M$ takes its maximum value $r - l(W) + 1$.
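The mining and similarity assignment of steps 1 to 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation of the invention: the function name `mine_patterns`, the set-of-tuples representation of each library, and the toy libraries and audit stream are all introduced here for demonstration. Indices are 0-based, so the stop condition $j > r - l(W) + 1$ becomes `j + max_len > len(tokens)`.

```python
def mine_patterns(tokens, libraries, lengths):
    """Mine "behavior pattern sequences" from an audit stream.

    libraries[i] is a set of tuples of length lengths[i]; lengths is ascending.
    Returns (patterns, similarities), i.e. the streams P and Z.
    """
    max_len = lengths[-1]
    patterns, similarities = [], []
    j = 0
    while j + max_len <= len(tokens):
        # Try the longest sequence first, then shorter ones (i = W, W-1, ..., 1).
        for library, length in zip(reversed(libraries), reversed(lengths)):
            candidate = tuple(tokens[j:j + length])
            if candidate in library:
                patterns.append(candidate)
                similarities.append(2 ** length / 2 ** max_len)
                j += length
                break
        else:
            # No library matched: a length-1 pattern with similarity 0.
            patterns.append((tokens[j],))
            similarities.append(0.0)
            j += 1
    return patterns, similarities


# Hypothetical libraries and audit stream, for illustration only.
LENGTHS = (1, 2, 3)
LIBRARIES = [{("ls",)}, {("cd", "<1>")}, {("SOF", "cd", "<1>")}]
audit = ["SOF", "cd", "<1>", "ls", "rm", "cd", "<1>", "vi", "exit", "EOF"]
P, Z = mine_patterns(audit, LIBRARIES, LENGTHS)
```

For this toy input, P is ((SOF, cd, <1>), (ls), (rm), (cd, <1>), (vi)) and Z is (1.0, 0.25, 0.0, 0.5, 0.0). Mining stops once fewer than $l(W)$ tokens remain, so the trailing tokens exit and EOF are not examined.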
(3) Window the similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$ and take the mean; compare the resulting similarity mean, that is, the similarity decision value, with the decision threshold, and then judge the behavior of the monitored legitimate user. Two decision methods are available here.
The first decision method windows the similarity stream with a fixed window length and takes the mean; the resulting similarity mean is the similarity decision value, which is then compared with a decision threshold to judge this user's behavior. It comprises the following operating steps:
(101) First read the parameters set in the control module: window length $w$ and decision threshold $\lambda$. Once the n-th behavior pattern sequence $Seq_n^*$ has been mined from the audit data $R$ and $Sim(Seq_n^*, L)$ has been calculated, where $n \ge w$, window the similarity stream $Z$ with $Sim(Seq_n^*, L)$ as the end point and calculate the mean of the $w$ similarities ending at $Sim(Seq_n^*, L)$, namely $Sim(Seq_{n-w+1}^*, L), Sim(Seq_{n-w+2}^*, L), \ldots, Sim(Seq_n^*, L)$, to obtain the similarity decision value $D(n)$ corresponding to $Seq_n^*$: $D(n) = \frac{1}{w} \sum_{m=n-w+1}^{n} Sim(Seq_m^*, L)$;
(102) Use the decision value $D(n)$ and the decision threshold $\lambda$ to judge this user's "current behavior": if $D(n) > \lambda$, the "current behavior" is judged to be normal behavior; if $D(n) \le \lambda$, it is judged to be abnormal behavior. Here the user's "current behavior" is relative to $Seq_n^*$: it refers to the $w$ "behavior pattern sequences" ending at $Seq_n^*$ that the user executed, namely $Seq_{n-w+1}^*, Seq_{n-w+2}^*, \ldots, Seq_n^*$. In $D(n)$, the initial value of $n$ is $w$ (that is, $n \ge w$) and $n$ increases in steps of 1: after $w$ "behavior pattern sequences" have been mined, each newly mined "behavior pattern sequence" allows one more judgment of this user's behavior. When $n < w$, $D(n)$ is not calculated and no judgment is made.
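The fixed-window decision method of steps (101) and (102) amounts to thresholding a moving average of the similarity stream. The following Python sketch illustrates it; the function name `fixed_window_decisions` and the values of $w$ and $\lambda$ are assumptions chosen only for the example.

```python
def fixed_window_decisions(similarities, w, lam):
    """Return (n, D(n), verdict) for every n >= w, with n counted from 1.

    D(n) is the mean of the w similarities ending at the n-th one;
    the behavior is judged normal when D(n) > lam and abnormal otherwise.
    """
    decisions = []
    for n in range(w, len(similarities) + 1):
        d_n = sum(similarities[n - w:n]) / w
        decisions.append((n, d_n, "normal" if d_n > lam else "abnormal"))
    return decisions


# Hypothetical similarity stream and parameters.
Z = [1.0, 0.25, 0.0, 0.5, 0.0]
decisions = fixed_window_decisions(Z, w=2, lam=0.3)
```

Here the first judgment is made at n = 2, where D(2) = 0.625 is judged normal; D(3) = 0.125, D(4) = 0.25 and D(5) = 0.25 are all judged abnormal. A larger $w$ smooths out isolated unmatched commands at the cost of a longer delay before the first judgment.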
The second decision method windows the similarity stream with a variable window length and takes the mean; the resulting similarity mean is the similarity decision value, which is then compared with the decision limits to judge the monitored user's behavior. It comprises the following steps:
(201) Read the parameters set in the control module: $V$ window lengths $w(1), w(2), \ldots, w(V)$, $V$ upper decision limits $u(1), u(2), \ldots, u(V)$, and $V$ lower decision limits $d(1), d(2), \ldots, d(V)$, where $w(1) < w(2) < \ldots < w(V)$; $u(k)$ and $d(k)$ are the upper and lower decision limits corresponding to the $k$-th window length $w(k)$; $k$ is a natural number in the interval $[1, V]$; and $u(1) > u(2) > \ldots > u(V-1) > u(V) = d(V) > d(V-1) > \ldots > d(2) > d(1)$;
(202) Once the $n$-th "behavior pattern sequence" $Seq_n^*$ has been mined from $R$ and $Sim(Seq_n^*, L)$ has been computed, compute the similarity decision value $D(n)$ corresponding to $Seq_n^*$ and judge the user's "current behavior". This specifically comprises the following steps:
Step 1: Set the variable $k := 1$.
Step 2: Compare $n$ with $w(k)$: if $n \ge w(k)$, execute Step 3; if $n < w(k)$, do not compute $D(n)$, make no judgment of the user's "current behavior", and end the operation.
Step 3: Compute the similarity mean $N(n, k) = \frac{1}{w(k)} \sum_{m=n-w(k)+1}^{n} Sim(Seq_m^*, L)$. This value is obtained by windowing, within the similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$, the $w(k)$ similarities ending at $Sim(Seq_n^*, L)$, namely $Sim(Seq_{n-w(k)+1}^*, L), Sim(Seq_{n-w(k)+2}^*, L), \ldots, Sim(Seq_n^*, L)$, and taking their mean. Here $w(k)$ is the window length and $w(k) \le n \le M$.
Step 4: Check whether the decision condition $N(n, k) > u(k)$ is satisfied. If it is, the similarity decision value corresponding to $Seq_n^*$ is set to $D(n) := N(n, k)$ and the user's "current behavior" is judged normal (here the "current behavior" relative to $Seq_n^*$ means the $w(k)$ "behavior pattern sequences" ending at $Seq_n^*$ that the user has executed, i.e. $Seq_{n-w(k)+1}^*, Seq_{n-w(k)+2}^*, \ldots, Seq_n^*$). The judgment of the user's "current behavior" is then complete and no further steps are executed. If the condition $N(n, k) > u(k)$ is not satisfied, continue with Step 5.
Step 5: Check whether the decision condition $N(n, k) \le d(k)$ is satisfied. If it is, set $D(n) := N(n, k)$ and judge the user's "current behavior" abnormal; the judgment is then complete and no further steps are executed. If the condition $N(n, k) \le d(k)$ is not satisfied, continue with Step 6.
Step 6: Set $k := k + 1$, i.e. increase $k$ by 1, return to Step 2, and repeat the subsequent operations.
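Steps 1 to 6 above can be sketched as a single loop over the window lengths. The following is a minimal sketch under stated assumptions: the function name `variable_window_decision`, the tuple it returns, and the use of Python sequences for $w$, $u$ and $d$ are illustrative choices, not part of the source.

```python
from typing import Optional, Sequence, Tuple


def variable_window_decision(
    sims: Sequence[float],
    n: int,
    w: Sequence[int],
    u: Sequence[float],
    d: Sequence[float],
) -> Optional[Tuple[float, bool, int]]:
    """Judge the n-th behavior pattern sequence (n counted from 1).

    Returns (D(n), is_normal, k), where k is the 1-based index of the window
    that produced the decision, or None when no judgment can be made.
    sims[m - 1] holds Sim(Seq_m^*, L).
    """
    for k, (w_k, u_k, d_k) in enumerate(zip(w, u, d), start=1):
        if n < w_k:                       # Step 2: not enough history yet
            return None
        mean = sum(sims[n - w_k:n]) / w_k  # Step 3: N(n, k)
        if mean > u_k:                    # Step 4: normal behavior
            return mean, True, k
        if mean <= d_k:                   # Step 5: abnormal behavior
            return mean, False, k
        # Step 6: undecided, continue with the next (longer) window
    return None  # not reached when u(V) == d(V), as the text requires


if __name__ == "__main__":
    sims = [1.0, 1.0, 0.5, 0.5, 0.0, 0.0]
    for n in range(1, len(sims) + 1):
        print(n, variable_window_decision(sims, n, (2, 4), (0.8, 0.5), (0.2, 0.5)))
```

Because $u(V) = d(V)$, the last window always satisfies one of the two conditions, so a result of `None` can only arise from the history check in Step 2.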
In the second decision method, the computation of the similarity decision value and the judgment of the monitored user's current behavior can be understood as follows. After the $n$-th similarity $Sim(Seq_n^*, L)$ in the similarity stream has been computed, the window length is first set to its minimum value $w(1)$. Provided $n$ is not less than the window length, the similarity stream is windowed with $Sim(Seq_n^*, L)$ as the end point and the mean is taken (the number of similarities in the window equals the window length). This similarity mean is then compared with the upper and lower decision limits corresponding to that window length. If the decision condition is satisfied (i.e. the mean is greater than the upper decision limit for that window length, or less than or equal to its lower decision limit), the mean is taken as the similarity decision value $D(n)$ and the monitored user's current behavior is judged at the same time: if the mean (the similarity decision value) is greater than the corresponding upper decision limit, the "current behavior" is judged normal; if it is less than or equal to the corresponding lower decision limit, it is judged abnormal. If the decision condition is not satisfied (i.e. the mean is greater than the lower decision limit and less than or equal to the upper decision limit for that window length), the window length is increased in the order $w(2), w(3), \ldots, w(V)$ and, provided $n$ is not less than the window length, the windowing, averaging, and comparison are repeated until the decision condition is satisfied, which yields $D(n)$ and the corresponding judgment of the "current behavior". (If $n$ is found to be less than the window length, the operation stops: $D(n)$ is not computed and the "current behavior" is not judged.)
Under the second decision method, when $n < w(1)$, the similarity decision value $D(n)$ is not computed and the user's "current behavior" is not judged. When $w(1) \le n < w(V)$, $D(n)$ and a judgment of the "current behavior" cannot always be obtained. When $w(V) \le n \le M$, $D(n)$ can always be obtained and the "current behavior" can always be judged (note that $u(V) = d(V)$); in this range $n$ grows in steps of 1, that is, each newly mined "behavior pattern sequence" allows one more judgment of the user's behavior. In practical applications, the number $M$ of "behavior pattern sequences" mined from $R$ is usually much larger than the longest window length $w(V)$, so compared with $w(V) \le n \le M$, the cases $n < w(1)$ and $w(1) \le n < w(V)$ are rare.
It should be pointed out that, during detection, the three steps, namely acquiring and preprocessing the shell command lines executed by the monitored user, mining the "behavior pattern sequences" and computing their similarities, and windowing the similarity stream and judging the user's behavior, are carried out synchronously. During detection, after the monitored user has executed a certain number of "behavior pattern sequences" (the number depends on the window length), each time the user executes one more "behavior pattern sequence" the detection system of the present invention mines that sequence, computes its similarity, windows the similarity stream with that similarity as the end point (obtaining the similarity decision value corresponding to that sequence), and then makes one judgment of the monitored user's "current behavior".
In detection step (3) above, the two decision methods differ in their window lengths and decision rules. In the first decision method, the window length $w$ is an important parameter: it determines the time from the occurrence of the monitored user's behavior until the detection system first judges that behavior (the detection time). The shortest detection time of this scheme equals the shortest duration of $w$ shell command symbols (ignoring the time taken by sequence comparison and decision value calculation during detection). Hence the smaller the window length $w$, the better the real-time performance of detection. In practical use of the system, however, detection accuracy tends to decrease as $w$ is reduced. The second decision method balances detection time and detection accuracy: by using a variable window length it can, compared with the first decision method, improve the real-time performance of detection while maintaining the same detection accuracy, at the cost of relatively higher complexity.
It should be noted that the detection method described above performs anomaly detection on the behavior of a single legitimate user (rather than a group) in the computer network system. In fact, the detection system and method of the present invention can also perform anomaly detection on the behavior of a group of (multiple) legitimate users in the network system. Two approaches can be used in this case: (1) if the privileges and behavioral characteristics of these legitimate users differ considerably, W sequence libraries can be built separately for each legitimate user from that user's normal-behavior training data, and each user's behavior is then checked against that user's own W sequence libraries; (2) if these legitimate users have the same privileges and similar behavioral characteristics, their training data can be combined (i.e. their shell command streams are concatenated) into one overall training data set, which is used to build W sequence libraries, and these W sequence libraries are then used for anomaly detection on each user's behavior.
A test application example is described below to illustrate an embodiment of the present invention. In this example, the user behavior anomaly detection system of the present invention is deployed on a server in a company's local area network (LAN) and is used to monitor the behavior of a key programmer on that LAN, in order to prevent the programmer from performing unauthorized operations and to prevent an outside intruder from misusing the programmer's account to perform malicious operations. The example covers two working states of the system: the learning state (training state) and the detection state.
The operating steps of the embodiment in the learning state are as follows:
(1) Start the system.
(2) The security administrator of the LAN configures the working state and parameters of the system: the working state is set to the learning state; the number $W$ of shell command sequence libraries is set to 5; the sequence lengths $l(1), l(2), l(3), l(4), l(5)$ are set to 1, 2, 3, 4, 5 respectively; and the frequency thresholds $\eta_1, \eta_2, \ldots, \eta_W$ are all set to 0.0002. After configuration, the security administrator inputs the "start working" instruction, and the system begins automatic operation on receiving it.
(3) The control module automatically checks the system settings, finds that the working state is set to the learning state, and switches the system to the learning state.
(4) The control module drives the data acquisition and preprocessing module to load the original training data from the specified location. These data are the shell command lines executed by the programmer during normal operation on this server over a period of 8 months.
(5) The data acquisition and preprocessing module converts the original training data (shell command lines) into the form of a shell command stream and outputs it to the learning module. The shell command stream obtained by preprocessing the original training data is as follows: R = (*SOF*, cd, <1>, cd, <1>, ..., vi, <1>, logout, *EOF*). This shell command stream contains 9935 shell command symbols in total.
(6) The learning module learns from the above shell command stream, builds the shell command sequence libraries, and stores them in the sequence storage module. The sequence libraries $L(1), L(2), L(3), L(4), L(5)$ built by the learning module, with sequence lengths 1, 2, 3, 4, 5, consist of 160, 508, 860, 1015, and 1526 shell command sequences respectively.
(7) The learning module sends a "learning finished" message to the control module; on receiving it, the control module returns the system to the state of waiting for instructions, i.e. the state after system start-up.
(8) Shut down the system.
In the above embodiment, the core of learning is to build the shell command sequence libraries from the training data, for use (retrieval and comparison) by the detection module during detection. The frequency thresholds $\eta_1, \eta_2, \ldots, \eta_W$ set in step (2) are very important parameters: they determine the number of shell sequences in the 5 sequence libraries $L(1), L(2), L(3), L(4), L(5)$. The smaller the frequency threshold, the more sequences the libraries contain and the more data the system must store, so the frequency threshold cannot be too small. If the frequency threshold is too large, however, some shell command sequences (behavior patterns) that reflect the legitimate user's operating habits are missed when the libraries are built, so that the libraries no longer represent the user's normal behavior profile well, which reduces the detection accuracy of the system. Setting the frequency threshold reasonably is therefore a key issue in learning. Table 1 below gives the number of sequences in the 5 sequence libraries for different values of the frequency threshold.
Frequency threshold      0.0001   0.0002   0.0003   0.0004   0.0005   0.0006
Sequences in L(1)           220      160      141      123      112      102
Sequences in L(2)           962      508      365      293      247      208
Sequences in L(3)          2155      860      559      398      320      250
Sequences in L(4)          3963     1015      893      604      542      461
Sequences in L(5)          6892     1526     1232     1067      915      796
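To make the effect of the frequency threshold concrete, the following sketch converts each threshold in the table into the minimum number of occurrences a sequence would need in order to enter a library. It rests on two assumptions that the source does not state explicitly: that the table was produced from the 9935-symbol training stream of step (5), and that the frequency of a sequence of length $l$ is its occurrence count divided by the $r - l + 1$ sequences in the corresponding sequence stream. The helper name `min_occurrences` is hypothetical.

```python
import math

R_LENGTH = 9935                      # symbols in the training stream (step (5))
SEQ_LENGTHS = (1, 2, 3, 4, 5)        # l(1) ... l(5)
THRESHOLDS = (0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006)


def min_occurrences(eta: float, r: int, l: int) -> int:
    """Smallest count c such that c / (r - l + 1) >= eta."""
    total = r - l + 1                # number of length-l sequences in the stream
    return math.ceil(eta * total)


# Map each threshold to the minimum counts for libraries L(1) ... L(5).
table = {
    eta: [min_occurrences(eta, R_LENGTH, l) for l in SEQ_LENGTHS]
    for eta in THRESHOLDS
}

if __name__ == "__main__":
    for eta, counts in table.items():
        print(eta, counts)
```

Under these assumptions, a threshold of 0.0001 would admit every sequence that occurs at least once, while 0.0002 would require at least two occurrences, which would be consistent with the sharp drop between the first two columns of the table.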
The operating steps of the embodiment in the detection state are as follows:
(1) Start the system.
(2) The security administrator of the LAN configures the working state and parameters of the system: the working state is set to the detection state; the decision method is set to the second decision method; the number $V$ of window lengths for this method is set to 3; the 3 window lengths are set to $w(1) = 30$, $w(2) = 60$, $w(3) = 90$; the 3 upper decision limits are set to $u(1) = 0.8$, $u(2) = 0.7$, $u(3) = 0.5$; and the 3 lower decision limits are set to $d(1) = 0.3$, $d(2) = 0.4$, $d(3) = 0.5$. After configuration, the security administrator inputs the "start working" instruction, and the system begins automatic operation on receiving it.
(3) The control module automatically checks the system settings, finds that the working state is set to the detection state, and switches the system to the detection state.
(4) The control module drives the data acquisition and preprocessing module to load the original audit data from the specified location; at the same time, the data acquisition and preprocessing module preprocesses the original audit data (shell command lines) and outputs the preprocessed audit data (shell command stream) to the detection module.
(5) The detection module analyzes the audit data supplied by the data acquisition and preprocessing module and generates the detection result.
(6) The detection result output module displays the detection result (the decision value curve) and raises an alarm on abnormal behavior. Fig. 7 shows the decision value curve output by the output module. In the figure, the upper solid line is the decision value curve output by the detection result output module while the system monitors the programmer's normal behavior. If the shell command lines executed by the security administrator are loaded as the original audit data, the decision value curve output by the detection result output module is the lower dashed line in the figure. This dashed line can be regarded as the decision value curve output when the system monitors abnormal behavior, because most of the operations the security administrator performs on the server are unauthorized operations (abnormal behavior) for the programmer; in other words, the programmer is not authorized to perform many of the operations that fall within the security administrator's privileges. If the programmer oversteps his authority and performs operations within the security administrator's privileges, the decision values at the points of the output curve will be relatively small (as shown by the dashed line in the figure), and when a decision value falls below the corresponding decision threshold, the system detects these unauthorized operations (abnormal behavior) and raises an alarm.
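The parameters chosen in step (2) of the detection-state example have to respect the ordering constraints stated for the second decision method ($w$ strictly increasing, $u$ strictly decreasing, $d$ strictly increasing, and $u(V) = d(V)$). The following sketch shows one way such a check could look; the function name `check_variable_window_params` and its return format are hypothetical, and the source does not say that the control module performs such a check.

```python
from typing import List, Sequence


def check_variable_window_params(
    w: Sequence[int], u: Sequence[float], d: Sequence[float]
) -> List[str]:
    """Return the list of violated constraints (empty if all hold)."""
    if not (len(w) == len(u) == len(d)) or len(w) == 0:
        return ["w, u and d must have the same non-zero length V"]
    problems = []
    if any(a >= b for a, b in zip(w, w[1:])):
        problems.append("window lengths must be strictly increasing")
    if any(a <= b for a, b in zip(u, u[1:])):
        problems.append("upper limits must be strictly decreasing")
    if any(a >= b for a, b in zip(d, d[1:])):
        problems.append("lower limits must be strictly increasing")
    if u[-1] != d[-1]:
        problems.append("u(V) must equal d(V)")
    return problems


if __name__ == "__main__":
    # Values from step (2) of the detection-state example.
    print(check_variable_window_params((30, 60, 90), (0.8, 0.7, 0.5), (0.3, 0.4, 0.5)))
```

For the example values the check reports no violations; the equality $u(3) = d(3) = 0.5$ is what guarantees that the longest window always produces a judgment.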

Claims (10)

1. A user behavior anomaly detection system based on machine learning, the system being deployed on a server to be monitored, using user-interface shell commands on the Unix platform as audit data, and analyzing user behavior at the user-interface layer to detect whether the server has been intruded; characterized in that the system comprises:
a control module, responsible for setting the working state and the various detection parameters of the system, and for controlling the operation of the data acquisition and preprocessing module, the learning module, the detection module, and the system as a whole;
a data acquisition and preprocessing module, responsible for obtaining from the server the original training data or audit data, i.e. the shell command line data executed by the user, converting these original training data or audit data into the form of a shell command stream, and sending them to the learning module or the detection module respectively for learning or detection;
a learning module, which uses machine learning techniques to obtain from the training data knowledge of the normal behavior of a given legitimate user in the network system, and on that basis builds the shell command sequence libraries that represent the normal behavior profile of that legitimate user;
a sequence storage module, used to store the shell command sequence libraries built by the learning module for retrieval and comparison by the detection module during detection;
a detection module, responsible for analyzing the shell command audit data executed by said legitimate user during the monitored period, completing work including but not limited to mining "behavior pattern sequences", calculating or assigning similarities, windowed noise filtering of the similarities, calculating decision values, and judging user behavior;
a detection result output module, responsible for displaying the decision value curve generated by the detection module and for raising alarms on abnormal behavior under the control of the detection module.
2. A detection method of the user behavior anomaly detection system of claim 1, characterized in that the detection method comprises the following steps:
(1) the system starts;
(2) while the system waits for instructions, the working state and operating parameters of the system are set through the control module, so that after the "start working" instruction is subsequently input, the control module automatically checks the system settings and the system enters one of two working states: if the system is set to the learning state, the subsequent operations are performed; if the system is set to the detection state, execution jumps to step (6);
(3) driven by the control module, the data acquisition and preprocessing module loads the original training data from a predefined data interface, preprocesses these data into the form of a shell command stream, and outputs the stream to the learning module;
(4) the learning module learns from the preprocessed shell command stream training data, builds the shell command sequence libraries, stores them in the sequence storage module, and then sends a "learning finished" message to the control module;
(5) after receiving the "learning finished" message from the learning module, the control module returns the system to step (2) to wait for new settings and instructions, or directly switches the working state of the system to the detection state and performs the subsequent operations;
(6) driven by the control module, the data acquisition and preprocessing module loads the original audit data (shell command lines) in real time from the predefined data interface, preprocesses these data in real time, and outputs the preprocessed shell command stream audit data to the detection module in real time;
(7) the detection module analyzes the audit data in real time and generates the detection result;
(8) the detection result output module displays the detection result, i.e. the decision value curve, and raises real-time alarms on abnormal behavior.
3. The detection method of the user behavior anomaly detection system according to claim 2, characterized in that in said step (2), if the working state of the system is set to the learning state, the operating parameters to be set comprise:
the number $W$ of shell command sequence libraries to be built to represent the normal behavior profile of the legitimate user, and $W$ sequence lengths $l(1), l(2), \ldots, l(W)$, where $l(i)$ is the length of the sequences in the $i$-th shell command sequence library and $l(1) < l(2) < \ldots < l(W)$;
if the working state of the system is set to the detection state, the operating parameters to be set take one of two forms:
for the first decision method, the window length $w$ and the decision threshold $\lambda$; or
for the second decision method, the number $V$ of window lengths, $V$ window lengths $w(1), w(2), \ldots, w(V)$, $V$ upper decision limits $u(1), u(2), \ldots, u(V)$, and $V$ lower decision limits $d(1), d(2), \ldots, d(V)$, where $w(1) < w(2) < \ldots < w(V)$; $u(k)$ and $d(k)$ are the upper and lower decision limits corresponding to the $k$-th window length $w(k)$; $k$ is a natural number in the interval $[1, V]$; and $u(1) > u(2) > \ldots > u(V-1) > u(V) = d(V) > d(V-1) > \ldots > d(2) > d(1)$.
4. The detection method of the user behavior anomaly detection system according to claim 2, characterized in that the preprocessing performed by the data acquisition and preprocessing module on the original training data or original audit data in said step (3) or step (6) comprises the following steps:
31. extracting the command names, flags, and metacharacters from the shell command lines;
32. replacing information including but not limited to file names, server names, directories, and network addresses with an identifier of the uniform format <n>, where n denotes the number of file names, server names, directories, or network addresses;
33. inserting the identifiers SOF and EOF, which mark the start and the end, at the time points where each shell session begins and ends respectively;
34. arranging the shell command symbols, which comprise the symbols for command names, flags, and metacharacters and the identifiers for file names, server names, directories, and network addresses, in the order in which they appear in the shell session, and concatenating the command symbols of different shell sessions in chronological order, without adding timestamps to the data; after this preprocessing, the original input data takes the form of a shell command stream, i.e. a string of shell command symbols arranged in chronological order, and a shell command stream may contain the contents of multiple shell sessions.
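Steps 31 to 34 can be illustrated with the following minimal sketch. It is not the patented preprocessor and rests on several assumptions: tokens are separated by whitespace, the sets `COMMAND_SEPARATORS` and `REDIRECTIONS` are an illustrative choice of metacharacters, any token starting with "-" is treated as a flag, every other non-command token counts as a file name, server name, directory, or address, and the session markers are written as `*SOF*` and `*EOF*` as in the embodiment's example stream. The function name `preprocess_sessions` is hypothetical.

```python
from typing import Iterable, List

COMMAND_SEPARATORS = {"|", ";", "&", "&&", "||"}  # next token is a command name
REDIRECTIONS = {">", ">>", "<"}                   # next token is a file argument


def preprocess_sessions(sessions: Iterable[Iterable[str]]) -> List[str]:
    """Turn shell sessions (lists of command lines) into one command stream."""
    stream: List[str] = []
    for session in sessions:
        stream.append("*SOF*")                    # step 33: session start
        for line in session:
            expect_command = True
            pending_args = 0                      # arguments not yet emitted
            for token in line.split():
                is_meta = token in COMMAND_SEPARATORS or token in REDIRECTIONS
                is_flag = token.startswith("-") and len(token) > 1
                if (is_meta or is_flag) and pending_args:
                    stream.append(f"<{pending_args}>")   # step 32
                    pending_args = 0
                if is_meta:
                    stream.append(token)
                    expect_command = token in COMMAND_SEPARATORS
                elif expect_command:
                    stream.append(token)          # step 31: command name
                    expect_command = False
                elif is_flag:
                    stream.append(token)          # step 31: flag
                else:
                    pending_args += 1             # file name, directory, ...
            if pending_args:
                stream.append(f"<{pending_args}>")
        stream.append("*EOF*")                    # step 33: session end
    return stream


if __name__ == "__main__":
    print(preprocess_sessions([["cd /home/u", "vi a.c", "logout"]]))
```

With this sketch, a session consisting of `cd /home/u`, `vi a.c`, and `logout` yields a stream of the same shape as the example in the embodiment: `*SOF*, cd, <1>, vi, <1>, logout, *EOF*`.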
5. The detection method of the user behavior anomaly detection system according to claim 2, characterized in that the learning operation performed by the learning module on the preprocessed shell command stream training data in said step (4) comprises the following steps:
41. obtaining from the data acquisition and preprocessing module the preprocessed training data representing the normal behavior of the legitimate user: $R = (s_1, s_2, \ldots, s_r)$, i.e. a shell command stream of length $r$, where $s_j$ denotes the $j$-th shell command symbol in chronological order;
42. reading the learning parameters from the control module: the number $W$ of shell command sequence libraries to be built to represent the normal behavior profile of the legitimate user, and $W$ sequence lengths $l(1), l(2), \ldots, l(W)$, where $l(i)$ is the length of the sequences in the $i$-th shell command sequence library and $l(1) < l(2) < \ldots < l(W)$;
43. generating from the shell command stream $R$ the $W$ shell command sequence streams $S_1, S_2, \ldots, S_W$ with sequence lengths $l(1), l(2), \ldots, l(W)$ respectively, where $S_i$ is the shell command sequence stream with sequence length $l(i)$: $S_i = (Seq_1^i, Seq_2^i, \ldots, Seq_{r-l(i)+1}^i)$, in which $Seq_j^i = (s_j, s_{j+1}, \ldots, s_{j+l(i)-1})$ is the $j$-th shell command sequence in $S_i$ in chronological order, and $i$ is a natural number in the interval $[1, W]$;
44. computing, for each shell command sequence in the sequence stream $S_i$, its frequency of occurrence in $S_i$, where $i$ is a natural number in the interval $[1, W]$; that is, the number of occurrences of each shell sequence in $S_i$ divided by the sum of the occurrence counts of all sequences in that sequence stream;
45. reading the frequency threshold parameters from the control module: the frequency thresholds $\eta_1, \eta_2, \ldots, \eta_W$ used to build the $W$ shell command sequence libraries, where $\eta_i$ is the frequency threshold used when building the shell command sequence library with sequence length $l(i)$, and $i$ is a natural number in the interval $[1, W]$;
46. according to the frequencies of occurrence, extracting a number of shell command sequences as samples from each of the $W$ shell command sequence streams $S_1, S_2, \ldots, S_W$ to build the $W$ sequence libraries, specifically as follows:
let the set of the $W$ sequence libraries representing the user's normal behavior profile be $L = \{L(1), L(2), \ldots, L(W)\}$, where $L(i)$ denotes the sequence library composed of sequences of length $l(i)$;
for the natural number $i$ from 1 to $W$ in turn, extract as samples the shell command sequences in $S_i$ whose frequency of occurrence is greater than or equal to the frequency threshold $\eta_i$, i.e. the shell command sequences regarded as normal behavior patterns of the legitimate user, and collect these sequences to form the sequence library $L(i)$.
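The learning procedure of steps 41 to 46 can be sketched as follows. This is a minimal illustration, assuming the command stream is a Python list of symbols and each library is a set of tuples; the function name `build_sequence_libraries` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def build_sequence_libraries(
    stream: Sequence[str],
    lengths: Sequence[int],
    thresholds: Sequence[float],
) -> List[Set[Tuple[str, ...]]]:
    """Build one library L(i) per sequence length l(i) and threshold eta_i."""
    libraries = []
    for l, eta in zip(lengths, thresholds):
        # Step 43: all overlapping sequences of length l, in time order.
        seqs = [tuple(stream[j:j + l]) for j in range(len(stream) - l + 1)]
        # Step 44: occurrence count of each distinct sequence in the stream.
        counts = Counter(seqs)
        total = len(seqs)
        # Step 46: keep sequences whose relative frequency reaches eta.
        libraries.append({s for s, c in counts.items() if c / total >= eta})
    return libraries


if __name__ == "__main__":
    demo = ["cd", "<1>", "ls", "cd", "<1>", "vi", "<1>"]
    for lib in build_sequence_libraries(demo, (1, 2), (0.25, 0.3)):
        print(sorted(lib))
```

In the small demonstration stream, only `cd` and `<1>` reach the length-1 threshold, and only the pair (`cd`, `<1>`) reaches the length-2 threshold; real thresholds such as 0.0002 are far lower because real training streams are far longer.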
6. The detection method of the user behavior anomaly detection system according to claim 2, characterized in that the real-time detection operation in said step (7), in which the detection module analyzes the audit data in real time and generates the detection result, comprises the following steps:
71. obtaining the audit data in real time from the data acquisition and preprocessing module; that is, during detection, the data acquisition and preprocessing module obtains in real time, from the shell history file, the shell command lines executed by the monitored legitimate user during the monitored period, preprocesses these command line data, and converts them into a shell command stream $R = (s_1, s_2, \ldots, s_r)$, where $s_j$ denotes the $j$-th shell command symbol in chronological order and $r$ is the length of the command stream; the data acquisition and preprocessing module outputs the shell command symbols of $R$ to the detection module one by one, in chronological order and in real time;
72. the detection module uses a sequence matching method to mine the "behavior pattern sequences" in the shell command stream $R$ and, according to the length of each "behavior pattern sequence", computes its similarity to the sequence library set $L = \{L(1), L(2), \ldots, L(W)\}$, obtaining the chronologically ordered "behavior pattern sequence" stream $P = (Seq_1^*, Seq_2^*, \ldots, Seq_M^*)$ and the corresponding similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$, where $Seq_n^*$ denotes the $n$-th "behavior pattern sequence" mined from $R$, $Sim(Seq_n^*, L)$ denotes the similarity of $Seq_n^*$ to the sequence library set, $M$ is the number of "behavior pattern sequences" mined from $R$, and $\mathrm{int}(r / l(W)) \le M \le r - l(W) + 1$;
73. windowing the similarity stream $Z = (Sim(Seq_1^*, L), Sim(Seq_2^*, L), \ldots, Sim(Seq_M^*, L))$ and taking the mean; the resulting similarity mean, i.e. the similarity decision value, is compared with the decision threshold to judge the behavior of the monitored legitimate user;
during real-time detection, the three steps 71, 72, and 73 are carried out synchronously.
7. The detection method of the user behavior anomaly detection system according to claim 6, characterized in that in said step 72, the mining of behavior pattern sequences from the user's shell command stream audit data $R$ by the sequence matching method, and the calculation of the similarity of each behavior pattern sequence to the sequence library set, further comprise the following steps:
721. set three variables: $j := 1$, $i := W$, $n := 1$;
722. if $j \le r - l(W) + 1$, compare $Seq_j^i$ with the sequence library $L(i)$ and then execute step 723; if $j > r - l(W) + 1$, stop, i.e. end the mining of behavior pattern sequences and the similarity calculation;
723. if $Seq_j^i \in L(i)$, i.e. $Seq_j^i$ is identical to some sequence in the sequence library $L(i)$, then the $n$-th behavior pattern sequence is $Seq_n^* := Seq_j^i$ and its similarity to the sequence library set $L$ is $Sim(Seq_n^*, L) := 2^{l(i)} / 2^{l(W)}$; set $j := j + l(i)$, $i := W$, $n := n + 1$, and return to step 722; if $Seq_j^i \notin L(i)$, i.e. $Seq_j^i$ differs from every sequence in the sequence library $L(i)$, then set $i := i - 1$ and execute step 724;
724. if $i \ne 0$, return to step 722; if $i = 0$, then $Seq_n^* := (s_j)$, $Sim(Seq_n^*, L) := 0$, $j := j + 1$, $i := W$, $n := n + 1$, and return to step 722.
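Steps 721 to 724 describe a greedy longest-match scan over the command stream. The following minimal sketch assumes that the similarity of a matched sequence is $2^{l(i)} / 2^{l(W)}$, that each library is a set of tuples, and that indices are 0-based inside the code; the function name `mine_patterns` is hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def mine_patterns(
    stream: Sequence[str],
    libraries: Sequence[Set[Pattern]],
    lengths: Sequence[int],
) -> List[Tuple[Pattern, float]]:
    """Return the mined behavior pattern sequences with their similarities."""
    r = len(stream)
    max_len = lengths[-1]                      # l(W), the longest length
    results: List[Tuple[Pattern, float]] = []
    j = 0                                      # 0-based counterpart of j
    while j + max_len <= r:                    # step 722: j <= r - l(W) + 1
        # Try the longest library first, then progressively shorter ones.
        for i in range(len(lengths) - 1, -1, -1):
            l = lengths[i]
            candidate = tuple(stream[j:j + l])
            if candidate in libraries[i]:      # step 723: match found
                results.append((candidate, 2 ** l / 2 ** max_len))
                j += l
                break
        else:                                  # step 724: no library matched
            results.append(((stream[j],), 0.0))
            j += 1
    return results


if __name__ == "__main__":
    libs = [{("cd",), ("<1>",)}, {("cd", "<1>")}]
    demo = ["cd", "<1>", "ls", "cd", "vi", "<1>"]
    for pattern, sim in mine_patterns(demo, libs, (1, 2)):
        print(pattern, sim)
```

Because the scan stops once fewer than $l(W)$ symbols remain, the last few symbols of a finite stream may not be assigned to any behavior pattern sequence, which follows from the stopping condition in step 722.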
8, the detection method of user behavior abnormality detection system according to claim 6, it is characterized in that: detection module has two kinds of decision methods to select for the user when similarity stream being carried out windowing process and this user's behavior adjudicated in the described step 73;
Wherein first kind of decision method is with fixing window length similarity stream to be carried out windowing and gets average, the similarity average that obtains, and promptly the similarity decision value utilizes similarity decision value and decision threshold that this user's behavior is adjudicated again; Comprise following operating procedure:
7301, read the parameter that is provided with in the control module: window length w and decision threshold λ, when from Audit data R, excavating n behavior pattern sequence Seq n *, and calculate Sim (Seq n *, L) after, n 〉=w wherein; With Sim (Seq n *, L), calculate with Sim (Seq for terminal point carries out windowing to similarity stream Z n *, L) be w similarity of terminal point, i.e. Sim (Seq N-w+1 *, L), Sim (Seq N-w+2 *, L) ..., Sim (Seq n *, average L) obtains Seq n *Corresponding similarity decision value D (n): D ( n ) = 1 w &Sigma; m = n - w + 1 n Sim ( Seq m * , L ) ;
7302, utilize decision value D (n) and decision threshold λ that this user " current behavior " adjudicated; If D (n)>λ is judged to normal behaviour with this user's " current behavior "; If D (n)≤λ is judged to abnormal behaviour with this user's " current behavior ";
The second decision method windows the similarity stream with a variable window length and takes the mean; the resulting similarity mean is the similarity decision value, which is then compared with decision thresholds to judge the monitored user's behavior. It comprises the following steps:
7311. Read the parameters set in the control module: $V$ window lengths $w(1), w(2), \ldots, w(V)$, $V$ decision upper limits $u(1), u(2), \ldots, u(V)$ and $V$ decision lower limits $d(1), d(2), \ldots, d(V)$, where $w(1) < w(2) < \cdots < w(V)$; $u(k)$ and $d(k)$ are respectively the decision upper limit and decision lower limit corresponding to the $k$-th window length $w(k)$, $k$ is a natural number in the interval $[1, V]$, and $u(1) > u(2) > \cdots > u(V-1) > u(V) = d(V) > d(V-1) > \cdots > d(2) > d(1)$;
7312. Once the $n$-th "behavior pattern sequence" $\mathrm{Seq}_n^*$ has been mined from $R$ and $\mathrm{Sim}(\mathrm{Seq}_n^*, L)$ has been computed, go on to compute the similarity decision value $D(n)$ corresponding to $\mathrm{Seq}_n^*$ and judge the user's "current behavior".
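The first decision method (steps 7301 and 7302) amounts to a moving average over the similarity stream compared against a single threshold. A minimal sketch follows; the function name `fixed_window_decisions` and the use of `None` for positions where $n < w$ are illustrative assumptions, not part of the claim.

```python
from typing import List, Optional, Sequence


def fixed_window_decisions(
    sims: Sequence[float], w: int, lam: float
) -> List[Optional[bool]]:
    """Fixed-window decision over a similarity stream.

    sims[m - 1] holds Sim(Seq_m*, L). For each n the result is
    True (normal), False (abnormal), or None while n < w
    (no decision value D(n) is defined yet).
    """
    decisions: List[Optional[bool]] = []
    for n in range(1, len(sims) + 1):
        if n < w:
            decisions.append(None)
            continue
        # D(n): mean of the w similarities ending at position n.
        d_n = sum(sims[n - w:n]) / w
        # Normal if D(n) > lambda, abnormal if D(n) <= lambda.
        decisions.append(d_n > lam)
    return decisions
```

For example, `fixed_window_decisions([1.0, 0.0, 0.5, 0.25], w=2, lam=0.3)` yields no decision for the first position and then judges the windows with means 0.5, 0.25 and 0.375 as normal, abnormal and normal respectively.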
9. The detection method of the user behavior anomaly detection system according to claim 8, characterized in that: the calculation and decision performed in step 7312 further comprise the following steps:
Step 1: set the variable $k := 1$;
Step 2: compare $n$ with $w(k)$: if $n \ge w(k)$, carry out the subsequent steps; if $n < w(k)$, $D(n)$ is not computed, the user's "current behavior" is not judged, and this operation ends;
Step 3: compute the similarity mean $N(n, k)$, which is obtained by windowing the similarity stream $Z = (\mathrm{Sim}(\mathrm{Seq}_1^*, L), \mathrm{Sim}(\mathrm{Seq}_2^*, L), \ldots, \mathrm{Sim}(\mathrm{Seq}_M^*, L))$ over the $w(k)$ similarities ending at $\mathrm{Sim}(\mathrm{Seq}_n^*, L)$, namely $\mathrm{Sim}(\mathrm{Seq}_{n-w(k)+1}^*, L), \mathrm{Sim}(\mathrm{Seq}_{n-w(k)+2}^*, L), \ldots, \mathrm{Sim}(\mathrm{Seq}_n^*, L)$, and taking their mean: $N(n, k) = \frac{1}{w(k)} \sum_{m=n-w(k)+1}^{n} \mathrm{Sim}(\mathrm{Seq}_m^*, L)$, where $w(k)$ is the window length and $w(k) \le n \le M$;
Step 4: check whether the decision condition $N(n, k) > u(k)$ is satisfied. If it is, the similarity decision value corresponding to $\mathrm{Seq}_n^*$ is defined as $D(n) := N(n, k)$ and the user's "current behavior" is judged to be normal; if the condition $N(n, k) > u(k)$ is not satisfied, continue with the subsequent operation;
Step 5: check whether the decision condition $N(n, k) \le d(k)$ is satisfied. If it is, set $D(n) := N(n, k)$, judge the user's "current behavior" to be abnormal, and end the decision on the user's "current behavior"; if the condition $N(n, k) \le d(k)$ is not satisfied, continue with the subsequent operation.
Step 6: set $k := k + 1$, i.e. increase the value of $k$ by 1, return to Step 2, and repeat the subsequent operations.
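Steps 1 to 6 can be sketched as a loop over progressively longer windows that stops as soon as the mean falls outside the band between $d(k)$ and $u(k)$. The sketch below is illustrative only: the function name `variable_window_decision`, the 0-based parameter lists and the `(D(n), is_normal)` return shape are assumptions.

```python
from typing import Optional, Sequence, Tuple


def variable_window_decision(
    sims: Sequence[float],
    n: int,
    windows: Sequence[int],
    upper: Sequence[float],
    lower: Sequence[float],
) -> Optional[Tuple[float, bool]]:
    """Variable-window decision for the n-th similarity (1-based n).

    windows[k], upper[k], lower[k] hold w(k+1), u(k+1), d(k+1).
    Returns (D(n), True) for normal, (D(n), False) for abnormal,
    or None when no decision is made because n < w(k).
    """
    for k in range(len(windows)):          # Steps 1 and 6: k = 1, 2, ...
        w_k = windows[k]
        if n < w_k:                        # Step 2: too few similarities
            return None
        avg = sum(sims[n - w_k:n]) / w_k   # Step 3: N(n, k)
        if avg > upper[k]:                 # Step 4: clearly normal
            return avg, True
        if avg <= lower[k]:                # Step 5: clearly abnormal
            return avg, False
    # Not reached when upper[-1] == lower[-1], i.e. u(V) = d(V).
    return None
```

Because $u(V) = d(V)$, the last window always produces a decision when $n \ge w(V)$: any mean is either above $u(V)$ or at most $d(V)$. Short windows with a wide band give fast decisions on clear-cut behavior, while ambiguous means are deferred to longer windows.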
10. The detection method of the user behavior anomaly detection system according to claim 2, characterized in that: the detection method is used to perform anomaly detection on the behavior of a single legitimate user in a computer network system, or on the behavior of a group of several legitimate users in the network system; in the latter case, either of two different approaches may be adopted:
If the privileges and behavioral characteristics of the users in the group differ considerably, W sequence libraries are built separately for each legitimate user from that user's normal-behavior training data, and each user's behavior is then checked for anomalies against that user's own W sequence libraries;
If the users in the group have the same privileges and similar behavioral characteristics, their training data are combined, i.e. their shell-command streams are concatenated to form the overall training data; W sequence libraries are built from this training data and then used to perform anomaly detection on each user's behavior.
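The two approaches of claim 10 differ only in what training data is passed to the library-building step. A minimal sketch, assuming a hypothetical `build` callable that turns one shell-command stream into the W sequence libraries (the actual construction is defined in the earlier claims and is not reproduced here); the function name `train_libraries` and the concatenation order (sorted by user name) are also assumptions.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Libraries for one profile: index i -> set of command sequences, i = 1..W.
Libraries = Dict[int, Set[Tuple[str, ...]]]


def train_libraries(
    streams: Dict[str, Sequence[str]],
    pooled: bool,
    build: Callable[[Sequence[str]], Libraries],
) -> Dict[str, Libraries]:
    """Return the sequence libraries to use for each user.

    pooled=False: each user gets libraries built from that user's
    own training stream (users with differing privileges/behavior).
    pooled=True: all streams are concatenated and every user shares
    the libraries built from the combined stream.
    """
    if pooled:
        combined: List[str] = [
            cmd for user in sorted(streams) for cmd in streams[user]
        ]
        shared = build(combined)
        return {user: shared for user in streams}
    return {user: build(cmds) for user, cmds in streams.items()}
```

Pooling gives each profile more training data, at the cost of a looser profile; per-user libraries stay tighter but need enough normal-behavior data for every user.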
CNB2005100569348A 2005-03-23 2005-03-23 Detecting system and method for user behaviour abnormal based on machine study Active CN1333552C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100569348A CN1333552C (en) 2005-03-23 2005-03-23 Detecting system and method for user behaviour abnormal based on machine study

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005100569348A CN1333552C (en) 2005-03-23 2005-03-23 Detecting system and method for user behaviour abnormal based on machine study

Publications (2)

Publication Number Publication Date
CN1649311A (en) 2005-08-03
CN1333552C (en) 2007-08-22

Family

ID=34876795

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100569348A Active CN1333552C (en) 2005-03-23 2005-03-23 Detecting system and method for user behaviour abnormal based on machine study

Country Status (1)

Country Link
CN (1) CN1333552C (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4649080B2 (en) * 2001-09-17 2011-03-09 株式会社東芝 Network intrusion detection system, apparatus and program
US20030083847A1 (en) * 2001-10-31 2003-05-01 Schertz Richard L. User interface for presenting data for an intrusion protection system
JP2004309998A (en) * 2003-02-18 2004-11-04 Nec Corp Probabilistic distribution estimation apparatus, abnormal behavior detection device, probabilistic distribution estimation method, and abnormal behavior detection method
JP2004312083A (en) * 2003-04-02 2004-11-04 Kddi Corp Learning data generating apparatus, intrusion detection system, and its program
CN1555156A (en) * 2003-12-25 2004-12-15 上海交通大学 Self adaptive invasion detecting method based on self tissue mapping network
CN1291569C (en) * 2004-09-24 2006-12-20 清华大学 Abnormal detection method for user access activity in attached net storage device

Also Published As

Publication number Publication date
CN1333552C (en) 2007-08-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
Owner name: BEIJING CAPITEK CO, LTD.
Free format text: FORMER NAME: BEIJING SHOUXIN SCIENCE AND TECHNOLOGY CO., LTD.
CP03 Change of name, title or address
Address after: 100015 Beijing City, Chaoyang District Road No. 5
Patentee after: Beijing Capitek Co, Ltd.
Address before: 100016 Beijing city Chaoyang District Dongzhimen Road No. 5
Patentee before: Beijing Shouxin Science and Technology Co., Ltd.