CN115022052B

CN115022052B - Internal user abnormal behavior fusion detection method and system based on user binary analysis

Info

Publication number: CN115022052B
Application number: CN202210637361.1A
Authority: CN
Inventors: 杨光; 付勇; 赵大伟; 王继志; 吴钰; 陈丽娟
Original assignee: Shandong Computer Science Center National Super Computing Center in Jinan
Current assignee: Shandong Computer Science Center National Super Computing Center in Jinan
Priority date: 2022-06-07
Filing date: 2022-06-07
Publication date: 2023-05-30
Anticipated expiration: 2042-06-07
Also published as: CN115022052A

Abstract

The invention relates to a fusion detection method and system for abnormal behaviors of internal users based on user binary analysis. The method comprises the following steps: collecting user binary data; based on the user meta-features and the user behavior features, independently training a user meta-feature classifier model and a user abnormal behavior classifier model, and performing user meta-feature abnormal individual detection and user abnormal behavior detection respectively; based on the detection results, establishing a user binary result matrix and handling each case separately: for users with abnormal meta-features and abnormal behavior features, alarming directly; for users with normal meta-features and normal behavior features, judging the users to be normal; for users with abnormal meta-features but normal behavior features, appropriately adjusting the behavior deviation threshold; for users with normal meta-features but abnormal behavior features, likewise appropriately adjusting the behavior deviation threshold. The invention can analyze every combination of the judgment results of the two user classifiers without omission, and carry out the subsequent analysis and detection in a targeted way.

Description

Internal user abnormal behavior fusion detection method and system based on user binary analysis

Technical Field

The invention relates to a fusion detection method and system for abnormal behaviors of internal users based on user binary analysis, belonging to the technical field of information security construction/network security.

Background

At present, network information security is attracting increasing attention from society. Although security products such as antivirus software, firewalls, and intrusion detection systems are widely deployed, secret leakage and intrusion threats caused by internal personnel are becoming increasingly serious. Internal threat attackers are generally employees (current or former), contractors, or business partners of an enterprise or organization, and hold access rights to the organization's systems, networks, and data. Internal threats are therefore usually highly concealed and highly damaging, so traditional defense-in-depth systems built on security devices such as firewalls and IDSs cannot cope with them effectively. There is thus a pressing need for a practical internal threat detection system.

Existing internal threat detection techniques focus mainly on three areas. First, user abnormal behavior detection (i.e., objective detection) assumes that attack behavior necessarily differs from normal behavior. It constructs features of the user's operations on the information system (such as command sequences, file access, and web browsing), models the user's normal behavior, and raises an anomaly alarm for behavior that deviates substantially from the normal model. This assumption holds for most internal attack scenarios, but abnormal behavior is not equivalent to malicious behavior. Relying on abnormal behavior detection alone therefore leads to unavoidable misjudgments in actual internal threat detection; for example, an occasional unusual login by a user triggers an anomaly alarm that is very likely a false alarm. Second, detection of user socio-psychological factors (i.e., subjective detection) is attracting growing attention as a viable way to remedy this deficiency. One representative method models attack motivation from the perspective of the user's personality traits; its core is to extract keyword frequencies from the user's text data and map them to the corresponding personality trait scores. Another method is based on sentiment analysis of the user's social application data, characterizing the user's anxiety and stress levels by analyzing shifts in interests and trends in comment attitudes; for example, a user who previously browsed entertainment and sports news frequently and has recently been browsing news about the COVID-19 epidemic more often may be showing underlying anxiety. Third, fused subjective and objective detection of users is gradually attracting attention as a new solution to the shortcomings of the two techniques above. However, because the association between users' psychological and behavioral features is hidden and complex and is difficult to express directly as an explicit mathematical function, the prior art remains at the level of theoretical models, and no practical, feasible fusion scheme has yet been designed.

The internal threat fusion detection field currently faces the key problem that it remains at the theoretical model level and lacks a mature, feasible technical scheme for fusion analysis and detection. The main reasons are as follows. First, the design threshold for a fusion scheme is high: (1) interdisciplinary limitations: research on attack motivation requires specialized knowledge of psychology and sociology, which demands an interdisciplinary background of researchers, and the gap between disciplines ultimately hinders the development of fusion detection research; (2) a high misjudgment rate: for example, an individual in a prolonged state of anxiety may simply have a bad habit such as alcoholism, so judging that individual to be a risky user on this basis alone inevitably produces many misjudgments; moreover, a complex, hidden mechanism links the quantified strength of attack motivation to the triggering of attack behavior, so it is difficult to determine whether a user will carry out an attack directly from a given motivation strength value; (3) the core evaluation indices of fusion detection are the recall of the positive class and the false alarm rate of the negative class, and in anomaly detection the two are generally positively correlated: raising the positive-class recall (identifying more abnormal samples) also raises the negative-class false alarm rate (more normal behavior misjudged as abnormal). Recall and false alarm rate must therefore be weighed together in practice, which further increases the difficulty of designing and implementing a fusion detection scheme. Second, the existing fusion detection model is incomplete. Existing research proposed a fusion detection model for the first time, but it covers only two cases: (1) for users with abnormal behavior, attack motivation detection is added, and an alarm is raised if the attack motivation is also abnormal; (2) if a user is found to have a significant attack motivation, the sensitivity of anomaly detection is adjusted appropriately to improve recall. This serves only as a technical idea for reference, and no technical scheme for implementing it is given. Moreover, the scheme omits another important case: (3) if the user has no significant attack motivation, the anomaly detection sensitivity should be adjusted appropriately to reduce the false alarm rate.

As described above, the high design threshold of fusion schemes and the incompleteness of the existing fusion model together keep internal threat fusion detection at the theoretical model stage, with no mature and feasible fusion scheme to guide actual internal threat detection. As a result, fusion detection research progresses slowly and cannot meet the urgent need of all sectors of society to strengthen their defenses against internal threats. How to improve objective detection capability by quantifying the level of attack motivation is the key research question.

Disclosure of Invention

In order to overcome the disconnect between the analysis of internal motivation and the analysis of external behavior in existing internal threat detection research, the invention provides a complete and feasible fusion detection method for internal user abnormal behavior based on user binary feature analysis, thereby improving the capability and practicality of existing internal threat detection.

The invention also provides an internal user abnormal behavior fusion detection system based on the user binary analysis.

In order to achieve the above objective, the invention takes application scenarios of small and medium-sized enterprises as its data basis: for the user group of the target scenario, meta-feature and behavior-feature data sets are collected and constructed at the same time. On this basis, a user abnormal behavior detection method based on binary analysis is provided: (1) based on the user meta-features and the behavior features, machine learning classifiers are trained independently to perform user meta-feature abnormal individual detection and user abnormal behavior detection; (2) based on the results of the user meta-feature classifier and the abnormal behavior classifier, a user binary result matrix is established and each case is analyzed and handled separately; (3) for abnormal behavior of users with abnormal meta-features, an alarm is raised directly; (4) for normal behavior of users with normal meta-features, the user is judged normal and no action is taken; (5) for normal behavior of users with abnormal meta-features, the behavior deviation threshold is adjusted appropriately according to the degree of meta-feature abnormality, improving the ability to recognize abnormal behavior; (6) for abnormal behavior of users with normal meta-features, the behavior deviation threshold is adjusted appropriately according to the degree of meta-feature normality, reducing the number of false alarms in anomaly detection.

The main idea of the invention is as follows: based on application scenarios of small and medium-sized enterprises, the binary features of users are collected synchronously and used as the data basis for the subsequent analysis and detection; furthermore, a binary analysis fusion detection method is provided that appropriately adjusts the classification boundary threshold of behavior detection according to the quantized meta-feature level.

With the internal user abnormal behavior fusion detection method based on user binary analysis, researchers and security analysts can select and train customized binary feature classifiers according to the requirements of the actual scenario. The advantages of fusion detection are fully exploited, and the accuracy and practicality of existing internal user abnormal behavior detection are effectively improved through in-depth analysis of the complex underlying correlations between the user's binary features.

Term interpretation:

1. The one-class support vector machine (OCSVM) is an extension of the traditional support vector machine (SVM). In the traditional application scenario, the training data contains samples of several classes; the data is mapped into a high-dimensional space and the most suitable hyperplane is found as the class boundary, yielding the required SVM classifier. In practice, however, often only data of a single class is available. For example, in user abnormal behavior detection, normal behavior data is the easiest to obtain while attack data is relatively hard to obtain. A suitable hypersphere in the high-dimensional space can therefore be trained on the user's normal behavior data as the class boundary: samples inside the hypersphere are regarded as normal behavior and samples outside it as abnormal behavior, yielding an OCSVM classifier. Mature third-party libraries (e.g., sklearn) are currently available for both SVM and OCSVM.

2. sklearn is a powerful third-party machine learning library for Python. It provides the function interfaces needed from data preprocessing through model training to sample evaluation, and in particular provides modules for common machine learning models and feature processing methods such as SVM, OCSVM and PCA. The relevant sklearn module can be called directly when needed (for example, the sklearn.svm.OneClassSVM module is called to train an OCSVM model on the input features). Calling the library greatly reduces the time and the amount of code to be written, leaving more effort for data analysis and classification model optimization. In practice, sklearn can be obtained by installing the Anaconda toolkit, which can be downloaded from https://www.anaconda.com/.
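As an illustration of the two terms above, the following is a minimal sketch of training an OCSVM on normal samples only with sklearn.svm.OneClassSVM. The synthetic data and the values of nu and gamma are assumptions chosen for the example, not values taken from this document.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical "normal behavior" samples: 200 points clustered around the origin.
normal_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu bounds the fraction of training samples treated as outliers.
# The values of nu and gamma are illustrative assumptions.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_train)

# sklearn convention: predict() returns +1 for inliers and -1 for outliers.
near = np.array([[0.1, -0.2]])   # close to the training cluster
far = np.array([[8.0, 8.0]])     # far away from every training sample
pred_near = int(model.predict(near)[0])
pred_far = int(model.predict(far)[0])
print(pred_near, pred_far)
```

Note that sklearn labels inliers +1 and outliers -1, so a sign flip is needed wherever the positive class is meant to denote the abnormal samples.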

3. $F_1$/$F_{0.5}$/$F_2$ score: the $F_1$ score is a statistical measure of the accuracy of a binary classification model that takes both the precision and the recall of the model into account. The $F_1$ score can be regarded as the harmonic mean of precision and recall, with a maximum of 1 and a minimum of 0. The $F_1$ score is in fact the special case $\beta = 1$ of the $F_\beta$ score.

The general definition of the $F_\beta$ score is given by the equation below. In the $F_2$ score ($\beta = 2$), recall is weighted higher than precision; in the $F_{0.5}$ score ($\beta = 0.5$), precision is weighted higher than recall.

Assume that in anomaly detection the positive class is P and the negative class is N. Then recall = TP/(TP+FN) and precision = TP/(TP+FP), both real numbers in [0,1]. TP is the number of samples judged positive that are actually positive, FP is the number of samples judged positive that are actually negative, and FN is the number of samples judged negative that are actually positive. $F_\beta$ is calculated as follows:

$$F_\beta = \frac{(1+\beta^2)\cdot precision \cdot recall}{\beta^2 \cdot precision + recall}$$
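To make the weighting concrete, here is a small worked example using the standard definition $F_\beta = (1+\beta^2)\cdot precision \cdot recall / (\beta^2 \cdot precision + recall)$. The counts of TP, FP and FN are hypothetical and serve only as an illustration.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta score computed from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta**2 * precision + recall
    return (1 + beta**2) * precision * recall / denom if denom else 0.0


# Hypothetical counts: 8 correct alarms, 2 false alarms, 4 missed attacks.
tp, fp, fn = 8, 2, 4          # precision = 0.8, recall = 8/12

f1 = f_beta(tp, fp, fn, beta=1.0)
f2 = f_beta(tp, fp, fn, beta=2.0)     # emphasizes recall
f05 = f_beta(tp, fp, fn, beta=0.5)    # emphasizes precision
print(round(f05, 4), round(f1, 4), round(f2, 4))
```

Because precision exceeds recall in this example, the precision-weighted $F_{0.5}$ is the largest of the three scores and the recall-weighted $F_2$ the smallest.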

4. PCA: principal component analysis (PCA) is a statistical method that converts a set of possibly correlated variables into a set of linearly uncorrelated variables through an orthogonal transformation; the transformed variables are called principal components. In data analysis, PCA can greatly reduce the feature dimension required for training while retaining as much of the original information as possible, achieving feature fusion and improving training efficiency. For example, calling the sklearn.decomposition.PCA module to perform principal component analysis on the target data yields the new, transformed feature set.
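The following sketch shows the dimensionality reduction described above using sklearn.decomposition.PCA. The four-column feature matrix is synthetic: its last two columns are deliberately built as linear combinations of the first two, an assumption made so that two principal components retain essentially all the information.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))

# Hypothetical correlated features: columns 3 and 4 depend linearly on 1 and 2.
features = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + base[:, 1],
    2 * base[:, 0] - base[:, 1],
])

pca = PCA(n_components=2)
reduced = pca.fit_transform(features)          # new, transformed feature set

retained = float(pca.explained_variance_ratio_.sum())
component_corr = float(np.corrcoef(reduced.T)[0, 1])
print(reduced.shape, round(retained, 4))
```

The two resulting columns are linearly uncorrelated, which is the property the definition above refers to.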

The technical scheme of the invention is as follows:

an internal user abnormal behavior fusion detection method based on user binary analysis comprises the following steps:

collecting user binary data, wherein the user binary data comprises user metadata and user behavior data, and the user metadata refers to individual characteristic data reflecting the internal attack tendency degree of a user;

based on the user element characteristics and the user behavior characteristics, respectively and independently training a machine learning model classifier, wherein the machine learning model classifier comprises a user element characteristic classifier model and a user abnormal behavior classifier model, and respectively carrying out user element characteristic abnormal individual detection and user abnormal behavior detection;

based on the detection results of the user element feature classifier and the abnormal behavior classifier, a user binary result matrix is established, and analysis and processing are respectively carried out on different conditions: for users with abnormal meta-characteristics and abnormal behavior characteristics, directly alarming; for the users with normal meta-characteristics and normal behavior characteristics, judging that the users are normal and do not process; for users with abnormal meta-characteristics and normal behavior characteristics, properly regulating and controlling a behavior deviation threshold according to the abnormal degree of the meta-characteristics, and improving the abnormal behavior recognition capability; for users with normal meta-characteristics and abnormal behavior characteristics, the behavior deviation threshold is properly regulated according to the normal degree of the meta-characteristics, and the false alarm number of abnormal detection is reduced.

Preferably, according to the present invention, the user metadata includes user intrinsic psychological metadata and work metadata; the intrinsic psychological metadata of the user refers to personality assessment score of the target user; the work metadata comprises daily work data of users, attendance performance data of users and dimension data of user organization relations; the daily work data of the user comprise user work text data, user work assessment data and work satisfaction data;

the user behavior data is derived from an internal user auditing system and is divided into five types according to behavior categories: system login/logout, network access, mail communication, document access, and external device use.

According to the invention, before training the machine learning model classifier, data acquisition, data feature extraction, data marking, training feature aggregation and normalization processing are sequentially carried out, and the method specifically comprises the following steps:

Data acquisition: a month with no security event alarms is selected and assumed to be the normal time segment, labeled $T_0$; a month after $T_0$ with security event alarms is selected as the verification time segment, labeled $T_1$; the month following $T_1$ is selected as the detection time segment, labeled $T_2$;

Data feature extraction: in each of the three time segments $T_0$, $T_1$ and $T_2$, the metadata of each user is collected by date to obtain the corresponding $\{m\_data_u^{0,d}\}$, $\{m\_data_u^{1,d}\}$ and $\{m\_data_u^{2,d}\}$, where $m\_data_u^{0,d}$ denotes the meta-feature collected for user u on day d of segment $T_0$, $m\_data_u^{1,d}$ denotes the meta-feature collected for user u on day d of segment $T_1$, and $m\_data_u^{2,d}$ denotes the meta-feature collected for user u on day d of segment $T_2$;

Data marking: the meta-features and behavior features of all users in segment $T_0$ are marked as the negative class, indicating normal; in segment $T_1$, if a user triggers a security event on a given day, that user's meta-features and behavior features for that day are marked as the positive class, and otherwise as the negative class;

Training feature aggregation: the user meta-feature set of segment $T_0$ is taken as the global feature row vectors $M\_Feat_0^{M}$, and the user behavior feature set of segment $T_0$ as the global feature row vectors $A\_Feat_0^{M}$, where $M\_Feat_0^{M} = \{m\_data_u^{0,d}\}$, $A\_Feat_0^{M} = \{a\_data_u^{0,d}\}$, $d \in T_0$; the user meta-feature set of segment $T_1$ is taken as the global feature row vectors $M\_Feat_1^{M}$, and the user behavior feature set of segment $T_1$ as the global feature row vectors $A\_Feat_1^{M}$, where $M\_Feat_1^{M} = \{m\_data_u^{1,d}\}$, $A\_Feat_1^{M} = \{a\_data_u^{1,d}\}$, $d \in T_1$; the user meta-feature set of segment $T_2$ is taken as the global feature row vectors $M\_Feat_2^{M}$, and the user behavior feature set of segment $T_2$ as the global feature row vectors $A\_Feat_2^{M}$, where $M\_Feat_2^{M} = \{m\_data_u^{2,d}\}$, $A\_Feat_2^{M} = \{a\_data_u^{2,d}\}$, $d \in T_2$;

Normalization: $M\_Feat_0^{M}$ and $A\_Feat_0^{M}$ are each normalized to obtain $M\_Feat_0^{MM}$ and $A\_Feat_0^{MM}$, so that the values of every column lie within [0,1]; $M\_Feat_1^{M}$ and $A\_Feat_1^{M}$ are each normalized to obtain $M\_Feat_1^{MM}$ and $A\_Feat_1^{MM}$, so that the values of every column lie within [0,1]; $M\_Feat_2^{M}$ and $A\_Feat_2^{M}$ are each normalized to obtain $M\_Feat_2^{MM}$ and $A\_Feat_2^{MM}$, so that the values of every column lie within [0,1].

Further preferably, in the normalization process, the normalization formula is as shown in formula (I):

$$x_{mm} = \frac{x - X_{min}}{X_{max} - X_{min}} \quad (I)$$

where $X_{max}$ and $X_{min}$ denote the maximum and minimum values of the sequence {x}, and {x} is the sequence of values formed by each column vector of $M\_Feat_0^{M}$, $A\_Feat_0^{M}$, $M\_Feat_1^{M}$, $A\_Feat_1^{M}$, $M\_Feat_2^{M}$ and $A\_Feat_2^{M}$; the resulting $x_{mm}$ always lies in [0,1].
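The column-wise normalization to [0,1] can be sketched with NumPy as follows. This is a minimal illustration of min-max scaling, in which each value is shifted by the column minimum and divided by the column range; the sample matrix and the handling of constant columns (mapped to 0) are assumptions of the example.

```python
import numpy as np


def min_max_normalize(features: np.ndarray) -> np.ndarray:
    """Scale every column of a feature matrix into [0, 1]."""
    col_min = features.min(axis=0)
    span = features.max(axis=0) - col_min
    # Assumption: a constant column (range 0) is mapped to 0.0
    # instead of dividing by zero.
    safe_span = np.where(span == 0, 1.0, span)
    return (features - col_min) / safe_span


# Hypothetical feature matrix: one row per user-day, one column per feature.
feat = np.array([
    [2.0, 10.0, 5.0],
    [4.0, 30.0, 5.0],
    [6.0, 20.0, 5.0],
])
norm = min_max_normalize(feat)
print(norm)
```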

According to a preferred embodiment of the present invention, training a user meta-feature classifier model includes:

$M\_Feat_0^{MM}$ is used to train a single-class classifier, which is then used to detect the meta-feature $m\_data_u^{1,d}$ of any user u on any day d in the validation set $M\_Feat_1^{MM}$. By comparing the results against the normal/abnormal marks of all user meta-features in segment $T_1$, the parameters of the user meta-feature classifier are adjusted with optimal accuracy as the target, finally determining the user meta-feature classifier model $M\_Classifier_2$ suitable for the current detection. The accuracy is the percentage of samples whose positive or negative class is judged correctly by the user meta-feature classifier among all samples.

According to the invention, the training of the user abnormal behavior classifier model and the detection of the user abnormal behavior are preferably carried out, comprising the following steps:

for the training set $A\_Feat_0^{MM}$, a feature tree is constructed from the user's daily audit behavior data and an independent anomaly detection process is executed; based on the validation set $A\_Feat_1^{MM}$, the initial optimal behavior shift threshold $K_A$ and the other parameters of the user abnormal behavior classifier model are obtained by adjustment with optimal accuracy as the target.

Further preferably, the single-class classifier refers to a single-class support vector machine OCSVM.
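The training and validation procedure for the meta-feature classifier can be sketched as a small parameter search, assuming an OCSVM from sklearn. The function name tune_ocsvm, the candidate grids for nu and gamma, and the synthetic data standing in for the $T_0$ training set and the $T_1$ validation set are all hypothetical; the document does not specify which parameters are tuned.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def tune_ocsvm(train_normal, val_features, val_labels, nus, gammas):
    """Pick the (nu, gamma) pair with the best validation accuracy.

    val_labels follows the convention +1 = abnormal, -1 = normal.
    """
    best_model, best_acc = None, -1.0
    for nu in nus:
        for gamma in gammas:
            model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(train_normal)
            # sklearn returns +1 for inliers; flip the sign so +1 means abnormal.
            pred = -model.predict(val_features)
            acc = float(np.mean(pred == val_labels))
            if acc > best_acc:
                best_model, best_acc = model, acc
    return best_model, best_acc


rng = np.random.default_rng(1)
train = rng.normal(0, 1, size=(300, 3))           # stand-in for the T0 training set
val_normal = rng.normal(0, 1, size=(50, 3))
val_abnormal = rng.normal(6, 1, size=(10, 3))     # synthetic security-event days
val_x = np.vstack([val_normal, val_abnormal])     # stand-in for the T1 validation set
val_y = np.array([-1] * 50 + [1] * 10)

model, acc = tune_ocsvm(train, val_x, val_y,
                        nus=[0.01, 0.05, 0.1], gammas=[0.01, 0.1, 1.0])
print(round(acc, 3))
```

A plain grid search is used here only for brevity; any search strategy that optimizes accuracy on the validation set would fit the description.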

According to the invention, the method for detecting the abnormal individual of the user element characteristics by the trained user element characteristic classifier model comprises the following steps:

for the meta-feature of any user u on any day d in the target segment $T_2$, the user meta-feature classifier model $M\_Classifier_2$ is used for detection to obtain the required classification results, including:

(1) the classification mark of the meta-feature $m\_data_u^{2,d}$ of any user u on any day d: normal [-1] or abnormal [+1];

(2) the decision function set $D\_value_2 = \{d_i\}$, $i = 1, 2, \ldots, N_2$, of the user meta-feature classifier model $M\_Classifier_2$ for the meta-features, where $N_2$ denotes the number of users to be analyzed in $T_2$; the decision function is the distance function, from the target sample to the existing single-class sample model, on which $M\_Classifier_2$ bases its judgment;

(3) quantifying the user attack tendency level: for an OCSVM implemented with the sklearn library functions, the opposite numbers of the obtained decision function set $D\_value_2 = \{d_i\}$ are taken to obtain the quantized attack tendency level sequence $M\_value_2 = \{X_i\}$, $i = 1, 2, \ldots, N_2$, so that the quantized value is positively correlated with the attack tendency; that is, if the quantized attack tendency levels of user A and user B satisfy $X_A > X_B$, the attack tendency of user A is judged to be stronger than that of user B.
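Step (3) can be sketched as follows, assuming the classifier is an sklearn OneClassSVM. In sklearn, decision_function returns positive values on the inlier side of the boundary and negative values on the outlier side, so negating it yields a value that grows with the degree of abnormality. The training data, the three sample users and the parameter values are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
train = rng.normal(0, 1, size=(300, 2))     # stand-in for normal meta-features
model = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.1).fit(train)

# Three hypothetical users on one day, increasingly far from the normal region.
samples = np.array([[0.0, 0.0], [2.5, 2.5], [6.0, 6.0]])

d_value = model.decision_function(samples)  # positive = inlier side
m_value = -d_value                          # larger = stronger attack tendency

# User indices ordered from strongest to weakest quantized tendency.
ranking = np.argsort(-m_value)
print(m_value, ranking)
```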

According to the invention, based on the detection results of the user element feature classifier and the abnormal behavior classifier, a user binary result matrix is established, and the user binary result matrix is respectively analyzed and processed according to different conditions, and the method comprises the following steps:

establishing a user binary result matrix: the user meta-characteristic classifier model and the user abnormal behavior classifier model are acted on the target detection set T ₂ The meta-feature and the behavior feature of any user in the section on any day d are combined, and the obtained judging result is four kinds of combinations as follows: firstly, the meta-feature is abnormal and the behavior feature is abnormal; secondly, the meta-feature is normal and the behavior feature is normal; thirdly, the meta-characteristic is abnormal but the behavior characteristic is normal; fourth, the meta-feature is normal but the behavioral feature is abnormal;

And (3) fast judging: for the sample of any day d with normal meta-characteristics and normal behavior characteristics, judging that the safety risk is lower, and classifying the sample into a normal class; for the sample of any day d with abnormal meta-characteristics and abnormal behavior characteristics, the sample shows that the attack tendency is higher, and the abnormal behavior is also shown, so that the sample is judged to be of an abnormal type and an alarm is triggered;

Improving the abnormal behavior detection capability by means of abnormal meta-features: for samples of any day d with abnormal meta-features but normal behavior features, the sensitivity of the behavior shift threshold is appropriately raised by adopting formula (I);

in formula (I), x denotes the quantized attack tendency level in $M\_value_2$ corresponding to the user, $K_{S/A}$ denotes the behavior shift threshold of anomaly detection after fusion adjustment, $FNR_1$ denotes the false negative rate of the pre-fusion abnormal behavior detection on the validation set $A\_Feat_1^{MM}$, and $\alpha \in (0,1]$ denotes an elasticity coefficient, determined from the training set and the validation set, that adjusts the degree of threshold reduction;

the false negative rate FNR = FN/(TP+FN), where FN denotes the number of samples judged normal but actually abnormal, and TP denotes the number of samples judged abnormal and actually abnormal;

after the optimal $K_{S/A}$ is obtained, the binary-analysis fused abnormal behavior detection is executed on $A\_Feat_2^{MM}$, and the dates on which attack behavior exists are finally determined and alarmed;

Reducing the false alarm rate of abnormal behavior detection by means of normal meta-features: for users with normal meta-features but abnormal behavior features, the sensitivity of the behavior shift threshold is appropriately reduced by adopting formula (II):

in formula (II), x denotes the quantized attack tendency level in $M\_value_2$ corresponding to the user, $K'_{S/A}$ denotes the behavior shift threshold of anomaly detection after fusion adjustment, $FPR_1$ denotes the false positive rate of the pre-fusion abnormal behavior detection on $A\_Feat_1^{MM}$, and $\beta \in (0,1]$ denotes an elasticity coefficient, determined from the training set and the validation set, that adjusts the degree of threshold adjustment;

the false positive rate FPR = FP/(TN+FP), where FP denotes the number of samples judged abnormal but actually normal, and TN denotes the number of samples judged normal and actually normal;

the binary-analysis fused abnormal behavior detection is executed on $A\_Feat_2^{MM}$, and the dates on which attack behavior exists are finally determined and alarmed.
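The four cases of the user binary result matrix can be sketched as a small dispatch function. The names Action and route are hypothetical; the sketch only decides which branch applies to a user-day sample and leaves the threshold formulas themselves out.

```python
from enum import Enum


class Action(Enum):
    ALARM = "alarm"                            # both abnormal: alarm directly
    NORMAL = "normal"                          # both normal: no processing
    RAISE_SENSITIVITY = "raise_sensitivity"    # re-check with K_S/A
    LOWER_SENSITIVITY = "lower_sensitivity"    # re-check with K'_S/A


def route(meta_abnormal: bool, behavior_abnormal: bool) -> Action:
    """Map the two classifier verdicts for one user-day to a follow-up action."""
    if meta_abnormal and behavior_abnormal:
        return Action.ALARM
    if not meta_abnormal and not behavior_abnormal:
        return Action.NORMAL
    if meta_abnormal:
        # Abnormal meta-features, normal behavior: look harder for missed attacks.
        return Action.RAISE_SENSITIVITY
    # Normal meta-features, abnormal behavior: guard against false alarms.
    return Action.LOWER_SENSITIVITY


for verdict in [(True, True), (False, False), (True, False), (False, True)]:
    print(verdict, route(*verdict).value)
```

Enumerating all four combinations explicitly makes it easy to check that no case of the matrix is left unhandled.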

Further preferably, after the optimal $K_{S/A}$ is obtained, executing the binary-analysis fused abnormal behavior detection on $A\_Feat_2^{MM}$ and finally determining and alarming the dates with attack behavior specifically comprises the following steps:

(4) forming a row-vector matrix from the user's set of normal behavior feature row vectors and the row vector of the day's behavior features to be analyzed;

(5) after normalization by column, comparing each element of the last row one by one against the $K_{S/A}$ of its column; if an element exceeds the range, the feature shift corresponding to that column is abnormal and an alarm is raised.
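Steps (4) and (5) can be sketched with NumPy as below. The text does not spell out how the comparison against the threshold is carried out, so this sketch makes two assumptions: the stacked matrix is min-max normalized column by column, and a column is flagged when the normalized value in the last row is greater than that column's threshold. The function name, the sample data and the threshold values are hypothetical.

```python
import numpy as np


def flag_shifted_columns(normal_rows, day_row, k_thresholds):
    """Return a boolean array marking columns whose last-row value exceeds K."""
    # Step (4): stack the normal-behavior rows with the day under analysis.
    stacked = np.vstack([normal_rows, day_row]).astype(float)
    # Step (5): normalize by column (assumed min-max; constant columns map to 0).
    col_min = stacked.min(axis=0)
    span = stacked.max(axis=0) - col_min
    safe_span = np.where(span == 0, 1.0, span)
    normalized = (stacked - col_min) / safe_span
    # Compare each element of the last row with the threshold of its column.
    return normalized[-1] > k_thresholds


normal_rows = np.array([[3.0, 10.0, 2.0],
                        [4.0, 12.0, 0.0],
                        [5.0, 11.0, 4.0]])
day_row = np.array([4.0, 40.0, 1.0])     # second feature is far above its history
k = np.array([0.9, 0.9, 0.9])            # hypothetical per-column thresholds

flags = flag_shifted_columns(normal_rows, day_row, k)
alarm = bool(flags.any())
print(flags, alarm)
```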

Further preferably, the method for obtaining the elasticity coefficient α includes:

(6) setting the initial α = 0.01 and substituting it into formula (I) to calculate the corresponding post-fusion behavior shift threshold $K_{S/A}$;

(7) using the post-fusion threshold $K_{S/A}$ to perform abnormal behavior detection on $A\_Feat_1^{MM}$ again, obtaining the post-fusion abnormal behavior detection result;

(8) calculating the $F_2$ score of the post-fusion abnormal behavior detection;

(9) increasing α by 0.01 each time until the upper limit 1 is reached, recording the $F_2$ score corresponding to every α, and substituting the α with the highest $F_2$ score, as the optimal value, into formula (I) to obtain the optimal $K_{S/A}$.

Further preferably, the method for obtaining the elasticity coefficient β includes:

setting the initial β = 0.01 and substituting it into formula (II) to calculate the corresponding post-fusion behavior shift threshold $K'_{S/A}$;

using the post-fusion threshold $K'_{S/A}$ to perform abnormal behavior detection again on the user behaviors corresponding to $A\_Feat_1^{MM}$, obtaining the post-fusion abnormal behavior detection result;

calculating the $F_{0.5}$ score of the post-fusion abnormal behavior detection;

increasing β by 0.01 each time until the upper limit 1 is reached, recording the $F_{0.5}$ score corresponding to every β, and substituting the β with the highest $F_{0.5}$ score, as the optimal value, into formula (II) to obtain the optimal $K'_{S/A}$.
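The searches for α and β share the same structure: sweep the coefficient from 0.01 to 1 in steps of 0.01, re-run detection on the validation set with the resulting threshold, and keep the coefficient with the best $F_\beta$ score. The sketch below illustrates that loop. Because formulas (I) and (II) are not reproduced in this text, the mapping from coefficient to threshold is passed in as a function, and the mapping used in the example (threshold_for) is a purely hypothetical placeholder, as are the anomaly scores and labels.

```python
from typing import Callable, List, Sequence, Tuple


def f_beta(y_true: Sequence[int], y_pred: Sequence[int], beta: float) -> float:
    """F-beta score with 1 = abnormal (positive class), 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0


def search_coefficient(
    threshold_for: Callable[[float], float],
    detect: Callable[[float], List[int]],
    y_true: Sequence[int],
    beta: float,
) -> Tuple[float, float]:
    """Sweep the coefficient over 0.01, 0.02, ..., 1.00 and keep the best score."""
    best_coef, best_score = 0.0, -1.0
    for step in range(1, 101):
        coef = step / 100
        score = f_beta(y_true, detect(threshold_for(coef)), beta)
        if score > best_score:      # ties keep the smallest coefficient
            best_coef, best_score = coef, score
    return best_coef, best_score


# Hypothetical validation data: one anomaly score and one label per user-day.
scores = [0.15, 0.35, 0.617, 0.733, 0.913]
labels = [0, 0, 1, 0, 1]


def threshold_for(coef: float) -> float:
    # Placeholder mapping, NOT formula (I) or (II) of the document.
    return 1.0 - 0.5 * coef


def detect(k: float) -> List[int]:
    return [1 if s > k else 0 for s in scores]


best_alpha, best_f2 = search_coefficient(threshold_for, detect, labels, beta=2.0)
print(best_alpha, round(best_f2, 4))
```

The same function covers the β search by calling it with beta=0.5 and a threshold mapping that corresponds to formula (II).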

A computer device comprising a memory storing a computer program and a processor implementing the steps of an internal user abnormal behavior fusion detection method based on user binary analysis when the computer program is executed.

A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of an internal user abnormal behavior fusion detection method based on user binary analysis.

An internal user abnormal behavior fusion detection system based on user binary analysis, comprising:

the user binary data acquisition module is configured to: collecting user binary data, wherein the user binary data comprises user metadata and user behavior data, and the user metadata refers to individual characteristic data reflecting the internal attack tendency degree of a user;

the user meta-feature classifier model training and abnormal-individual detection module is configured to: train a user meta-feature classifier model based on the user meta-features, and detect individuals whose meta-features are abnormal;

the user abnormal behavior classifier model training and user abnormal behavior detection module is configured to: train a user abnormal behavior classifier model based on the user behavior features, and detect abnormal user behavior;

the fusion detection analysis module is configured to: establish a user binary result matrix based on the detection results of the user meta-feature classifier and the abnormal behavior classifier, and analyze and process each case separately: for users whose meta-features and behavior features are both abnormal, alarm directly; for users whose meta-features and behavior features are both normal, judge them normal and take no action; for users with abnormal meta-features but normal behavior features, appropriately adjust the behavior offset threshold according to the degree of meta-feature abnormality, improving the ability to recognize abnormal behavior; for users with normal meta-features but abnormal behavior features, appropriately adjust the behavior offset threshold according to the degree of meta-feature normality, reducing the number of false alarms in anomaly detection.

The beneficial effects of the invention are as follows:

1. Completing the user fusion detection model: existing fusion detection schemes not only remain at the theoretical-model stage but also omit important combinations of the user's binary features. Based on the relationship between the independent judgments of the classifiers for the two user feature dimensions, the invention provides a complete four-quadrant binary matrix: (1) meta-features abnormal and behavior features abnormal; (2) meta-features normal and behavior features normal; (3) meta-features abnormal but behavior features normal; (4) meta-features normal but behavior features abnormal. With this complete four-quadrant matrix, every combination of the two classifiers' judgments can be analyzed without omission, user groups can be divided accordingly, and subsequent analysis and detection can be carried out in a targeted way.

2. Improving the anomaly detection recall rate based on user meta-feature analysis: one aspect of the intrinsic association between a user's internal meta-features and external attack behavior is that users with a marked attack motivation should be monitored more strictly and carefully. However, if the abnormal behavior judgment threshold is simply lowered according to the level of attack motivation (behavior is generally judged abnormal when its offset exceeds the threshold, so a lower threshold means behavior with smaller offsets can be found), the recall rate improves but the number of false alarms caused by fluctuations in the user's normal behavior inevitably rises, lowering the accuracy of the detection as a whole. The invention therefore selects a behavior standard set for each user and optimizes the fusion formula with reference to the miss rate and F_2 score on that standard set, providing a feasible technical scheme for reasonably improving the anomaly detection recall rate based on user meta-features.

3. Reducing the anomaly detection false alarm rate based on user meta-feature analysis: the other aspect of the intrinsic association between a user's internal meta-features and external attack behavior is that monitoring should be appropriately relaxed for users with weak attack motivation. However, if the abnormal behavior judgment threshold is simply raised according to the level of attack motivation (behavior is generally judged abnormal when its offset exceeds the threshold, so a higher threshold means more normal fluctuations fail to trigger an alarm), the false alarm rate falls but genuinely abnormal behavior may be missed; the number of missed detections rises, and the accuracy of the detection as a whole again declines. The invention therefore selects a behavior standard set for each user and optimizes the fusion formula with reference to the false alarm rate and F_{0.5} score on that standard set, providing a feasible technical scheme for reasonably reducing the anomaly detection false alarm rate based on user meta-features.

4. Through comprehensive binary fusion detection, the invention reaches a final judgment on abnormal user behavior. This effectively improves the accuracy and applicability of existing internal threat fusion detection, identifies user attack behavior accurately, and provides strong technical and data support for the follow-up investigation and judgment of security analysts.

Drawings

FIG. 1 is a flow chart of an internal user abnormal behavior fusion detection method based on user binary analysis.

Detailed Description

The invention is further defined by, but is not limited to, the following drawings and examples in conjunction with the specification.

Example 1

An internal user abnormal behavior fusion detection method based on user binary analysis, as shown in fig. 1, comprises the following steps:

Example 2

The method for detecting fusion of abnormal behaviors of internal users based on binary analysis of users according to embodiment 1 is characterized in that:

the user metadata comprises intrinsic psychological metadata and work metadata of the user; the intrinsic psychological metadata of the user refers to personality assessment score of the target user; the work metadata comprises daily work data of users, attendance performance data of users and dimension data of user organization relations; the daily work data of the user comprise user work text data, user work assessment data and work satisfaction data;

Personality traits are psychological structures that trigger and dominate behavior. They describe an individual's tendency to behave in a relatively consistent way across times and situations, so the attack motivation level of an internal user can be characterized by personality traits. The invention collects attack motivation data along two dimensions: the Big Five personality model and the dark personality model. The intrinsic psychological metadata of the user refers to the personality scores of the target user, and in practice it is recommended that these scores be obtained in one of two ways: (1) inviting the target user to fill in an online questionnaire on an open online survey and analysis platform such as Wenjuanxing, which automatically calculates the Big Five scores; (2) having a professional consulting and analysis organization administer an offline personality assessment to staff as a group and return professional analysis feedback. As the output of this subjective data collection, specific scores for the personality traits of the target staff are finally provided, for example (on a 100-point scale) neuroticism 70, agreeableness 80, conscientiousness 65, extraversion 73, and openness 69, where a higher score indicates a stronger tendency in that trait; as another example, the dark personality scores may be 80, 91, and 59, where a higher score likewise indicates a more pronounced tendency.

Existing studies and analyses of attack cases indicate that a user needs three basic elements to carry out an internal attack: (1) a certain attack motivation, reflected for example in work attitude and job satisfaction; (2) the capability to carry out the attack, for example an attacker who damages an internal system must first be familiar with system security knowledge and have a certain level of computer skill; (3) the opportunity to carry out the attack, for example attackers often choose to act outside working hours, or marketing staff use their own access rights to copy and steal important customer data. Based on this analysis, the invention collects work metadata from three sub-dimensions: the user's daily work, the user's attendance performance, and the user's organizational relationships.

Collecting user work text data: linguistic information is collected from text related to the user's work, for example text typed by the user in work emails and work documents (such as business documents and daily work summaries), and keyword frequency vectors reflecting the individual's emotions and attitudes are extracted with the LIWC (Linguistic Inquiry and Word Count) tool; subsequent analysis may further optimize the resulting LIWC word-vector features, for example by aggregating keyword classes according to the Big Five personality dimensions or refining the feature representation based on those dimensions.

Collecting user work assessment data: a work performance evaluation is given daily by the supervisor of the user's department and covers two basic dimensions: (1) work performance, in which the supervisor evaluates the user's work attitude and work progress; (2) collaboration performance, in which the supervisor evaluates the user's ability to collaborate and how well the user works with leaders and co-workers (for example, whether quarrels or conflicts arose with leaders or co-workers). Both indicators are scored on a five-point scale: very poor (1 point), poor (2 points), average (3 points), good (4 points), and very good (5 points); a missing value defaults to -1. The supervisor should give a work assessment for each employee under their management for every date (day).

Collecting work satisfaction data: user work satisfaction is assessed periodically (for example monthly or yearly) with an online questionnaire (for example a Wenjuanxing satisfaction survey) to obtain scores for seven important dimensions: a length-of-service index (cumulative months worked), satisfaction with the work itself (five-point scale), satisfaction with compensation and development (five-point scale), satisfaction with leadership and management (five-point scale), satisfaction with the working environment and background (five-point scale), satisfaction with working relationships (five-point scale), and satisfaction with the enterprise as a whole (five-point scale). The user's work satisfaction data remains unchanged within each cycle (month/year) and is updated only in a new cycle; every dimension is initialized to 0, and a missing value defaults to -1.

Collecting attendance performance data: clock-in and clock-out data are obtained in various ways, such as fingerprint or face-recognition time clocks or software check-in (for example DingTalk check-in). For each working day, it is recorded whether the user was present, late, left early, was absent without leave, was on leave, or worked overtime; the corresponding bit is set to 1 if the situation occurred and to 0 otherwise, so that a six-dimensional attendance feature vector is obtained for each user on each working day.
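The six-dimensional attendance vector can be illustrated with a short sketch. The English flag names and their order are assumptions made for the example, and attendance_vector is a hypothetical helper.

```python
# Assumed flag names and ordering for the six attendance situations.
ATTENDANCE_FLAGS = ("present", "late", "left_early", "absent", "on_leave", "overtime")

def attendance_vector(events):
    """Map the attendance events observed on one working day to a 0/1 vector."""
    unknown = set(events) - set(ATTENDANCE_FLAGS)
    if unknown:
        raise ValueError(f"unknown attendance events: {sorted(unknown)}")
    return [1 if flag in events else 0 for flag in ATTENDANCE_FLAGS]

# A day on which the user arrived late and then worked overtime.
vec = attendance_vector({"present", "late", "overtime"})
```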

Collecting user organization relation dimension data: affiliation numbers are assigned according to the departmental relationships reflected in the organization tree, and the organization sets a unified position number for each post. For example, a university usually administers faculties, schools, and departments, and teaching posts can be coded as teacher (01), teaching assistant (02), counselor (03), and so on. If a user is a teacher (01) in the Department of Network Security (01) of the School of Cyberspace Security (01) of the Faculty of Computing (01), the user's position and team code is [01-01-01-01], while [01-01-01-03] denotes a counselor belonging to the same department. Setting position and team codes for users according to the specific scenario reflects both their post and their department affiliation; in actual analysis, multidimensional features are built from the levels of the organizational hierarchy, so the organization relation feature of a teacher in a given department is the four-dimensional vector (01, 01, 01, 01).
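Turning a position and team code into one feature per hierarchy level might look like the following sketch. The bracketed, hyphen-separated format is taken from the example above; parse_position_code is a hypothetical helper name.

```python
def parse_position_code(code):
    """Split a code such as '[01-01-01-03]' into one integer feature per level."""
    parts = code.strip().strip("[]").split("-")
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"malformed position code: {code!r}")
    return tuple(int(p) for p in parts)

teacher = parse_position_code("[01-01-01-01]")
counselor = parse_position_code("[01-01-01-03]")
```

The two users share the first three levels and differ only in the post number, which is what lets the feature express both department affiliation and position.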

The user behavior data is derived from an internal user auditing system and is divided into five types by behavior category: system login/logout, network access (e.g., server address, domain name, or web page text content), mail sending and receiving (e.g., sender and recipient addresses, mail subject, summary of the body text, attachment features), document access (read, modify, delete, rename, copy, or move), and external device usage (USB device, printer, or scanner). In practice, the user behavior data adopted by the invention is determined by the actual application scenario and the characteristics of the internal audit system and can be customized flexibly; as long as digitized user meta-features and behavior features are provided as input, the application of the subsequent detection method of the invention is not affected.

To make full use of the advantages of binary analysis and fusion detection, the invention suggests that an ideal application scenario should satisfy two basic conditions: (1) a trusted network security protection and internal audit system has been built and is in operation, so that security threat events can be detected and basic data on users' operation of the information system can be collected, such as command execution, website access, document access, device usage, mail sending and receiving, and system login; the detailed attribute list can follow the feature standard of the CERT data set or be customized for the actual scenario and is not described further here; (2) the basic conditions for collecting user attack motivation data are in place, for example a time clock that records user attendance, online assessments that yield the user's personality traits and work satisfaction scores, and an online OA system that implements daily work reporting and evaluation.

Example 3

before training the machine learning model classifier, sequentially performing data acquisition, data feature extraction, data marking, training feature aggregation and normalization processing, wherein the method specifically comprises the following steps:

Data acquisition: a month without security event alarms is selected and assumed to be the normal time segment, denoted T_0 (to increase confidence that there were no security event alarms, it is recommended that several network security defense systems be installed simultaneously for risk detection, such as 360 Security Guard, Kaspersky, Norton, and Huorong security software); a month after T_0 that has security event alarms is selected as the verification time segment, denoted T_1; T_0 and T_1 together serve as the standard set for the subsequent determination of the fusion formula parameters. The month after T_1 is selected as the detection time segment, denoted T_2 (if several months need to be detected, the detection period is divided by month into independent detection time segments).

Data feature extraction: from the three time segments T_0, T_1, and T_2, the metadata of each user (reflecting attack tendency) is collected by date (down to the day) to obtain the corresponding {m_data_u^{0,d}}, {m_data_u^{1,d}}, and {m_data_u^{2,d}}, where m_data_u^{0,d} denotes the meta-features collected for user u on day d of segment T_0, m_data_u^{1,d} denotes the meta-features collected for user u on day d of segment T_1, and m_data_u^{2,d} denotes the meta-features collected for user u on day d of segment T_2;

In practice, the user's daily meta-features and behavior features are obtained from the user metadata and the user behavior data respectively, and the feature construction method can be specified freely according to the characteristics of the analysis scenario; for convenience in the following description, however, the invention gives a general, optional feature construction method as an example.

For user meta-features, the metadata described above can be collected and mapped into a set of numeric feature vectors: (1) the user is assessed online, and the Big Five personality scores and dark personality scores are recorded as intrinsic psychological numeric features; (2) the work text typed by the user that day is collected, and the frequency statistics of the corresponding keyword classes are extracted with the LIWC tool as text content features; (3) the day's work performance and collaboration performance scores are taken from the evaluation by the user's supervisor; (4) the scores for the key questions in the user's online work satisfaction questionnaire for the current month are taken as numeric features; (5) the user's attendance performance for the day is recorded (whether late, left early, absent, or on leave; the corresponding bit is set to 1 if the situation occurred and to 0 otherwise); (6) the code of the organizational unit to which the user belongs on that day is recorded. These six sub-features can be concatenated to form the user's meta-feature for the day, or processed further to obtain higher-level abstract meta-features, which the invention does not describe in detail.
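Concatenating the six sub-features into one daily meta-feature row can be sketched as below. The helper name, the LIWC and satisfaction values, and the sub-vector lengths are hypothetical; only the personality scores reuse the example values given earlier in the text.

```python
def build_daily_meta_feature(personality, liwc_counts, supervisor_scores,
                             satisfaction, attendance, org_code):
    """Concatenate the six sub-features, in the order listed above, into one row."""
    parts = (personality, liwc_counts, supervisor_scores,
             satisfaction, attendance, org_code)
    return [float(v) for part in parts for v in part]

row = build_daily_meta_feature(
    personality=[70, 80, 65, 73, 69],     # Big Five scores (example values from the text)
    liwc_counts=[4, 0, 2],                # hypothetical keyword-class frequencies
    supervisor_scores=[4, 3],             # work performance, collaboration performance
    satisfaction=[36, 4, 3, 4, 5, 4, 4],  # hypothetical questionnaire scores
    attendance=[0, 0, 0, 0],              # late, left early, absent, on leave
    org_code=[1, 1, 1, 1],
)
```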

For user behavior features, the basic features have a more uniform dimensional range. They are essentially statistical counts of each kind of behavior and of the objects operated on that day, such as the number of times the user accessed external domain names, deleted files, connected USB devices, or communicated with external email addresses on that day. Behavior categories that are of interest and representative can be selected according to the characteristics of the scenario; all behavior counts can be concatenated directly and analyzed, or processed further to obtain higher-level abstract behavior features, which is not described in detail here.

Data marking: the meta-features and behavior features of all users in segment T_0 are marked as the negative class, representing normal (-1); in segment T_1, if a user triggers a security event on a given day, that user's meta-features and behavior features for that day are marked as the positive class (+1), and otherwise as the negative class (-1);

Training feature aggregation: in order to establish a normal baseline from the user metadata and behavior data of segment T_0, and then perform anomaly detection on segment T_1 or T_2 in the two dimensions of meta-features and behavior features, the invention treats the user meta-feature set of segment T_0 as the overall feature row vectors M_Feat_0^{M} and the user behavior feature set of segment T_0 as the overall feature row vectors A_Feat_0^{M}, where M_Feat_0^{M} = {m_data_u^{0,d}}, A_Feat_0^{M} = {a_data_u^{0,d}}, d ∈ T_0; the user meta-feature set of segment T_1 is treated as the overall feature row vectors M_Feat_1^{M} and the user behavior feature set of segment T_1 as the overall feature row vectors A_Feat_1^{M}, where M_Feat_1^{M} = {m_data_u^{1,d}}, A_Feat_1^{M} = {a_data_u^{1,d}}, d ∈ T_1; the user meta-feature set of segment T_2 is treated as the overall feature row vectors M_Feat_2^{M} and the user behavior feature set of segment T_2 as the overall feature row vectors A_Feat_2^{M}, where M_Feat_2^{M} = {m_data_u^{2,d}}, A_Feat_2^{M} = {a_data_u^{2,d}}, d ∈ T_2;

Normalization: to reduce subsequent model training error, M_Feat_0^{M} and A_Feat_0^{M} are each normalized to obtain M_Feat_0^{MM} and A_Feat_0^{MM}, so that the values in every column lie within [0,1]; M_Feat_1^{M} and A_Feat_1^{M} are each normalized to obtain M_Feat_1^{MM} and A_Feat_1^{MM}, so that the values in every column lie within [0,1]; M_Feat_2^{M} and A_Feat_2^{M} are each normalized to obtain M_Feat_2^{MM} and A_Feat_2^{MM}, so that the values in every column lie within [0,1].

In the normalization process, a normalization formula is shown as a formula (I):

x_{mm} = (x - X_{min}) / (X_{max} - X_{min})    (I)

where X_{max} and X_{min} denote the maximum and minimum values in the sequence {x}, and {x} is the sequence of values formed by each column vector of M_Feat_0^{M}, A_Feat_0^{M}, M_Feat_1^{M}, A_Feat_1^{M}, M_Feat_2^{M}, and A_Feat_2^{M}. In essence, each feature row-vector matrix is normalized separately, with the values of every column scaled according to that column's maximum and minimum. For example, a column of M_Feat_0^{M} is selected, the maximum and minimum of the data in that column are found, and every value in the column is then normalized to obtain M_Feat_0^{MM}; the remaining feature matrices are processed in the same way. x_{mm} is the normalized value, which always lies within [0,1].
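Column-wise min-max scaling to [0,1], as described above, can be sketched as follows. The function name and sample matrix are hypothetical, and mapping a constant column to 0.0 is an assumption, since the text does not say how a column with X_{max} = X_{min} is handled.

```python
def min_max_normalize(matrix):
    """Scale every column of a row-vector matrix into [0, 1]."""
    scaled_cols = []
    for col in zip(*matrix):                  # iterate over columns
        lo, hi = min(col), max(col)
        span = hi - lo
        # Assumption: a constant column carries no offset information, map it to 0.0.
        scaled_cols.append([(x - lo) / span if span else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]

# Hypothetical three-day meta-feature matrix with three columns.
m_feat = [[70, 1, 3], [80, 0, 3], [90, 1, 3]]
m_feat_mm = min_max_normalize(m_feat)
```

Each matrix (training, verification, detection) would be passed through the function separately, matching the statement that every feature matrix is normalized on its own.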

Example 4

training a user meta-feature classifier model, comprising:

a single-class classifier is trained on M_Feat_0^{MM} and used to detect the meta-feature m_data_u^{1,d} of any user u on any day d in the verification set M_Feat_1^{MM}; by comparing the results against the labels (+1 or -1) that mark whether each user meta-feature in segment T_1 is normal, the parameters of the user meta-feature classifier are adjusted with optimal accuracy as the target, finally determining the user meta-feature classifier model M_Classifier_2 suitable for the current detection. The accuracy is the percentage of all samples, positive class (+1) and negative class (-1), that the user meta-feature classifier judges correctly.

Training a user abnormal behavior classifier model and detecting user abnormal behaviors, including:

Existing research provides a rich set of methods for detecting abnormal user behavior, and in practice a method can be customized and selected flexibly according to the characteristics of the scenario, the detection requirements, and so on. For the purposes of describing the scheme and verifying it experimentally, the invention adopts the method of Yang Guang et al., "A General and Expandable Insider Threat Detection System Using Baseline Anomaly Detection and Scenario-driven Alarm Filters", 2018: on the training set A_Feat_0^{MM}, a feature tree is constructed from the user's daily audited behavior data and an independent anomaly detection process is executed; then, on the verification set A_Feat_1^{MM}, the model is adjusted with optimal accuracy as the target to obtain the initial optimal behavior offset threshold K_A and the other parameters of the user abnormal behavior classifier model. The specific implementation process is as follows:

(1) based on the five elements of a behavior, namely its subject, time, place, mode, and object, a multi-scenario user behavior tree is established to describe user behavior comprehensively;

(2) offset pattern features are extracted for each behavior domain by comparing the five-element sequence of a new behavior with an existing five-element sequence, recording each changed element as "D" and each unchanged element as "S"; this yields an offset pattern such as "S-S-S-S-D", which indicates that only the object of the behavior has changed;

(3) for each of three time periods (before work, during work, and after work), the number of occurrences of every offset pattern in each behavior domain is extracted by the above method and used as the user's behavior offset pattern features for that day;

(4) the user behavior offset patterns in the training set are assumed to follow a standard normal distribution, so an initial behavior offset threshold K = 2 is set;

(5) taking the sample to be detected as the last row of the row vector matrix, so that the problem becomes to analyze whether the offset of the last row relative to all the previous row vectors exceeds a set offset threshold;

(6) abnormal behavior detection is performed on the verification set with the offset threshold K = 2, and the threshold is adjusted according to the false alarm rate to obtain the optimized offset threshold K_A, which is then used for abnormal behavior detection in subsequent time periods; if the behavior offset on a subsequent day exceeds K_A, a corresponding alarm is triggered.
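Steps (2) and (3) above can be illustrated with a short sketch. It is only an illustration of the S/D comparison and the per-period counting; the dictionary layout, the working-hour boundaries (09:00 to 18:00), and all names are assumptions, not details taken from the cited method.

```python
from collections import Counter

ELEMENTS = ("subject", "time", "place", "mode", "object")

def offset_pattern(known, new):
    """Mark each of the five elements 'S' (same) or 'D' (different)."""
    return "-".join("S" if known[e] == new[e] else "D" for e in ELEMENTS)

def period_of(hour, work_start=9, work_end=18):
    """Assign an hour of the day to one of three periods (boundaries are assumed)."""
    if hour < work_start:
        return "before_work"
    return "at_work" if hour < work_end else "after_work"

def daily_pattern_counts(events):
    """Count offset patterns per (period, behavior domain) for one user-day.

    events: iterable of (hour, domain, known_behavior, new_behavior).
    """
    return Counter(
        (period_of(hour), domain, offset_pattern(known, new))
        for hour, domain, known, new in events
    )

# Hypothetical behaviors: the same user reads a different document than usual.
known = {"subject": "u1", "time": "work_hours", "place": "pc-07",
         "mode": "read", "object": "report.docx"}
new = dict(known, object="payroll.xlsx")
counts = daily_pattern_counts([
    (10, "document", known, new),
    (11, "document", known, new),
    (20, "document", known, known),
])
```

The resulting counts, one per (period, domain, pattern) combination, are the kind of daily behavior offset pattern features that the later threshold comparison operates on.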

The single-class classifier refers to a single-class support vector machine (OCSVM). Researchers/analysts may also flexibly select or customize other single-class classifiers themselves.

The method for detecting the abnormal individual of the user element characteristics through the trained user element characteristic classifier model comprises the following steps:

(2) obtaining the decision function set D_value_2 = {d_i}, i = 1, 2, ..., N_2, of the user meta-feature classifier model M_Classifier_2 for the meta-features, where N_2 denotes the number of users to be analyzed in segment T_2. The decision function is the distance function, between the target sample and the existing single-class sample model, on which the user meta-feature classifier model M_Classifier_2 bases its judgment. Taking the OCSVM as an example, training yields a hypersphere model of the normal user metadata; the larger the decision function value, that is, the farther the sample to be detected lies from the hypersphere, the greater its deviation from the existing normal model and the more likely it is to be judged abnormal.

Based on the detection results of the user element feature classifier and the abnormal behavior classifier, a user binary result matrix is established, and analysis and processing are respectively carried out aiming at different conditions, and the method comprises the following steps:

Establishing a user binary result matrix: using the T_0 training set and the T_1 verification set, the invention trains the user meta-feature classifier and the user abnormal behavior classifier independently, and then applies the user meta-feature classifier model and the user abnormal behavior classifier model to the meta-features and behavior features of any user on any day d in the target detection segment T_2. The judgments obtained fall into four combinations: first, meta-features abnormal and behavior features abnormal; second, meta-features normal and behavior features normal; third, meta-features abnormal but behavior features normal; fourth, meta-features normal but behavior features abnormal; as shown in Table 1:

TABLE 1

Feature class determination	Normal user behavior features (-1)	Abnormal user behavior features (+1)
Normal user meta-features (-1)	-1,-1	-1,+1
Abnormal user meta-features (+1)	+1,-1	+1,+1

Fast judgment: a sample of any day d whose meta-features and behavior features are both normal is judged to carry a low security risk and is classified as normal; a sample of any day d whose meta-features and behavior features are both abnormal shows both a high attack tendency and abnormal behavior, so it is judged abnormal and an alarm is triggered; the remaining two combinations go on to the subsequent threshold regulation and judgment.
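The handling of the four combinations in Table 1 can be summarized as a small lookup. This is only a sketch; the function name and the action labels are hypothetical, and the two threshold-adjustment actions stand for the fusion steps applied to the mixed cases.

```python
def route_sample(meta_label, behavior_label):
    """Map the (meta-feature, behavior) verdict pair, each -1 or +1, to the next action."""
    actions = {
        (+1, +1): "alarm",               # both abnormal: alarm directly
        (-1, -1): "normal",              # both normal: no further processing
        (+1, -1): "tighten_threshold",   # abnormal meta-features, normal behavior
        (-1, +1): "relax_threshold",     # normal meta-features, abnormal behavior
    }
    try:
        return actions[(meta_label, behavior_label)]
    except KeyError:
        raise ValueError("labels must be -1 or +1") from None

decisions = [route_sample(m, b) for m, b in [(1, 1), (-1, -1), (1, -1), (-1, 1)]]
```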

Improving abnormal behavior detection capability by means of abnormal meta-features: a sample of any day d whose meta-features are abnormal but whose behavior features are normal shows an attack tendency clearly higher than that of the population as a whole, while its abnormal behavior remains hidden; stricter abnormal behavior detection criteria are therefore adopted, that is, the sensitivity of the behavior offset threshold is appropriately increased using formula (I);

In formula (I), x represents the quantified attack tendency level of the user in M_value_2, K_{S/A} represents the anomaly detection behavior offset threshold after fusion adjustment, FNR_1 represents the miss rate of the (+1) class on the user's standard set A_Feat_1^{MM} when abnormal behavior detection is performed before fusion, and α ∈ (0, 1] represents the elasticity coefficient, which adjusts the degree to which the threshold is lowered based on the training set and the verification set;

After the optimal K_{S/A} is obtained, binary-analysis fusion abnormal behavior detection is executed on A_Feat_2^{MM}, and the dates (down to the day) on which attack behavior occurred are finally determined and alarmed. Because abnormal behavior detection is based on the offset distance between the target behavior and the existing model, an abnormal alarm is raised if that distance is greater than K_{S/A}; the K_{S/A} obtained from formula (I) is suitably less than or equal to K_A, which reasonably lowers the threshold for judging abnormal behavior so that more covert attack behavior can be found.

Reducing the false alarm rate of abnormal behavior detection by means of normal meta-features: users whose meta-features are normal but whose behavior features are abnormal show an attack tendency clearly lower than that of the users as a whole, so a cautious, more relaxed abnormal behavior detection standard should be applied to them, that is, the sensitivity of the behavior deviation threshold is appropriately reduced using formula (II):

in formula (II), x denotes the user's quantized attack tendency level in M_value₂; K'_S/A denotes the fusion-adjusted behavior deviation threshold for abnormal behavior detection; FPR₁ denotes the false positive rate of the (-1) class on A_Feat₁^MM when abnormal behavior detection is performed before fusion; β ∈ (0, 1] denotes an elasticity coefficient for adjusting the threshold reduction level based on the training set and the validation set;

the abnormal behavior detection fused with the binary analysis is executed on A_Feat₂^MM, the dates (down to the specific day) on which attack behavior occurred are finally determined, and an alarm is raised. Because abnormal behavior detection is based on the offset distance between the target behavior and the existing model, an abnormality alarm is raised if the distance is greater than K'_S/A; formula (II) yields a K'_S/A that is suitably greater than or equal to K_A, so the threshold for judging abnormal behavior is in effect raised, which reduces the number of false positives caused by fluctuations in a user's normal behavior.
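The four cases of the user binary result matrix can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the function name `fuse_decision`, its boolean inputs, and the choice to pass the already adjusted thresholds `k_sa` (from formula (I)) and `k_sa_prime` (from formula (II)) as plain numbers are hypothetical, and the formulas themselves are not reproduced here.

```python
def fuse_decision(meta_abnormal, behavior_abnormal, distance, k_sa, k_sa_prime):
    """Return True if an alarm should be raised for one user-day sample.

    meta_abnormal, behavior_abnormal: results of the two independent classifiers.
    distance: offset distance between the day's behavior and the user's model.
    k_sa: lowered threshold (assumed to come from formula (I)).
    k_sa_prime: raised threshold (assumed to come from formula (II)).
    """
    if meta_abnormal and behavior_abnormal:
        return True                      # both abnormal: alarm directly
    if not meta_abnormal and not behavior_abnormal:
        return False                     # both normal: classified as normal
    if meta_abnormal:
        return distance > k_sa           # hidden behavior: stricter re-check
    return distance > k_sa_prime         # normal meta-features: more lenient re-check
```

With hypothetical values k_sa = 0.8 and k_sa_prime = 1.2 around an initial threshold of 1.0, a distance of 0.9 raises an alarm only for a user with abnormal meta-features, and a distance of 1.1 no longer raises an alarm for a user with normal meta-features.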

After the optimal K_S/A is obtained, the abnormal behavior detection fused with the binary analysis is executed on A_Feat₂^MM, the dates on which attack behavior occurred are finally determined, and an alarm is raised, specifically comprising the following steps:

(4) the user's set of normal behavior feature row vectors and the daily behavior feature row vector to be analyzed are combined into a row vector matrix; the subsequent task is to analyze the degree of offset of the last row, which is the row vector to be analyzed, relative to the set of row vectors;

The method for obtaining the elastic coefficient alpha comprises the following steps:

it is worth mentioning that the elasticity coefficient α of the invention is not specified manually; it is obtained by traversing parameter values in combination with the F₂ score:

(6) set the initial α = 0.01 and substitute it into formula (I) to calculate the corresponding post-fusion behavior deviation threshold K_S/A;

(8) calculate the F₂ score of the post-fusion abnormal behavior detection;
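The traversal of α can be sketched as a simple grid search. This is a minimal sketch under assumptions: `threshold_fn` is a hypothetical stand-in for formula (I), which is not reproduced here, and the 0.01 step up to the upper limit 1 is taken from the traversal described in the claims. The names `f_beta` and `best_alpha` are illustrative only.

```python
def f_beta(tp, fp, fn, b):
    """F-score that weights recall b times as much as precision."""
    denom = (1 + b * b) * tp + b * b * fn + fp
    return (1 + b * b) * tp / denom if denom else 0.0


def best_alpha(distances, is_attack, threshold_fn, b=2.0):
    """Try alpha = 0.01, 0.02, ..., 1.00 and keep the value with the highest F-score.

    distances: offset distances of the validation samples.
    is_attack: True/False labels for the same samples.
    threshold_fn(alpha): placeholder for formula (I); returns the threshold K_S/A.
    """
    best = (0.01, -1.0)
    for step in range(1, 101):
        alpha = step / 100
        k = threshold_fn(alpha)
        tp = sum(d > k and y for d, y in zip(distances, is_attack))
        fp = sum(d > k and not y for d, y in zip(distances, is_attack))
        fn = sum(d <= k and y for d, y in zip(distances, is_attack))
        score = f_beta(tp, fp, fn, b)
        if score > best[1]:      # ties keep the smallest alpha
            best = (alpha, score)
    return best
```

Calling `best_alpha(distances, labels, threshold_fn, b=0.5)` with a stand-in for formula (II) would give the analogous search for β.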

The method for obtaining the elastic coefficient beta comprises the following steps:

the elasticity coefficient β of the invention is likewise obtained by traversing parameter values, in combination with the F_0.5 score: for any user whose meta-features are normal but whose behavior is abnormal, set the initial β = 0.01 and substitute it into formula (II) to calculate the corresponding post-fusion behavior deviation threshold K'_S/A;

calculate the F_0.5 score of the post-fusion abnormal behavior detection;
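For reference, the F₂ and F_0.5 scores used for α and β are instances of the standard F-measure. The text does not spell the definition out, so the form below is the usual textbook one, with P the precision and R the recall; the weight is written b here only to avoid confusion with the elasticity coefficient β.

```latex
F_b = \frac{(1+b^2)\,P\,R}{b^2 P + R},
\qquad
F_2 = \frac{5\,P\,R}{4P + R},
\qquad
F_{0.5} = \frac{1.25\,P\,R}{0.25\,P + R}
```

F₂ weights recall more heavily than precision, which is consistent with using it to tune α (finding more hidden attacks), while F_0.5 weights precision more heavily, which is consistent with using it to tune β (reducing false alarms).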

To verify the effectiveness of the user abnormal behavior fusion detection method based on user binary analysis, the invention carries out a verification and analysis experiment on the publicly available insider threat detection dataset CERT. CERT5.2 provides computer operation behavior records of 1000 users, in which four typical insider threat scenarios are simulated, including leaking secrets to a wiki website and copying confidential data before leaving the company.

Because the CERT dataset does not provide all of the user meta-features mentioned by the invention, the collection and construction of user meta-features and the classifier experiment are simplified to some extent. The user meta-feature categories actually analyzed are: (1) the user's Big Five personality scores (provided by the CERT dataset); (2) the user's organizational relationship data: CERT5.2, for example, provides five levels (business, function, division, team, role), and with the method of the invention each level is represented by a two-digit code to distinguish its categories; (3) the start and end of working hours of employees in the same team are statistically analyzed to extract the team's own working hours, and attendance-related features are determined in combination with the user's system logon/logoff records.
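The two-digit coding of the five organizational levels in item (2) can be illustrated as follows. This is a sketch under assumptions: the field names in `LEVELS`, the first-seen order in which codes are assigned, and the concatenation into one 10-digit string are hypothetical choices; the text only states that each level is represented by a two-digit code.

```python
LEVELS = ("business", "function", "division", "team", "role")  # hypothetical field names


def build_codebooks(users):
    """Give every distinct value at each level a two-digit code, in first-seen order."""
    books = {level: {} for level in LEVELS}
    for user in users:
        for level in LEVELS:
            book = books[level]
            if user[level] not in book:
                if len(book) >= 99:
                    raise ValueError(f"more than 99 categories at level {level!r}")
                book[user[level]] = f"{len(book) + 1:02d}"
    return books


def encode_org(user, books):
    """Concatenate the five two-digit level codes into one 10-digit string."""
    return "".join(books[level][user[level]] for level in LEVELS)
```

Two users who share a business unit but differ at every lower level would then share only the first two digits of their codes.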

To facilitate the training, validation and detection experiments, the CERT dataset is divided in time into three periods, T₀, T₁ and T₂. The T₀ period contains no attacks, while the T₁ and T₂ periods both contain attack behaviors. Following the method of the invention, the user meta-features and behavior features are extracted separately, user meta-feature detection and user behavior feature detection are performed, and the user's behavior deviation threshold for abnormal behavior detection is then dynamically adjusted according to the quantized attack tendency variable obtained from the user meta-features.
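The division into T₀, T₁ and T₂ can be sketched as a split of daily records by date. The record layout `(user, day, features)`, the function name `split_periods` and the boundary dates are assumptions made for illustration; the text does not give the actual boundaries, and the function does not check that T₀ is free of attacks.

```python
from datetime import date


def split_periods(records, t1_start, t2_start):
    """Split (user, day, features) records into the T0, T1 and T2 periods by day."""
    periods = {"T0": [], "T1": [], "T2": []}
    for record in records:
        day = record[1]
        if day < t1_start:
            periods["T0"].append(record)   # attack-free period used for training
        elif day < t2_start:
            periods["T1"].append(record)   # period used for validation and tuning
        else:
            periods["T2"].append(record)   # target period for detection
    return periods


# Hypothetical records and boundaries, for illustration only.
example = split_periods(
    [
        ("u1", date(2010, 1, 4), [0.1]),
        ("u1", date(2010, 7, 1), [0.4]),
        ("u1", date(2011, 2, 1), [0.9]),
    ],
    t1_start=date(2010, 6, 1),
    t2_start=date(2011, 1, 1),
)
```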

Finally, three experiments were carried out for comparison on the CERT5.2 dataset, using 10 rounds of cross-validation. Experiment one reproduces the user abnormal behavior detection method of Yang Guang et al., "A General and Expandable Insider Threat Detection System Using Baseline Anomaly Detection and Scenario-driven Alarm Filters", 2018, without auxiliary analysis of user meta-features; experiment two uses the abnormal behavior fusion detection method based on user binary analysis of the invention; experiment three uses a direct fusion method that processes the user binary features directly with PCA and then applies OCSVM to the resulting fused features to detect the T₂ period data. The recall rate and the false alarm rate (false positive rate) were used as evaluation indices, and the results are shown in Table 2:

TABLE 2

Experiment	Recall rate (%)	False alarm rate (%)
Experiment one	86.2	21.3
Experiment two	95.6	10.4
Experiment three	92.3	15.2

As can be seen from Table 2, compared with the baseline experiment one, experiment three, which directly fuses the user binary features, raises the recall rate from 86.2% to 92.3% and lowers the false alarm rate from 21.3% to 15.2%, improving on the result of experiment one; experiment two shows that the abnormal behavior fusion detection method based on user binary analysis raises the recall rate further, to 95.6%, and lowers the false alarm rate further, to 10.4%, a clear improvement over experiment one.

The experiments show that the abnormal behavior fusion detection method based on user binary analysis effectively improves the system's ability to discover more concealed abnormal behaviors, while the threshold adjustment for users with normal meta-features markedly reduces the false alarm rate.
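The two evaluation indices used in the experiments, recall rate and false alarm rate, can be computed from per-sample alarm decisions and ground-truth labels as sketched below. The function name `recall_and_fpr` and the boolean list inputs are assumptions for illustration; the definitions are the standard ones (recall = TP / (TP + FN), false positive rate = FP / (FP + TN)).

```python
def recall_and_fpr(predicted, actual):
    """Return (recall, false positive rate) for boolean alarm decisions.

    predicted: True where the detector raised an alarm.
    actual: True where the sample really contains attack behavior.
    """
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr
```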

Example 5

A computer device comprising a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the steps of the internal user abnormal behavior fusion detection method based on user binary analysis of any of embodiments 1-4.

Example 6

A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the internal user abnormal behavior fusion detection method based on user binary analysis of any of embodiments 1-4.

Example 7

Claims

1. The internal user abnormal behavior fusion detection method based on the user binary analysis is characterized by comprising the following steps of:

based on the detection results of the user meta-feature classifier and the user abnormal behavior classifier, a user binary result matrix is established, and analysis and processing are respectively carried out on different conditions: for users with abnormal meta-characteristics and abnormal behavior characteristics, directly alarming; for users with normal meta-characteristics and normal behavior characteristics, judging that the users are normal, and not processing the users; for users with abnormal meta-characteristics and normal behavior characteristics, properly regulating and controlling a behavior deviation threshold according to the abnormal degree of the meta-characteristics, and improving the abnormal behavior recognition capability; for users with normal meta-characteristics and abnormal behavior characteristics, properly regulating and controlling the behavior deviation threshold according to the normal degree of the meta-characteristics, and reducing the false alarm quantity of abnormal detection;

before training a machine learning model classifier, sequentially performing data acquisition, data feature extraction, data marking, training feature aggregation and normalization processing, and specifically comprising the following steps:

training feature aggregation: the user meta-feature set of the T₀ section is regarded as the global feature row vector M_Feat₀^M, and the user behavior feature set of the T₀ section is regarded as the global feature row vector A_Feat₀^M, where M_Feat₀^M = {m_data_u^{0,d}}, A_Feat₀^M = {a_data_u^{0,d}}, d ∈ T₀; the user meta-feature set of the T₁ section is regarded as the global feature row vector M_Feat₁^M, and the user behavior feature set of the T₁ section is regarded as the global feature row vector A_Feat₁^M, where M_Feat₁^M = {m_data_u^{1,d}}, A_Feat₁^M = {a_data_u^{1,d}}, d ∈ T₁; the user meta-feature set of the T₂ section is regarded as the global feature row vector M_Feat₂^M, and the user behavior feature set of the T₂ section is regarded as the global feature row vector A_Feat₂^M, where M_Feat₂^M = {m_data_u^{2,d}}, A_Feat₂^M = {a_data_u^{2,d}}, d ∈ T₂;

normalization: M_Feat₀^M and A_Feat₀^M are each normalized to obtain M_Feat₀^MM and A_Feat₀^MM, so that the values of every column lie within the range [0,1]; M_Feat₁^M and A_Feat₁^M are each normalized to obtain M_Feat₁^MM and A_Feat₁^MM, so that the values of every column lie within the range [0,1]; M_Feat₂^M and A_Feat₂^M are each normalized to obtain M_Feat₂^MM and A_Feat₂^MM, so that the values of every column lie within the range [0,1];

2. The method for detecting the fusion of abnormal behaviors of an internal user based on binary analysis of the user according to claim 1, wherein in the normalization process, a normalization formula is shown as formula (I):

x_mm = (x - X_min)/(X_max - X_min) (I)

3. The method for detecting the fusion of abnormal behaviors of an internal user based on binary analysis of the user according to claim 1, wherein training the user meta-feature classifier model comprises the following steps:

a single-class classifier is trained using M_Feat₀^MM, and the meta-feature m_data_u^{1,d} of any day d of any user u is detected on the validation set M_Feat₁^MM; by comparison with the labels indicating whether each user meta-feature in the T₁ section is normal, the user meta-feature classifier parameters are adjusted with optimal accuracy as the target, and the user meta-feature classifier model M_Classifier₂ suitable for the current detection is finally determined; the accuracy is the percentage, among all samples judged by the user meta-feature classifier, of samples whose positive or negative class is judged correctly.

4. A method for internal user abnormal behavior fusion detection based on user binary analysis according to claim 3, wherein training a user abnormal behavior classifier model and performing user abnormal behavior detection comprises:

5. The method for detecting the fusion of abnormal behaviors of an internal user based on binary analysis of the user according to claim 4, wherein the single-class classifier is a single-class support vector machine (OCSVM).

6. The method for detecting the fusion of abnormal behaviors of an internal user based on binary analysis of the user according to claim 4, wherein performing abnormal-individual detection of user meta-features through the trained user meta-feature classifier model comprises:

using the user meta-feature classifier model M_Classifier₂, the meta-features of any day d of any user u in the target section T₂ are detected to obtain the required classification results, which include:

(2) the decision function value set d_value₂ = {d_i}, i = 1, 2, ..., N₂, of the user meta-feature classifier model M_Classifier₂ corresponding to the meta-features, where N₂ denotes the number of users to be analyzed in T₂; the decision function is the distance function, used by the user meta-feature classifier model M_Classifier₂ to make its judgment, that measures the distance of a target sample from the existing single-class sample model;

7. The method for detecting the fusion of abnormal behaviors of an internal user based on binary analysis of the user according to claim 4, wherein establishing the user binary result matrix based on the detection results of the user meta-feature classifier and the user abnormal behavior classifier, and analyzing and processing the different cases respectively, comprises the following steps:

improving abnormal behavior detection capability by means of abnormal meta-features: for a sample of any day d with abnormal meta-features and normal behavior features, the sensitivity of the behavior deviation threshold is appropriately increased using formula (II);

in formula (II), x denotes the user's quantized attack tendency level in M_value₂, where M_value₂ denotes the sequence of quantized attack tendency levels; K_S/A denotes the fusion-adjusted behavior deviation threshold for abnormal behavior detection; FNR₁ denotes the class miss rate on the user standard set A_Feat₁^MM when abnormal behavior detection is performed before fusion; α ∈ (0, 1] denotes an elasticity coefficient for adjusting the threshold reduction level based on the training set and the validation set; K_A denotes the initial optimal behavior deviation threshold;

reducing abnormal behavior detection false alarm rate by means of normal meta-characteristics: for users with normal meta-features but abnormal behavioral features, the sensitivity of the behavioral shift threshold is suitably reduced by using formula (III):

in formula (III), x denotes the user's quantized attack tendency level in M_value₂; K'_S/A denotes the fusion-adjusted behavior deviation threshold for abnormal behavior detection; FPR₁ denotes the class false positive rate on A_Feat₁^MM when abnormal behavior detection is performed before fusion; β ∈ (0, 1] denotes an elasticity coefficient for adjusting the threshold reduction level based on the training set and the validation set;

8. The method for detecting internal user abnormal behavior fusion based on user binary analysis according to claim 7, wherein after the optimal K_S/A is obtained, the abnormal behavior detection fused with the binary analysis is executed on A_Feat₂^MM, the dates on which attack behavior occurred are finally determined, and an alarm is raised, specifically comprising the following steps:

9. The method for detecting the fusion of abnormal behaviors of an internal user based on binary analysis of the user according to claim 7, wherein the method for obtaining the elastic coefficient α comprises the following steps:

(6) setting an initial α = 0.01 and substituting it into formula (II) to calculate the corresponding post-fusion behavior deviation threshold K_S/A;

(8) calculating the F₂ score of the post-fusion abnormal behavior detection;

(9) increasing α by 0.01 each time until the upper limit value 1 is reached, collecting the F₂ scores corresponding to all values of α, selecting the α with the highest F₂ score as the optimal value, and substituting it into formula (II) to obtain the optimal K_S/A.

10. The method for detecting the fusion of abnormal behaviors of an internal user based on binary analysis of the user according to claim 7, wherein the method for obtaining the elastic coefficient β comprises the following steps:

setting an initial β = 0.01 and substituting it into formula (III) to calculate the corresponding post-fusion behavior deviation threshold K'_S/A;

calculating the F_0.5 score of the post-fusion abnormal behavior detection;

11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the internal user abnormal behavior fusion detection method based on user binary analysis according to any one of claims 1-7.

12. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the internal user anomaly fusion detection method based on user binary analysis of any one of claims 1 to 10.

13. An internal user abnormal behavior fusion detection system based on user binary analysis, which is characterized by comprising:

a fusion detection analysis module configured to: based on the detection results of the user meta-feature classifier and the user abnormal behavior classifier, a user binary result matrix is established, and analysis and processing are respectively carried out on different conditions: for users with abnormal meta-characteristics and abnormal behavior characteristics, directly alarming; for the users with normal meta-characteristics and normal behavior characteristics, judging that the users are normal and do not process; for users with abnormal meta-characteristics and normal behavior characteristics, properly regulating and controlling a behavior deviation threshold according to the abnormal degree of the meta-characteristics, and improving the abnormal behavior recognition capability; for users with normal meta-characteristics and abnormal behavior characteristics, properly regulating and controlling the behavior deviation threshold according to the normal degree of the meta-characteristics, and reducing the false alarm quantity of abnormal detection;

based on the detection results of the user element feature classifier and the abnormal behavior classifier, a user binary result matrix is established, and analysis and processing are respectively carried out on different conditions: for users with abnormal meta-characteristics and abnormal behavior characteristics, directly alarming; for users with normal meta-characteristics and normal behavior characteristics, judging that the users are normal, and not processing the users; for users with abnormal meta-characteristics and normal behavior characteristics, properly regulating and controlling a behavior deviation threshold according to the abnormal degree of the meta-characteristics, and improving the abnormal behavior recognition capability; for users with normal meta-characteristics and abnormal behavior characteristics, properly regulating and controlling the behavior deviation threshold according to the normal degree of the meta-characteristics, and reducing the false alarm quantity of abnormal detection;

data feature extraction: in the three time sections T₀, T₁ and T₂, the metadata of each user is collected by date to obtain the corresponding {m_data_u^{0,d}}, {m_data_u^{1,d}} and {m_data_u^{2,d}}, where m_data_u^{0,d} denotes the meta-features collected for user u on day d of the T₀ section, m_data_u^{1,d} denotes the meta-features collected for user u on day d of the T₁ section, and m_data_u^{2,d} denotes the meta-features collected for user u on day d of the T₂ section;