Disclosure of Invention
In order to solve the problems, the invention provides an abnormal behavior detection method and device based on user behavior analysis. The user behavior data is collected in real time through a proper time sliding window, the user behavior data is input into a data analysis model, and the conclusion whether the user behavior is normal or not is given through the analysis and judgment of the model so as to enable a user to make subsequent judgment.
More specifically, the invention provides an abnormal behavior detection method based on user behavior analysis, the method comprising the following steps:
firstly, setting initial time sliding window parameters for user behavior data acquisition, acquiring the user behavior data in real time based on the set time sliding window parameters, and representing the user behavior data through a multidimensional array;
secondly, constructing a standard user behavior sample library, which comprises a standard malicious user behavior sample library, a standard normal user behavior sample library and a suspicious user behavior sample library; wherein the standard malicious user behavior sample library stores a plurality of pieces of standard user behavior data representing user behaviors that are malicious, the standard normal user behavior sample library stores a plurality of pieces of standard user behavior data representing user behaviors that are normal, and the suspicious user behavior sample library stores a plurality of pieces of user behavior data representing user behaviors that are suspicious;
thirdly, constructing a multi-angle behavior analysis model by using a standard user behavior sample library;
fourthly, inputting the collected user behavior data into the constructed multi-angle behavior analysis model to obtain an analysis conclusion value array;
and fifthly, calculating to obtain an analysis credibility value based on the analysis conclusion value array, comparing the analysis credibility value with an acceptable credibility value preset by a user, and selecting to execute feedback adjustment and retest according to a comparison result.
Preferably, the time sliding window parameter includes a length of the time sliding window and an interval of the basic time element for data acquisition, and at least one of the length of the time sliding window and the interval of the basic time element for data acquisition can be adjusted when the time sliding window parameter is feedback-adjusted;
preferably, the analysis conclusion value array includes an abnormal probability value, a normal probability value and a suspicious probability value, wherein the abnormal probability value represents a similarity degree between the user behavior data collected in real time currently and the samples in the standard malicious user behavior sample library, the normal probability value represents a similarity degree between the user behavior data collected in real time currently and the samples in the standard normal user behavior sample library, and the suspicious probability value represents a similarity degree between the user behavior data collected in real time currently and the samples in the suspicious user behavior sample library;
preferably, the multi-angle behavior analysis model comprises a mahalanobis distance-based multi-angle behavior analysis model and an isolated forest-based multi-angle behavior analysis model.
Preferably, in the multi-angle behavior analysis model based on mahalanobis distance, mahalanobis distances between the user behavior data currently acquired in real time and samples in a standard malicious user behavior sample library, a standard normal user behavior sample library and a suspicious user behavior sample library are respectively calculated to obtain an analysis conclusion value array including the abnormal probability value, the normal probability value and the suspicious probability value;
preferably, the multi-angle behavior analysis model based on mahalanobis distance selects the minimum value of the abnormal probability value, the normal probability value and the suspicious probability value, calculates the ratio of the minimum value to the other two values respectively, and uses the two ratios as the analysis reliability value, only when the two ratios are both smaller than the acceptable reliability value preset by the user, the analysis result is determined to be reliable, and the analysis result is output, otherwise, the analysis result is determined to be unreliable, and the steps of feedback adjustment and retesting are executed.
Preferably, in the multi-angle behavior analysis model based on the isolated forest, a malicious behavior forest, a normal behavior forest and a suspicious behavior forest are respectively constructed through a standard malicious user behavior sample library, a standard normal user behavior sample library and a suspicious user behavior sample library, the average path distances from root nodes of user behavior data currently acquired in real time in the malicious behavior forest, the normal behavior forest and the suspicious behavior forest to leaf nodes where the user behavior data are located are respectively calculated, and the analysis conclusion value array including the abnormal probability value, the normal probability value and the suspicious probability value is obtained based on the average path distances.
Preferably, the maximum value of the abnormal probability value, the normal probability value and the suspicious probability value is selected from the multi-angle behavior analysis model based on the isolated forest, the ratio of the maximum value to the other two values is respectively calculated, the two ratios are used as analysis reliability values, only when the two ratios are both larger than an acceptable reliability value preset by a user, the analysis result is determined to be reliable, the analysis result is output, otherwise, the analysis result is determined to be unreliable, and the steps of feedback adjustment and retesting are executed.
In addition, the present invention further provides an abnormal behavior detection apparatus based on user behavior analysis, wherein the apparatus includes:
the parameter setting module is used for setting initial time sliding window parameters for user behavior data acquisition;
the data acquisition module is used for acquiring the user behavior data in real time based on the time sliding window parameter set by the parameter setting module and expressing the user behavior data through a multidimensional array;
the behavior sample library construction module is used for constructing a standard user behavior sample library, which comprises a standard malicious user behavior sample library, a standard normal user behavior sample library and a suspicious user behavior sample library; wherein the standard malicious user behavior sample library stores a plurality of pieces of standard user behavior data representing user behaviors that are malicious, the standard normal user behavior sample library stores a plurality of pieces of standard user behavior data representing user behaviors that are normal, and the suspicious user behavior sample library stores a plurality of pieces of user behavior data representing user behaviors that are suspicious;
the model construction module is used for constructing a multi-angle behavior analysis model by using a standard user behavior sample library;
the analysis module is used for inputting the collected user behavior data into the constructed multi-angle behavior analysis model to obtain an analysis conclusion value array;
and the result feedback module is used for calculating to obtain an analysis credibility value based on the analysis conclusion value array, comparing the analysis credibility value with an acceptable credibility value preset by a user, and selecting to execute feedback adjustment and retest according to the comparison result.
Preferably, the time sliding window parameter set by the parameter setting module includes a length of the time sliding window and an interval of the basic time element for data acquisition, and at least one of the length of the time sliding window and the interval of the basic time element for data acquisition can be adjusted when the time sliding window parameter is feedback-adjusted;
preferably, the analysis conclusion value array obtained by the analysis module through analysis includes an abnormal probability value, a normal probability value and a suspicious probability value, wherein the abnormal probability value represents the similarity between the user behavior data acquired in real time currently and the samples in the standard malicious user behavior sample library, the normal probability value represents the similarity between the user behavior data acquired in real time currently and the samples in the standard normal user behavior sample library, and the suspicious probability value represents the similarity between the user behavior data acquired in real time currently and the samples in the suspicious user behavior sample library;
preferably, the multi-angle behavior analysis model constructed by the model construction module comprises a multi-angle behavior analysis model based on mahalanobis distance and a multi-angle behavior analysis model based on isolated forests.
Preferably, in the multi-angle behavior analysis model based on mahalanobis distance, mahalanobis distances between the user behavior data currently acquired in real time and samples in the standard malicious user behavior sample library, the standard normal user behavior sample library and the suspicious user behavior sample library are respectively calculated to obtain the analysis conclusion value array including the abnormal probability value, the normal probability value and the suspicious probability value.
Preferably, the multi-angle behavior analysis model based on mahalanobis distance selects the minimum value of the abnormal probability value, the normal probability value and the suspicious probability value, calculates the ratio of the minimum value to the other two values respectively, and uses the two ratios as the analysis reliability value, only when the two ratios are both smaller than the acceptable reliability value preset by the user, the analysis result is determined to be reliable, and the analysis result is output, otherwise, the analysis result is determined to be unreliable, and the steps of feedback adjustment and retesting are executed.
Preferably, a malicious behavior forest, a normal behavior forest and a suspicious behavior forest are respectively constructed in the multi-angle behavior analysis model based on the isolated forest through a standard malicious user behavior sample library, a standard normal user behavior sample library and a suspicious user behavior sample library, the average path distances from root nodes of user behavior data currently acquired in real time in the malicious behavior forest, the normal behavior forest and the suspicious behavior forest to leaf nodes where the user behavior data are located are respectively calculated, and an analysis conclusion value array comprising the abnormal probability value, the normal probability value and the suspicious probability value is obtained based on the average path distances;
preferably, the maximum value of the abnormal probability value, the normal probability value and the suspicious probability value is selected from the multi-angle behavior analysis model based on the isolated forest, the ratio of the maximum value to the other two values is respectively calculated, the two ratios are used as analysis reliability values, only when the two ratios are both larger than an acceptable reliability value preset by a user, the analysis result is determined to be reliable, the analysis result is output, otherwise, the analysis result is determined to be unreliable, and the steps of feedback adjustment and retesting are executed.
The invention collects user behavior data in real time through a sliding time window that can be adjusted by feedback, and describes the user's behavior from multiple angles by constructing a multi-angle behavior library and a multi-angle behavior analysis model. The analysis result is compared with the acceptable credibility value set by the user, and, according to the comparison result, the parameters of the data acquisition time window are adjusted by feedback and the behavior library is updated. In this way a comprehensive analysis and an accurate judgment are obtained, and detection accuracy is improved while the computational complexity is kept as low as possible.
DETAILED DESCRIPTION OF EMBODIMENT (S) OF INVENTION
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
The embodiment of the invention provides an abnormal behavior detection method based on user behavior analysis, which is shown in fig. 1 and comprises the following steps:
step 101, setting initial time sliding window parameters for user behavior data acquisition, acquiring user behavior data in real time based on the set time sliding window parameters, and representing the user behavior data through a multidimensional array;
preferably, in step 101, the time sliding window parameters include the length of the time sliding window and the interval of the basic time element of data acquisition;
Feature selection is an important data preprocessing step in detection based on user behavior models; that is, it matters which user behavior data are selected to identify the user. In real-world tasks, too many attributes cause the curse of dimensionality, and removing irrelevant features tends to reduce the difficulty of the learning task. Poorly chosen features may make the computation too slow, or may even make it impossible to distinguish normal users from abnormal users correctly. In this embodiment, the parameters of the time sliding window, and therefore the length of the acquired time sequence (that is, the number of attributes), may be adjusted according to the analysis result.
In this embodiment, user behavior is modeled on the usage habits that different users naturally develop when using the host. User behavior is characterized by the user's input habits, including mouse click frequency, mouse movement trajectory, mouse wheel frequency, keystroke frequency, frequency of special-symbol keys and the like. Even the same user's habits may change somewhat between scenarios; for example, the keystroke frequency may increase while working on documents. The scenarios therefore need to be distinguished by detecting which application is in use.
The user's operation behaviors are isolated points in time: the mouse is clicked at time t1, the keyboard is pressed at time t2. A continuous description is needed to capture the user's behavior habits and to mine the correlations among behaviors. In this embodiment, a sliding window is used to combine the behavior sequence. The sequence width is chosen as a time interval, and the operation behaviors in the most recent period are counted at regular intervals. Each time the counting interval elapses, the whole sequence is shifted: the behavior record of the earliest period is removed from the queue and a new behavior record is appended at the end of the sequence. The time width is denoted W, which is the length of the time sliding window, and the sampling interval is the interval of the basic time element of data acquisition. For example, with a time width W = 10 s and statistics taken every 1 second, the behavior sequence can be implemented as a multidimensional array whose time dimension has length 10, and the whole sequence shifts by one position every second. This design makes it possible to mine the correlation information in the behavior sequence effectively; a schematic diagram is shown in fig. 2.
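The sliding window described above can be sketched as follows. This is only an illustrative sketch: the class name `SlidingWindow`, its parameter names and the two-feature records are assumptions that do not appear in the embodiment, and the per-interval counts are assumed to be supplied by some external collector.

```python
from collections import deque


class SlidingWindow:
    """Fixed-length window of per-interval behavior counts (hypothetical helper)."""

    def __init__(self, width_s=10, interval_s=1, n_features=5):
        if width_s % interval_s != 0:
            raise ValueError("window width must be a multiple of the interval")
        self.slots = width_s // interval_s      # length of the time dimension
        self.n_features = n_features            # behavior features per interval
        # A deque with maxlen drops the oldest record automatically on append,
        # which is the "shift by one position" described in the text.
        self.records = deque(maxlen=self.slots)

    def push(self, counts):
        """Append the counts gathered during the latest interval."""
        if len(counts) != self.n_features:
            raise ValueError("unexpected number of features")
        self.records.append(list(counts))

    def is_full(self):
        return len(self.records) == self.slots

    def as_array(self):
        """Return the window as a slots x n_features nested list."""
        return [list(record) for record in self.records]


# Usage: W = 10 s, one record per second, two made-up features per record.
window = SlidingWindow(width_s=10, interval_s=1, n_features=2)
for second in range(12):
    window.push([second, 2 * second])
```

After twelve pushes the window holds only the ten most recent records, so the two earliest intervals have already been moved out of the queue.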
Step 102, constructing a standard user behavior sample library, which comprises a standard malicious user behavior sample library, a standard normal user behavior sample library and a suspicious user behavior sample library; wherein the standard malicious user behavior sample library stores a plurality of pieces of standard user behavior data representing user behaviors that are malicious, the standard normal user behavior sample library stores a plurality of pieces of standard user behavior data representing user behaviors that are normal, and the suspicious user behavior sample library stores a plurality of pieces of user behavior data representing user behaviors that are suspicious;
Some hosts and servers have several daily users whose behavior habits differ, yet all of them are legitimate users. Likewise, the same legitimate user may have different usage habits at different times, for example while on a phone call or talking to someone. To handle this, a set of normal user behavior habits is formed during learning so that all normal users are covered by the set; during detection, new user behavior is compared with the normal user behavior set, and an emergency response is triggered when an abnormality is found.
Furthermore, malicious user behaviors and suspicious user behaviors (user behaviors that cannot yet be determined to be malicious and need further judgment) each have their own behavior characteristics. For example, a user behavior that steals user data may differ from normal operation in mouse click frequency, in the mouse movement trajectory, in the folders clicked, or in the order of the positions clicked. These typical malicious and suspicious user behaviors are also collected into their respective behavior libraries, and whether a behavior is normal is judged accurately from the similarity between the user behavior data and these different behavior libraries.
Furthermore, in this embodiment, the samples in the standard user behavior sample library constructed in this step should have the same dimension as the user behavior data acquired in real time. For example, if the user behavior data acquired through the sliding time window is expressed as a 1 × 10 vector, the samples in the sample library should also have ten dimensions, so that subsequent analysis and judgment can be carried out.
Step 103, constructing a multi-angle behavior analysis model by using the standard user behavior sample library;
After the multi-angle behavior library (normal, abnormal and suspicious) is constructed, a multi-angle behavior analysis model can be established accordingly. Unlike the single-angle analysis methods of the prior art, the multi-angle behavior analysis model of this embodiment does not judge from a single angle whether the user behavior is normal or abnormal. Instead, it compares the user behavior data one by one with the standard normal behavior data, the standard abnormal behavior data and the suspicious behavior data, and judges the classification of the user behavior data from multiple angles. For example, if the current user behavior data is normal user behavior data, it has a high degree of similarity to the samples in the standard normal user behavior sample library and a low degree of similarity to the samples in the standard malicious user behavior sample library and the suspicious user behavior sample library, so that the classification of the user behavior data can be judged from three aspects.
Preferably, in step 103, the multi-angle behavior analysis model includes a mahalanobis distance-based multi-angle behavior analysis model and an isolated forest-based multi-angle behavior analysis model.
Preferably, in the multi-angle behavior analysis model based on mahalanobis distance, mahalanobis distances between the user behavior data currently acquired in real time and samples in the standard malicious user behavior sample library, the standard normal user behavior sample library and the suspicious user behavior sample library are respectively calculated to obtain the analysis conclusion value array including the abnormal probability value, the normal probability value and the suspicious probability value.
The Mahalanobis distance represents the covariance distance of data. It is an effective method for calculating the similarity between two unknown sample sets. In essence, it uses the Cholesky transformation to remove the correlation between different dimensions and the differences in measurement scale.
The training set is assumed to be a matrix T, that is, the samples in a standard sample library form a matrix T in which each row represents a record and each column represents an attribute. Writing the i-th of the n records as a column vector t_i, the covariance matrix is denoted:
$$\Sigma = \frac{1}{n-1} \sum_{i=1}^{n} (t_i - \mu)(t_i - \mu)^{\mathsf{T}}$$
Using Cholesky decomposition, it can be converted into the product of a lower triangular matrix and an upper triangular matrix:
$$\Sigma = L L^{\mathsf{T}}$$
The correlation and the differences in measurement scale between different dimensions can be eliminated by processing the real-time user behavior data x as follows:
$$z = L^{-1}(x - \mu)$$
where
$$\mu = \frac{1}{n} \sum_{i=1}^{n} t_i$$
is the mean of the training set T.
For the real-time user behavior data x to be analyzed, the Euclidean distance of the processed vector z is
$$\lVert z \rVert = \sqrt{z^{\mathsf{T}} z}$$
Then, the corresponding Mahalanobis distance M after processing can be expressed as:
$$M = \sqrt{z^{\mathsf{T}} z} = \sqrt{(x - \mu)^{\mathsf{T}} \Sigma^{-1} (x - \mu)}$$
by calculating the Mahalanobis distance between the current user behavior data and the existing sample library set, the statistical characteristics of the Mahalanobis distances obtained by different test sets can be examined, and thus the detection of the abnormity is completed. That is, mahalanobis distances between the user behavior data collected in real time and the samples in the standard malicious user behavior sample library, the standard normal user behavior sample library and the suspicious user behavior sample library are respectively calculated to obtain the analysis conclusion value array including the abnormal probability value, the normal probability value and the suspicious probability value.
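As an illustration, the Mahalanobis distance between one real-time record and a sample library can be computed through the Cholesky factor of the library's covariance matrix. The sketch below assumes NumPy, the sample covariance (divisor n − 1) and a positive-definite covariance matrix; the function names are hypothetical. The embodiment does not state how the three distances are converted into the probability values of the analysis conclusion value array, so the sketch returns the raw distances only.

```python
import numpy as np


def mahalanobis_distance(x, library):
    """Mahalanobis distance from record x to a library (rows = records)."""
    T = np.asarray(library, dtype=float)
    x = np.asarray(x, dtype=float)
    mu = T.mean(axis=0)                    # mean of the training set
    cov = np.cov(T, rowvar=False)          # columns are attributes
    L = np.linalg.cholesky(cov)            # cov = L @ L.T, L lower triangular
    z = np.linalg.solve(L, x - mu)         # z = L^{-1} (x - mu)
    return float(np.sqrt(z @ z))           # Euclidean norm of z


def distance_array(x, malicious, normal, suspicious):
    """Distances to the three libraries, in a fixed (assumed) order."""
    return [mahalanobis_distance(x, lib)
            for lib in (malicious, normal, suspicious)]


# Usage with a small made-up library of two-attribute records.
example_library = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
example_distance = mahalanobis_distance([4.0, 1.0], example_library)
```

Solving the triangular system instead of inverting the covariance matrix gives the same distance as the direct formula and is the usual numerically safer choice.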
Preferably, in the multi-angle behavior analysis model based on the isolated Forest (iForest), a malicious behavior Forest, a normal behavior Forest and a suspicious behavior Forest are respectively constructed through a standard malicious user behavior sample library, a standard normal user behavior sample library and a suspicious user behavior sample library, and average path distances from root nodes to leaf nodes where the user behavior data are located in the malicious behavior Forest, the normal behavior Forest and the suspicious behavior Forest of the user behavior data collected in real time at present are respectively calculated, and the analysis conclusion value array including the abnormal probability value, the normal probability value and the suspicious probability value is obtained based on the average path distances.
Regarding the isolated forest algorithm: in general, abnormal user behaviors are rare, since most behaviors are operations of normal users, and abnormal behavior patterns differ significantly from normal ones. That is, in the high-dimensional space formed by the sample points, normal data points are clearly distinguishable from abnormal data points by position; for example, the normal data points cluster together while the abnormal data points are scattered around them. The isolated forest algorithm exploits these two characteristics, and its core idea can be described informally as follows: the high-dimensional space containing the sample points is cut by a number of hyperplanes until all sample points are separated, that is, no two sample points lie in the same region. Because abnormal sample points are scattered around the normal sample points and differ clearly from them in position, they are separated easily and quickly, whereas the clustered normal sample points need many cuts before they are separated one by one.
In concrete operation, the isolated forest algorithm constructs a series of random binary trees over the attributes of the samples in the sample set. Each node of a random binary tree either has two children or is a leaf node; a node never has exactly one child. A value is chosen at random within an attribute's value range, the data in the range are divided into two branches, and each branch is again divided at a randomly chosen value. This is repeated until the tree height limit is reached or the data cannot be divided further. Since abnormal points are rare and quickly end up in leaf nodes of the random tree, whether a record is abnormal can be judged quickly from the path length between its leaf node and the root node. To reduce the amount of computation, a height limit for the random tree may be calculated, and nodes whose depth exceeds the average path length are generally considered not to be anomalous.
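A minimal sketch of such random binary trees, written in plain Python, may make the procedure concrete. It is a simplification under stated assumptions: the function names, the dictionary node layout and the demo data are hypothetical, a node becomes a leaf as soon as the randomly chosen attribute has a single value, and the leaf-size correction used by full isolation forest implementations is omitted.

```python
import math
import random


def build_tree(rows, height, max_height, rng):
    """Split rows recursively on a random attribute at a random cut value."""
    if height >= max_height or len(rows) <= 1:
        return {"size": len(rows)}                  # leaf node
    attr = rng.randrange(len(rows[0]))
    lo = min(r[attr] for r in rows)
    hi = max(r[attr] for r in rows)
    if lo == hi:                                    # cannot be divided further
        return {"size": len(rows)}
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[attr] < cut]
    right = [r for r in rows if r[attr] >= cut]
    return {"attr": attr, "cut": cut,
            "left": build_tree(left, height + 1, max_height, rng),
            "right": build_tree(right, height + 1, max_height, rng)}


def path_length(tree, x, depth=0):
    """Number of edges from the root to the leaf that x falls into."""
    if "attr" not in tree:
        return depth
    child = tree["left"] if x[tree["attr"]] < tree["cut"] else tree["right"]
    return path_length(child, x, depth + 1)


def build_forest(rows, n_trees=100, sample_size=256, seed=0):
    rng = random.Random(seed)
    n = min(sample_size, len(rows))
    max_height = max(1, math.ceil(math.log2(n)))    # tree height limit
    return [build_tree(rng.sample(rows, n), 0, max_height, rng)
            for _ in range(n_trees)]


def mean_path_length(forest, x):
    return sum(path_length(tree, x) for tree in forest) / len(forest)


# Usage: a made-up cluster in the unit square plus one distant point.
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(50)]
rows += [(0.5, 0.5), (10.0, 10.0)]
forest = build_forest(rows, n_trees=100, sample_size=len(rows), seed=0)
outlier_depth = mean_path_length(forest, (10.0, 10.0))
inlier_depth = mean_path_length(forest, (0.5, 0.5))
```

In this toy data the distant point is usually isolated by the first cut, so its mean path length is much shorter than that of a point inside the cluster.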
For a data set of n samples, the path length of a record x is denoted h(x), and the average path length c(n) is:
$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}$$
where H(i) is the harmonic number, which can be approximated by ln(i) + γ, with γ ≈ 0.5772 being Euler's constant.
To normalize the path length, let s(x, n) be the anomaly index:
$$s(x, n) = 2^{-E(h(x))/c(n)}$$
In the formula, E(h(x)) is the expected path length of a given record x over the trees of the forest, so s(x, n) is the normalized path length of that record. When s(x, n) approaches 1, the record is abnormal; when s(x, n) is far less than 0.5, it is normal; and when s(x, n) is near 0.5 for all points, no point is obviously abnormal.
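The normalization can be sketched in a few lines. The sketch assumes the standard isolation forest form s(x, n) = 2^(−E(h(x))/c(n)), which matches the behavior described above, and approximates the harmonic number by ln(i) + γ; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler's constant


def average_path_length(n):
    """c(n) = 2H(n - 1) - 2(n - 1)/n with H(i) approximated by ln(i) + gamma."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0          # exact value, since H(1) = 1
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n)); requires n >= 2."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


# Usage: a record with a short mean path scores higher than one with a long path.
short_path_score = anomaly_score(2.0, 256)
long_path_score = anomaly_score(8.0, 256)
```

A record whose mean path length equals c(n) scores exactly 0.5, and the score approaches 1 as the mean path length approaches 0.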
In this embodiment, a malicious behavior forest, a normal behavior forest and a suspicious behavior forest are respectively constructed for three types of sample libraries, that is, a standard malicious user behavior sample library, a standard normal user behavior sample library and a suspicious user behavior sample library, then, average path distances from root nodes of user behavior data currently acquired in real time in the malicious behavior forest, the normal behavior forest and the suspicious behavior forest to leaf nodes where the user behavior data is located are respectively calculated, and an analysis conclusion value array including an abnormal probability value, a normal probability value and a suspicious probability value is obtained based on the average path distances.
With the multi-angle analysis method of this embodiment, the similarity between the data to be detected and the standard normal behavior, the standard abnormal behavior and the standard suspicious behavior can each be calculated, which is more accurate than the single-angle judgment of the prior art. For example, if the user behavior data is normal behavior data, it should show characteristics similar to those of the samples in the standard normal user behavior sample library and opposite to those of the samples in the standard malicious user behavior sample library. That is, in the normal behavior forest the data is normal, is hard to isolate, and therefore has a long path distance; in the malicious behavior forest the data is abnormal, is easy to isolate, and therefore has a short path distance. Similar reasoning applies to user behavior data of the other classes.
The isolated forest detection method is suitable when the queue length is large; its false alarm rate is low, and the detection result over the whole set is credible. The algorithm achieves linear time complexity with low storage overhead, can process high-dimensional and massive data, and also performs well when the data contain no anomalies.
Step 104, inputting the collected user behavior data into the constructed multi-angle behavior analysis model to obtain an analysis conclusion value array;
preferably, in step 104, the analysis conclusion value array includes an abnormal probability value, a normal probability value and a suspicious probability value, where the abnormal probability value represents a similarity between the current real-time acquired user behavior data and samples in the standard malicious user behavior sample library, the normal probability value represents a similarity between the current real-time acquired user behavior data and samples in the standard normal user behavior sample library, and the suspicious probability value represents a similarity between the current real-time acquired user behavior data and samples in the suspicious user behavior sample library;
In this step, the user behavior data is input into the multi-angle behavior analysis model. As described above, either of the two analysis models yields an analysis result in the form of an array with three elements, whose values indicate the degree of similarity of the behavior data to the standard malicious user behavior, the standard normal user behavior and the suspicious user behavior, respectively. For example, if the user behavior data is normal user behavior data, its similarity to the standard normal user behavior should be high, for example above 0.75 on a scale of 0 to 1, while its similarity to the standard malicious user behavior and the suspicious user behavior should be low, for example below 0.25.
Step 105, calculating an analysis reliability value based on the analysis conclusion value array, comparing the analysis reliability value with an acceptable reliability value preset by the user, and, according to the comparison result, either executing feedback adjustment and retesting, which comprises adjusting the time sliding window parameters by feedback to reacquire user behavior data and executing steps 101 to 104 again, or outputting the analysis result for the user behavior data currently acquired in real time.
In this step, an acceptable confidence value may be preset by the user, and the value is a set threshold value that is substantially used by the user to determine whether the current detection result is authentic or not, and when the analysis confidence value calculated based on the analysis conclusion value array is greater than or less than the acceptable confidence value, the current search result may be considered authentic or not, and a determination of retesting or directly outputting the result may be made.
In this step, if the multi-angle behavior analysis model based on the Mahalanobis distance is adopted, the minimum of the calculated abnormal probability value, normal probability value and suspicious probability value is selected, the ratios of this minimum to each of the other two values are calculated, and the two ratios are used as the analysis reliability values. Only when both ratios are smaller than the acceptable reliability value preset by the user is the analysis result determined to be reliable and output; otherwise, the analysis result is determined to be unreliable, and the feedback adjustment and retesting steps are executed. For example, if the calculated analysis conclusion value array is [0.20, 0.85, 0.90], the minimum value 0.20 is selected, and the ratios 0.20/0.85 ≈ 0.235 and 0.20/0.90 ≈ 0.222 are calculated as the analysis reliability values. The two ratios are compared with the acceptable reliability value preset by the user, such as 0.25; if both are smaller than the acceptable reliability value, as they are in this example, the detection result is determined to be reliable and the analysis result is output; otherwise, the feedback adjustment and retesting steps are executed.
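The minimum-ratio check described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `min_ratio_reliability` and the handling of a zero value in the array (treated as a ratio of 1.0, that is, indistinguishable) are assumptions introduced here.

```python
def min_ratio_reliability(conclusion, acceptable):
    """Return (is_reliable, ratios) for a three-element conclusion array.

    conclusion: [abnormal, normal, suspicious] values, where the smallest
                value marks the closest sample library.
    acceptable: acceptable reliability value preset by the user.
    """
    smallest_index = min(range(len(conclusion)), key=lambda i: conclusion[i])
    smallest = conclusion[smallest_index]
    others = [v for i, v in enumerate(conclusion) if i != smallest_index]
    # Assumption: a zero in "others" means a tie at zero, so use ratio 1.0.
    ratios = [smallest / v if v > 0 else 1.0 for v in others]
    # Reliable only when BOTH ratios are below the acceptable value.
    return all(r < acceptable for r in ratios), ratios


# Worked example from the text: [0.20, 0.85, 0.90] with threshold 0.25.
reliable, ratios = min_ratio_reliability([0.20, 0.85, 0.90], acceptable=0.25)
print(reliable, [round(r, 3) for r in ratios])

# A less decisive array fails the check and would trigger retesting.
retest_needed, _ = min_ratio_reliability([0.50, 0.60, 0.90], acceptable=0.25)
print(retest_needed)
```

With the array from the example, both ratios fall below 0.25, so the result is accepted; in the second call the minimum is too close to the other values, so the feedback adjustment and retesting branch would be taken.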
In this step, if the multi-angle behavior analysis model based on the isolated forest is adopted, the maximum of the calculated abnormal probability value, normal probability value and suspicious probability value is selected, the ratios of this maximum to each of the other two values are calculated, and the two ratios are used as the analysis reliability values. Only when both ratios are greater than the acceptable reliability value preset by the user is the analysis result determined to be reliable and output; otherwise, the analysis result is determined to be unreliable, and the feedback adjustment and retesting steps are executed.
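The maximum-ratio check for the isolated-forest model can be sketched in the same way. The array [0.90, 0.20, 0.15] and the acceptable reliability value 4.0 below are hypothetical numbers chosen only for illustration and do not come from the embodiment; the function name and the treatment of a zero value (ratio taken as infinite) are likewise assumptions.

```python
def max_ratio_reliability(conclusion, acceptable):
    """Return (is_reliable, ratios) using the maximum-ratio rule.

    conclusion: [abnormal, normal, suspicious] values, where the largest
                value marks the best-matching sample library.
    acceptable: acceptable reliability value preset by the user.
    """
    largest_index = max(range(len(conclusion)), key=lambda i: conclusion[i])
    largest = conclusion[largest_index]
    others = [v for i, v in enumerate(conclusion) if i != largest_index]
    # Assumption: dividing by a zero entry is treated as an infinite ratio.
    ratios = [largest / v if v > 0 else float("inf") for v in others]
    # Reliable only when BOTH ratios exceed the acceptable value.
    return all(r > acceptable for r in ratios), ratios


# Hypothetical decisive array: the maximum clearly dominates the others.
forest_reliable, forest_ratios = max_ratio_reliability([0.90, 0.20, 0.15], 4.0)

# Hypothetical ambiguous array: two libraries match almost equally well.
forest_ambiguous, _ = max_ratio_reliability([0.90, 0.80, 0.15], 4.0)
print(forest_reliable, forest_ambiguous)
```

Note that the direction of the comparison is the reverse of the Mahalanobis case: here a larger value indicates a better match, so the ratios are at least 1 and must exceed the threshold.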
Preferably, in the feedback adjustment and retesting step, when the time sliding window parameters are feedback-adjusted, at least one of the length of the time sliding window and the basic time interval for data acquisition may be adjusted.
In another aspect, corresponding to the method of the above embodiment and referring to Fig. 3, this embodiment also provides an abnormal behavior detection apparatus 300 based on user behavior analysis, wherein the apparatus 300 includes:
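The feedback adjustment and retesting loop can be sketched as below. The embodiment does not specify how the parameters are adjusted or how many retests are allowed, so the doubling policy in `adjust`, the `max_rounds` limit, the field names of `WindowParams`, and the stand-in `fake_analyze` function are all assumptions made for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WindowParams:
    length_s: float    # length of the time sliding window, in seconds
    interval_s: float  # basic time interval for data acquisition, in seconds


def adjust(params, grow=2.0):
    """Hypothetical policy: lengthen the window and keep the interval."""
    return replace(params, length_s=params.length_s * grow)


def detect_with_feedback(params, analyze, is_reliable, max_rounds=5):
    """Re-run acquisition and analysis until the result is reliable.

    Returns (params, conclusion); conclusion is None if no reliable
    result was obtained within max_rounds.
    """
    for _ in range(max_rounds):
        conclusion = analyze(params)   # stands in for steps 101 to 104
        if is_reliable(conclusion):
            return params, conclusion  # output the analysis result
        params = adjust(params)        # feedback adjustment, then retest
    return params, None


def fake_analyze(params):
    # Stand-in model: only a window of 240 s or more gives a decisive array.
    return [0.20, 0.85, 0.90] if params.length_s >= 240 else [0.50, 0.55, 0.60]


def min_ratio_ok(conclusion, acceptable=0.25):
    # Minimum-ratio rule: both ratios must fall below the acceptable value.
    smallest, *others = sorted(conclusion)
    return all(smallest / v < acceptable for v in others)


final_params, result = detect_with_feedback(
    WindowParams(length_s=60.0, interval_s=5.0), fake_analyze, min_ratio_ok
)
print(final_params, result)
```

In this run the window grows from 60 s to 120 s to 240 s before the check passes. Adjusting `interval_s` instead of, or in addition to, `length_s` would fit the same loop.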
a parameter setting module 301, configured to set an initial time sliding window parameter for user behavior data acquisition;
a data acquisition module 302, configured to acquire the user behavior data in real time based on the time sliding window parameters set by the parameter setting module and to represent the user behavior data as a multidimensional array;
a behavior sample library construction module 303, configured to construct a standard user behavior sample library, where the standard user behavior sample library includes a standard malicious user behavior sample library, a standard normal user behavior sample library, and a suspicious user behavior sample library; wherein the standard malicious user behavior sample library stores a plurality of pieces of standard user behavior data representing user behaviors that are malicious, the standard normal user behavior sample library stores a plurality of pieces of standard user behavior data representing user behaviors that are normal, and the suspicious user behavior sample library stores a plurality of pieces of user behavior data representing user behaviors that are suspicious;
a model construction module 304, configured to construct a multi-angle behavior analysis model from a standard user behavior sample library;
an analysis module 305, configured to input the collected user behavior data into the constructed multi-angle behavior analysis model to obtain an analysis conclusion value array;
and a result feedback module 306, configured to calculate an analysis reliability value based on the analysis conclusion value array, compare the analysis reliability value with an acceptable reliability value preset by the user, and, according to the comparison result, either perform feedback adjustment and retesting, which comprises feedback-adjusting the time sliding window parameters to reacquire user behavior data and performing data reacquisition and analysis again through the parameter setting module, the data acquisition module, the behavior sample library construction module, the model construction module and the analysis module, or output the analysis result for the user behavior data currently acquired in real time.
Preferably, the time sliding window parameters set by the parameter setting module 301 include the length of the time sliding window and the basic time interval for data acquisition, and at least one of the length of the time sliding window and the basic time interval for data acquisition may be adjusted when the time sliding window parameters are feedback-adjusted.
Preferably, the analysis conclusion value array obtained by the analysis module 305 includes an abnormal probability value, a normal probability value and a suspicious probability value, where the abnormal probability value represents the degree of similarity between the user behavior data currently acquired in real time and the samples in the standard malicious user behavior sample library, the normal probability value represents the degree of similarity between that data and the samples in the standard normal user behavior sample library, and the suspicious probability value represents the degree of similarity between that data and the samples in the suspicious user behavior sample library.
Preferably, the multi-angle behavior analysis model constructed by the model construction module 304 comprises a multi-angle behavior analysis model based on the Mahalanobis distance and a multi-angle behavior analysis model based on the isolated forest.
Preferably, in the multi-angle behavior analysis model based on the Mahalanobis distance, the Mahalanobis distances between the user behavior data currently acquired in real time and the samples in the standard malicious user behavior sample library, the standard normal user behavior sample library and the suspicious user behavior sample library are respectively calculated to obtain the analysis conclusion value array including the abnormal probability value, the normal probability value and the suspicious probability value.
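A minimal sketch of the distance calculation is given below. It assumes that each piece of user behavior data has been flattened into a numeric vector and that the distance to a sample library is measured against the mean and sample covariance of that library; the embodiment does not state these details, nor how the distances are scaled into probability values, so the raw distances are returned. The three small libraries are made-up data, and all function names are hypothetical.

```python
import math


def mean_and_covariance(samples):
    """Mean vector and sample covariance matrix (n - 1 denominator)."""
    n, dims = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(dims)]
    cov = [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in samples) / (n - 1)
            for b in range(dims)] for a in range(dims)]
    return mean, cov


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def mahalanobis(point, samples):
    """Mahalanobis distance from point to the distribution of samples."""
    mean, cov = mean_and_covariance(samples)
    diff = [p - m for p, m in zip(point, mean)]
    weighted = solve(cov, diff)  # same as inverse(cov) @ diff
    return math.sqrt(max(0.0, sum(d * w for d, w in zip(diff, weighted))))


# Made-up two-dimensional sample libraries, for illustration only.
malicious = [(0, 0), (2, 0), (0, 2), (2, 2)]
normal = [(10, 10), (12, 10), (10, 12), (12, 12)]
suspicious = [(5, 5), (7, 5), (5, 7), (7, 7)]

point = (3, 1)
distances = [mahalanobis(point, lib) for lib in (malicious, normal, suspicious)]
print([round(d, 3) for d in distances])
```

For this made-up point the smallest distance is the one to the malicious library, which is the situation in which the minimum-ratio check would then be applied.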
Preferably, in the multi-angle behavior analysis model based on the Mahalanobis distance, the minimum of the abnormal probability value, the normal probability value and the suspicious probability value is selected, the ratios of this minimum to each of the other two values are calculated, and the two ratios are used as the analysis reliability values. Only when both ratios are smaller than the acceptable reliability value preset by the user is the analysis result determined to be reliable and output; otherwise, the analysis result is determined to be unreliable, and the feedback adjustment and retesting steps are executed.
Preferably, in the multi-angle behavior analysis model based on the isolated forest, a malicious behavior forest, a normal behavior forest and a suspicious behavior forest are respectively constructed from the standard malicious user behavior sample library, the standard normal user behavior sample library and the suspicious user behavior sample library. For the user behavior data currently acquired in real time, the average path distance from the root node to the leaf node in which the data falls is calculated in each of the malicious behavior forest, the normal behavior forest and the suspicious behavior forest, and the analysis conclusion value array comprising the abnormal probability value, the normal probability value and the suspicious probability value is obtained based on these average path distances.
Preferably, in the multi-angle behavior analysis model based on the isolated forest, the maximum of the abnormal probability value, the normal probability value and the suspicious probability value is selected, the ratios of this maximum to each of the other two values are calculated, and the two ratios are used as the analysis reliability values. Only when both ratios are greater than the acceptable reliability value preset by the user is the analysis result determined to be reliable and output; otherwise, the analysis result is determined to be unreliable, and the feedback adjustment and retesting steps are executed.
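The construction of the three forests and the average path distance can be sketched as follows. This is a simplified isolation-forest-style sketch under several assumptions not stated in the embodiment: each tree uses all samples of its library, splits are random and axis-aligned, depth is capped at 12, no leaf-size correction is applied, and the raw average path lengths are returned without any scaling into probability values. The sample libraries are made-up data and all names are hypothetical.

```python
import random


def build_tree(points, rng, depth=0, max_depth=12):
    """Build one tree by recursive random axis-aligned splits; None is a leaf."""
    if len(points) <= 1 or depth >= max_depth:
        return None
    dim = rng.randrange(len(points[0]))
    low = min(p[dim] for p in points)
    high = max(p[dim] for p in points)
    if low == high:
        return None  # cannot split on this feature; treat as a leaf
    split = rng.uniform(low, high)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return (dim, split,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))


def path_length(tree, point):
    """Count edges from the root to the leaf that the point falls into."""
    edges = 0
    while tree is not None:
        dim, split, left, right = tree
        tree = left if point[dim] < split else right
        edges += 1
    return edges


def build_forest(points, n_trees=100, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [build_tree(points, rng) for _ in range(n_trees)]


def average_path_length(forest, point):
    return sum(path_length(tree, point) for tree in forest) / len(forest)


def cluster(offset):
    # 40 made-up 2-D points spread over a unit square shifted by offset.
    return [(offset + i / 40, offset + (i * 7 % 40) / 40) for i in range(40)]


forests = {
    "malicious": build_forest(cluster(10.0)),
    "normal": build_forest(cluster(0.0)),
    "suspicious": build_forest(cluster(5.0)),
}

query = (0.5, 0.5)  # lies inside the made-up normal cluster
path_lengths = [average_path_length(forests[name], query)
                for name in ("malicious", "normal", "suspicious")]
print(path_lengths)
```

A point that lies inside a library's cluster needs more splits to be isolated there, so its average path in that forest is longer than in the forests built from distant libraries. This matches the direction of the maximum-ratio rule, in which the largest value marks the best-matching library.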
The invention uses a feedback-adjustable time sliding window to collect user behavior data in real time, constructs multi-angle behavior sample libraries and a multi-angle behavior analysis model, and analyzes the user's behavior data from multiple angles to obtain a comprehensive analysis and an accurate judgment, so that an analysis result can be obtained with as little computational complexity as possible while the detection accuracy is improved.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.