WO2020038353A1 - Abnormal behavior detection method and system

Info

Publication number: WO2020038353A1
Application number: PCT/CN2019/101543
Authority: WO
Inventors: 李达; 吴睿; 万晓川
Original assignee: 瀚思安信（北京）软件技术有限公司
Priority date: 2018-08-21
Filing date: 2019-08-20
Publication date: 2020-02-27

Abstract

Disclosed are an abnormal behavior detection method and system. Unsupervised learning clustering is carried out on behavior feature sets extracted from different data sources, a result cluster obtained by means of clustering is analyzed to mark abnormal clusters, supervised learning is carried out on the abnormal clusters to generate a detection model having a high degree of adaptation and accuracy and corresponding to a specific application scenario, and abnormal behavior detection of a data set in a production environment is realized by means of the detection model obtained through said learning. The embodiments of the present invention can integrate behavior data from different data sources, and can improve the accuracy of automatic recognition of abnormal behaviors and significantly reduce the labor cost.

Description

Method and system for detecting abnormal behavior

Technical field

The present invention relates to the field of information security, and in particular to a method and system for detecting abnormal behavior in an enterprise information system, so as to protect the services, data, and assets of an enterprise.

Background Art

At present, the field of information security faces various challenges. The security architecture of enterprises is becoming more and more complex, and the variety of security equipment and the volume of security data keep growing, so traditional analysis capabilities are clearly inadequate. With the emergence of new types of threats and the deepening of internal control and compliance requirements, more security information needs to be stored and analyzed, and decisions and responses need to be made more quickly.

In the past, it took days or even months to understand hard-to-detect security threats, because a large number of disparate data streams made it difficult to assemble a concise and organized event "puzzle". The larger the amount of data collected and analyzed, the more chaotic it looks and the longer it takes to reconstruct the event. This usually requires professionals to perform a great deal of manual analysis drawing on their expertise, which is time-consuming and labor-intensive. With the research and application of machine learning and classification algorithms, the industry has begun to study how to use these algorithms to replace manual analysis, so that machines can analyze and detect abnormal behavior automatically. One existing method is to train a classifier on training samples to distinguish abnormal behavior data from normal behavior data, but this method requires a large amount of sample data to be labeled manually. As the amount of security data from different data sources in the enterprise security architecture keeps increasing, labeling massive data from different data sources incurs high labor costs, so this method cannot meet the actual needs of current enterprise information security analysis.

Therefore, enterprises with large information systems urgently need a solution that integrates data from different data sources, applies advanced machine learning algorithms and artificial intelligence detection engines, incorporates the knowledge of security experts to automatically identify abnormal behaviors, and automatically consolidates many abnormal units into anomalous scenarios that operations and maintenance personnel can understand and explain.

Summary of the Invention

The technical problem to be solved by the present invention is how to provide an abnormal behavior detection method and system capable of integrating data from different data sources, which can improve the accuracy of automatic identification of abnormal behavior and reduce labor costs.

In order to solve the above technical problem, according to an aspect of the present invention, a method for detecting abnormal behavior is provided, which includes the following steps:

Extracting a behavior feature set from a plurality of different types of data sources;

Clustering the behavior feature set based on an unsupervised learning algorithm;

Analyzing the result clusters obtained by the clustering, and marking abnormal clusters corresponding to specific scenes;

Performing supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene;

Detecting abnormal behavior based on the detection model.

According to a preferred embodiment of the present invention, the extracting a behavior feature set from a plurality of different types of data sources includes:

Extracting behavior data of an analysis subject from a plurality of different types of data sources;

Aggregating, transforming, and performing feature extraction on the extracted behavior data to form the behavior feature set of the analysis subject.

According to a preferred embodiment of the present invention, the plurality of different types of data sources are log data from a plurality of systems.

According to a preferred embodiment of the present invention, the analysis subject includes a user or an enterprise server.

According to a preferred embodiment of the present invention, the aggregating, transforming, and performing feature extraction on the extracted behavior data includes segmenting the extracted behavior data of the analysis subject according to a time window.

According to a preferred embodiment of the present invention, each subset of logs belonging to the same analysis subject and the same time window generates one corresponding behavior feature.

According to a preferred embodiment of the present invention, the time window includes a rolling window that divides the data stream into consecutive windows of equal length.

According to a preferred embodiment of the present invention, the time window includes a session window that divides the data stream according to the time interval between events.

According to a preferred embodiment of the present invention, the clustering the behavior feature set based on the unsupervised learning algorithm includes clustering behavior data in the behavior feature set according to a distance between the features.

According to a preferred embodiment of the present invention, the clustering the behavior feature set based on the unsupervised learning algorithm includes ranking the result clusters.

According to a preferred embodiment of the present invention, the ranking of the result clusters is based on an effect evaluation index corresponding to the clustering algorithm or on the hit rate of external threat intelligence in the data.

According to a preferred embodiment of the present invention, the analyzing the result clusters obtained by the clustering and marking abnormal clusters corresponding to specific scenes includes marking each abnormal cluster with a specific security problem.

According to a preferred embodiment of the present invention, the performing supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene includes using the subset of data corresponding to each labeled cluster as positive data, using the other data in the data set as negative data, and training with a supervised learning classification algorithm.

According to a preferred embodiment of the present invention, the detecting an abnormal behavior based on the detection model includes deploying the detection model to a production environment, and each detection model is used to detect a specific security problem.

According to another aspect of the present invention, an abnormal behavior detection system is provided, including the following modules:

A feature extraction module, configured to extract a behavior feature set from a plurality of different types of data sources;

A clustering module, configured to cluster the behavior feature set based on an unsupervised learning algorithm;

An anomaly labeling module, configured to analyze the result clusters obtained by the clustering, and to mark abnormal clusters corresponding to specific scenes;

A learning module, configured to perform supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene;

A detection module, configured to detect abnormal behavior based on the detection model.

According to a preferred embodiment of the present invention, the feature extraction module includes:

A data extraction module, configured to extract behavior data of an analysis subject from a plurality of different types of data sources;

A data processing module, configured to aggregate, transform, and perform feature extraction on the extracted behavior data to form the behavior feature set of the analysis subject.

According to a preferred embodiment of the present invention, the data processing module performs aggregation, transformation, and feature extraction on the extracted behavior data, and includes segmenting the extracted behavior data of the analysis subject according to a time window.

According to a preferred embodiment of the present invention, the clustering module clustering the behavior feature set based on an unsupervised learning algorithm includes clustering behavior data in the behavior feature set according to a distance between the features.

According to a preferred embodiment of the present invention, the clustering module clustering the behavior feature set based on an unsupervised learning algorithm includes ranking the result clusters.

According to a preferred embodiment of the present invention, the anomaly labeling module analyzing the result clusters obtained by the clustering and marking abnormal clusters corresponding to specific scenes includes marking each abnormal cluster with a specific security problem.

According to a preferred embodiment of the present invention, the learning module performing supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene includes using the subset of data corresponding to each labeled cluster as positive data, using the other data in the data set as negative data, and training with a supervised learning classification algorithm.

According to a preferred embodiment of the present invention, the detection module detecting the abnormal behavior based on the detection model includes deploying the detection model to a production environment, and each detection model is used to detect a specific security problem.

According to another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program is executed by a processor to implement the steps of the abnormal behavior detection method according to any one of the foregoing embodiments.

According to another aspect of the present invention, a computing device is provided, which includes a memory and a processor. The memory stores a computer program executable on the processor, and the processor executes the computer program to implement the steps of the abnormal behavior detection method according to any one of the foregoing embodiments.

Compared with the prior art, the abnormal behavior detection method and system of the embodiments of the present invention extract behavior feature sets from different data sources, perform unsupervised learning clustering on them, and perform supervised learning on the abnormal result clusters to generate detection models with high adaptability and accuracy corresponding to specific application scenarios. The detection models obtained through this learning are used to detect abnormal behavior in data sets in the production environment. The embodiments of the present invention can integrate behavior data from different data sources, improve the accuracy of automatic recognition of abnormal behavior, and significantly reduce labor costs.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of an abnormal behavior detection method according to an embodiment of the present invention;

FIG. 2 is an exemplary diagram of a time window mechanism for behavior data processing according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an abnormal behavior detection system according to an embodiment of the present invention.

Detailed Description

In order to explain the technical solutions of the embodiments of the present invention more clearly, specific implementations of the present invention will be described below with reference to the accompanying drawings.

FIG. 1 is a schematic flowchart of an abnormal behavior detection method according to an embodiment of the present invention. As shown in FIG. 1, the abnormal behavior detection method according to the embodiment of the present invention includes the following steps:

Step S11, extracting a behavior feature set from a plurality of different types of data sources.

First, the data sources are defined. The data sources can be logs from multiple systems, and the goal is to extract the analysis subject's behavior in each system from that system's logs. An effective behavior model includes the following main components: analysis subject ID, time stamp, event name, specific behavior, object of the behavior operation, event result, and so on. Data from different data sources usually needs to satisfy requirements of completeness, standardization, and uniformity, and each piece of behavior data can be linked to an analysis subject ID through an association relationship. Here, the analysis subject may be a user, in which case the analysis subject ID may be a user ID; or it may be an important asset within the enterprise, such as an externally facing business service cluster, in which case the analysis subject ID may be a server IP address.

Taking the VPN data source log in Table 1 as an example, the fields corresponding to the behavior model are extracted. The analysis subject ID corresponds to 'user_id' or 'src_ip' or 'dst_ip', the time stamp corresponds to 'timestamp', the event name corresponds to 'activity', the object of the action operation corresponds to 'pc_name', and the result of the event corresponds to 'status'. If other fields cannot correspond to the behavior model, they can be discarded to save bandwidth or memory.

Table 1 Sample VPN data source logs

timestamp	src_ip	dst_ip	user_id	activity	pc_name	status
2018-05-21T04:00:00.000Z	111.163.192.68	23.123.22.22	Abcd12	connect	PC_1002	success
2018-05-21T04:00:00.000Z	117.14.161.205	23.123.22.22	Abcd12	connect	PC_1021	success
2018-05-21T04:00:00.000Z	117.14.161.207	23.123.22.22	Efgk21	connect	PC_2192	fail
2018-05-21T04:00:00.000Z	117.14.161.229	23.123.22.22	Hijk90	disconnect	PC_1202	success
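The field mapping described for the VPN log can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `BehaviorEvent`, `VPN_FIELD_MAP`, and `to_behavior_event` are hypothetical, and `user_id` is picked as the analysis subject ID only as one of the three options the text allows.

```python
from dataclasses import dataclass

# Hypothetical mapping: behavior-model component -> VPN log field.
# user_id is used as the analysis subject ID here; src_ip or dst_ip would also be valid.
VPN_FIELD_MAP = {
    "subject_id": "user_id",
    "timestamp": "timestamp",
    "event_name": "activity",
    "target": "pc_name",
    "result": "status",
}


@dataclass(frozen=True)
class BehaviorEvent:
    subject_id: str
    timestamp: str
    event_name: str
    target: str
    result: str


def to_behavior_event(raw, field_map=VPN_FIELD_MAP):
    """Keep only the fields that map onto the behavior model; discard the rest."""
    return BehaviorEvent(**{key: raw[log_field] for key, log_field in field_map.items()})


# First row of Table 1.
raw_log = {
    "timestamp": "2018-05-21T04:00:00.000Z",
    "src_ip": "111.163.192.68",
    "dst_ip": "23.123.22.22",
    "user_id": "Abcd12",
    "activity": "connect",
    "pc_name": "PC_1002",
    "status": "success",
}
event = to_behavior_event(raw_log)
print(event)
```

With this mapping, `src_ip` and `dst_ip` do not correspond to any behavior-model component and are dropped from the simplified record.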

The analysis subject is the key of the behavior data, and all data structures are built around it. The analysis subject is defined from the log: the field corresponding to the analysis subject ID is used as the key, and the data can then be enriched and grouped by that key. In the VPN login log in Table 1 above, the account ID (user_id) field can be used to define the analysis subject, as can the source IP (src_ip) or the destination IP (dst_ip). Different definitions of the analysis subject produce significantly different behavior features, which means the analysis problem being addressed is also different. If the account ID is selected as the subject, user behavior analysis is performed and data is aggregated per account; for example, 'Abcd12' connects successfully twice, from different source IPs and PCs. If the source IP is selected as the subject, each of the four external IPs accounts for one log entry. If the destination IP is selected as the subject, the internal VPN server is analyzed, and the data aggregates into four entries from different source IPs.

Data is extracted from the various data sources through a configurable data collector. Fields corresponding to the behavior model are extracted from each data source to form a simplified log, which can then be grouped by analysis subject. Let E denote a defined analysis subject. The logs are first grouped by analysis subject E, and the logs of each analysis subject E are then segmented according to the time window W. After segmentation, each log subset S contains only behavior data of the same analysis subject within the same time window. Field aggregation, transformation, and feature extraction are then performed on each log subset S according to the needs of the algorithm, finally forming a behavior feature F of the analysis subject E. Over a long analysis period, if the number of analysis subjects E is M and the number of time windows W is N, this step outputs about M × N behavior features. Each behavior feature F corresponds to the logs in the log subset S from which it was generated. An example of the behavior feature F is shown in Table 2 below, where the required fields include the start time (start_time) and end time (end_time) of the window and the value corresponding to each feature.
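To make the effect of the subject choice concrete, the following Python sketch groups the four rows of Table 1 by each candidate key. The helper name `group_by_subject` is hypothetical, and the timestamp column is omitted because it is identical in all four rows.

```python
from collections import defaultdict

# Rows of Table 1 (timestamp omitted; it is the same in every row).
VPN_LOGS = [
    {"src_ip": "111.163.192.68", "dst_ip": "23.123.22.22", "user_id": "Abcd12",
     "activity": "connect", "pc_name": "PC_1002", "status": "success"},
    {"src_ip": "117.14.161.205", "dst_ip": "23.123.22.22", "user_id": "Abcd12",
     "activity": "connect", "pc_name": "PC_1021", "status": "success"},
    {"src_ip": "117.14.161.207", "dst_ip": "23.123.22.22", "user_id": "Efgk21",
     "activity": "connect", "pc_name": "PC_2192", "status": "fail"},
    {"src_ip": "117.14.161.229", "dst_ip": "23.123.22.22", "user_id": "Hijk90",
     "activity": "disconnect", "pc_name": "PC_1202", "status": "success"},
]


def group_by_subject(logs, subject_field):
    """Group log rows by the field chosen as the analysis subject ID."""
    groups = defaultdict(list)
    for row in logs:
        groups[row[subject_field]].append(row)
    return dict(groups)


by_user = group_by_subject(VPN_LOGS, "user_id")
by_src = group_by_subject(VPN_LOGS, "src_ip")
by_dst = group_by_subject(VPN_LOGS, "dst_ip")

for name, groups in (("user_id", by_user), ("src_ip", by_src), ("dst_ip", by_dst)):
    print(name, {key: len(rows) for key, rows in groups.items()})
```

Grouping by `user_id` puts both 'Abcd12' rows together, grouping by `src_ip` yields four single-row groups, and grouping by `dst_ip` yields one group holding all four rows.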

Table 2 Examples of behavior characteristics
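The grouping, windowing, and aggregation that turn a log subset S into a behavior feature F can be sketched as follows. This is a hedged illustration only: the function name `extract_features`, the integer-second timestamps, and the three feature columns (`event_count`, `fail_count`, `distinct_src_ip`) are assumptions made for the example; only `start_time` and `end_time` are fields named by the text.

```python
from collections import defaultdict


def extract_features(events, window_size):
    """Group events by (analysis subject E, time window W), then aggregate each subset S."""
    subsets = defaultdict(list)  # (subject, window index) -> log subset S
    for ev in events:
        subsets[(ev["subject_id"], ev["ts"] // window_size)].append(ev)

    features = []
    for (subject, window_index), subset in sorted(subsets.items()):
        features.append({
            "subject_id": subject,
            "start_time": window_index * window_size,
            "end_time": (window_index + 1) * window_size,
            # Hypothetical feature values; the real set depends on the algorithm's needs.
            "event_count": len(subset),
            "fail_count": sum(1 for ev in subset if ev["status"] == "fail"),
            "distinct_src_ip": len({ev["src_ip"] for ev in subset}),
        })
    return features


# Timestamps are seconds from an arbitrary origin (an assumption for brevity).
events = [
    {"subject_id": "Abcd12", "ts": 10, "status": "success", "src_ip": "111.163.192.68"},
    {"subject_id": "Abcd12", "ts": 50, "status": "success", "src_ip": "117.14.161.205"},
    {"subject_id": "Abcd12", "ts": 3700, "status": "fail", "src_ip": "117.14.161.205"},
    {"subject_id": "Efgk21", "ts": 20, "status": "fail", "src_ip": "117.14.161.207"},
]
features = extract_features(events, window_size=3600)
for feature in features:
    print(feature)
```

Here M = 2 subjects and N = 2 windows, but only three features are produced because 'Efgk21' has no events in the second window, which is why the count is only approximately M × N.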

The definition of the time window is shown in FIG. 2. Two different window mechanisms can be applied in the embodiments of the present invention. The first is a rolling window: the data stream is divided into consecutive time windows of equal length, and the parameter is the window size, window_size. The second is a session window: the data stream is divided according to the time interval between events, and the parameter is time_interval. If the time interval between two consecutive events is less than time_interval, both events fall in the same window. If the time interval between two consecutive events is greater than time_interval, the previous session window is closed and the new event starts a new session window. Under both mechanisms, the division into time windows is independent of the number of logs in a single window.
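The two window mechanisms can be sketched in Python as follows. The function names are hypothetical, timestamps are plain numbers in arbitrary units, and a gap exactly equal to `time_interval` is treated as belonging to the same session, which is an assumption because the text only covers the "less than" and "greater than" cases.

```python
def rolling_windows(timestamps, window_size):
    """Assign each timestamp to the index of its consecutive, equal-length window."""
    return [ts // window_size for ts in timestamps]


def session_windows(timestamps, time_interval):
    """Split timestamps into sessions; a gap larger than time_interval starts a new session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= time_interval:
            sessions[-1].append(ts)   # close enough to the previous event: same session
        else:
            sessions.append([ts])     # first event, or gap too large: new session
    return sessions


timestamps = [0, 5, 8, 30, 33, 100]
print(rolling_windows(timestamps, window_size=10))    # [0, 0, 0, 3, 3, 10]
print(session_windows(timestamps, time_interval=10))  # [[0, 5, 8], [30, 33], [100]]
```

Neither function looks at how many events a window already holds, which matches the statement that window division is independent of the number of logs per window.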

Step S12: Cluster the behavior feature set based on an unsupervised learning algorithm.

This step clusters the behavior feature set composed of all the behavior features F obtained in step S11. Clustering is a kind of unsupervised learning algorithm, and unsupervised learning algorithms do not need the data to be labeled in advance. The clustering algorithm clusters the behavior data in the behavior feature set according to the distance between the features. Similar behaviors correspond to similar features and are grouped into the same class, while features corresponding to different behaviors lie far apart and are assigned to different classes. There are many different implementations of clustering algorithms, such as KMeans (K-means clustering), DBSCAN (density-based clustering with noise), Hierarchical Clustering (hierarchical clustering), and so on. Taking KMeans (K-means clustering) as an example, the standard pseudo-code is as follows:

In a preferred embodiment, the result clusters are scored and then sorted by score. The ranking can be based on a performance evaluation index corresponding to the clustering algorithm, such as the cohesion of each class. The cohesion of a class is defined as (1 - intra-class distance / inter-class distance), where the intra-class distance is the average distance between the features inside the class, and the inter-class distance is the average distance between the features inside the class and the features outside it. The ranking can also draw on external threat intelligence, using the hit rate of threat intelligence in the data as the threat index. The evaluation index pseudo code is as follows:
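As a hedged illustration of the K-means procedure named above, the following pure-Python sketch alternates the assignment and centroid-update steps until the assignment stops changing. It is not the pseudo-code of the original filing: the function names are hypothetical, and the deterministic farthest-first initialization is an assumption chosen so that the example is reproducible (standard K-means usually starts from random centroids).

```python
import math


def farthest_first_init(points, k):
    """Deterministic initialization (an assumption): start at points[0], then add the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iterations=100):
    centroids = farthest_first_init(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each feature goes to its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no feature changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Two well-separated groups of two-dimensional behavior features (made-up values).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
```

In practice a library implementation would normally be used instead; the sketch only shows how distance between features drives the grouping.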
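The two ranking indices can be sketched as follows, reading the intra-class distance as the average pairwise distance inside a class and the inter-class distance as the average distance from members to non-members. This is a minimal sketch under those assumptions; the function names, the example points, and the IP addresses (taken from documentation-only address ranges) are all hypothetical.

```python
import math
from itertools import combinations


def cohesion(members, outsiders):
    """Cohesion = 1 - intra-class distance / inter-class distance."""
    pairs = list(combinations(members, 2))
    intra = sum(math.dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    inter = sum(math.dist(a, b) for a in members for b in outsiders) / (len(members) * len(outsiders))
    return 1.0 - intra / inter


def threat_hit_rate(cluster_ips, threat_intel):
    """Fraction of a cluster's IPs that appear in the external threat intelligence set."""
    return sum(1 for ip in cluster_ips if ip in threat_intel) / len(cluster_ips)


tight = [(0.0, 0.0), (0.0, 1.0)]
loose = [(10.0, 0.0), (10.0, 5.0)]
tight_score = cohesion(tight, loose)
loose_score = cohesion(loose, tight)

# Hypothetical clusters and threat feed, used only to show the ranking.
clusters = {
    "cluster_a": ["203.0.113.5", "203.0.113.9", "198.51.100.7"],
    "cluster_b": ["198.51.100.1", "198.51.100.2"],
}
threat_intel = {"203.0.113.5", "203.0.113.9"}
ranked = sorted(clusters, key=lambda name: threat_hit_rate(clusters[name], threat_intel), reverse=True)
print(round(tight_score, 3), round(loose_score, 3), ranked)
```

Sorting in descending order of threat index puts the clusters most worth reviewing first.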

Step S13: Analyze the result clusters obtained by the clustering, and mark abnormal clusters corresponding to specific scenes.

In this step, a security expert or a business specialist may perform a security analysis or a business anomaly analysis on the result clusters obtained by the clustering. In a preferred embodiment, the security analysis or business anomaly analysis proceeds in order from the cluster with the highest threat index to the cluster with the lowest.

Each cluster has a system-generated analysis basis, including behavior features and raw data, for direct analysis. The auxiliary basis includes technical indicators such as the threat intelligence hit rate, the size of the cluster, the closeness of the behavior within the cluster, and the subjects closest to and farthest from the center of the cluster. Security experts mainly judge whether the business corresponding to the behavior in the cluster is compliant and whether the behavior matches the characteristics of known security models. An example of the analysis basis of a cluster is shown in Table 3 below:

Table 3 Examples of Analysis Basis

According to the results of the cluster analysis, the security experts or business specialists mark the abnormal clusters, labeling each one as a specific security threat or business security issue. When a cluster is labeled, the subset of data corresponding to the behavior in the cluster is given the same label.

Step S14: Perform supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene.

In this step, for each labeled cluster, the corresponding data subset is used as the labeled positive data, the other data in the data set is used as negative data, and a supervised learning classification algorithm is used for training. The trained model is the detection model for a specific security problem in this data set. Because it is trained on actual data, the detection model is well adapted to the behavioral characteristics of the current data set and detects the corresponding security problem with high accuracy. The supervised learning algorithm can be XGBoost, GBDT, LightGBM, or another algorithm widely recognized in the industry. The standard pseudo code is as follows:
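The one-model-per-labeled-cluster training scheme can be sketched as follows. To keep the example self-contained, a small logistic regression written in plain Python stands in for XGBoost, GBDT, or LightGBM; that substitution, the function names, the label `suspicious_vpn_login`, and the feature values are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=1000):
    """Stand-in classifier: logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


def train_detection_models(features, labels):
    """Train one binary model per label: that cluster's data is positive, all other data negative."""
    models = {}
    for problem in sorted({lab for lab in labels if lab is not None}):
        y = [1 if lab == problem else 0 for lab in labels]
        models[problem] = train_logistic(features, y)
    return models


# Made-up, pre-scaled behavior features; None marks data from unlabeled clusters.
features = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
labels = [None, None, None] + ["suspicious_vpn_login"] * 3
models = train_detection_models(features, labels)
print(predict(models["suspicious_vpn_login"], (0.9, 0.9)))
```

Each labeled cluster yields its own binary detector, so adding a newly labeled cluster only requires training one more model rather than retraining the others.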

Step S15: Detect abnormal behavior based on the detection model.

In this step, the trained detection models are first deployed to the production system. The data collection and processing in production must be consistent with the data collection and processing used for training, so that the behavior features output are consistent. The behavior feature set is input separately into each detection model for detection. Each detection model is used to detect a specific security problem. Data identified as abnormal by a detection model is encapsulated as an alarm.

The abnormal behavior detection method according to the embodiment of the present invention performs unsupervised learning clustering on behavior feature sets extracted from different data sources, and performs supervised learning on the abnormal result clusters to generate detection models with high adaptability and accuracy corresponding to specific application scenarios. The detection models obtained through this learning are used to detect abnormal behavior in data sets in a production environment. This method can integrate behavior data from different data sources, improve the accuracy of automatic recognition of abnormal behavior, and significantly reduce labor costs.
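The detection step can be sketched as follows: every behavior feature is passed to every deployed model, and each positive result is wrapped as an alarm. This is a hedged sketch only. The `Alarm` class, the feature names, the two security-problem names, and the threshold lambdas that stand in for trained models are hypothetical.

```python
from dataclasses import dataclass

# The column order must match the order used when the models were trained.
FEATURE_NAMES = ["event_count", "fail_count", "distinct_src_ip"]


@dataclass
class Alarm:
    subject_id: str
    start_time: int
    end_time: int
    security_problem: str


def detect(features, models):
    """Feed each behavior feature to every detection model and wrap hits as alarms."""
    alarms = []
    for feature in features:
        vector = [feature[name] for name in FEATURE_NAMES]
        for problem, is_abnormal in models.items():
            if is_abnormal(vector):
                alarms.append(Alarm(feature["subject_id"], feature["start_time"],
                                    feature["end_time"], problem))
    return alarms


# Stand-ins for trained detection models: one callable per specific security problem.
models = {
    "repeated_login_failure": lambda v: v[1] >= 5,
    "many_source_ips": lambda v: v[2] >= 3,
}
production_features = [
    {"subject_id": "Abcd12", "start_time": 0, "end_time": 3600,
     "event_count": 4, "fail_count": 0, "distinct_src_ip": 1},
    {"subject_id": "Efgk21", "start_time": 0, "end_time": 3600,
     "event_count": 9, "fail_count": 6, "distinct_src_ip": 3},
]
alarms = detect(production_features, models)
for alarm in alarms:
    print(alarm)
```

Fixing the feature order in one shared constant is one simple way to keep production features consistent with the training features.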

FIG. 3 is a schematic structural diagram of an abnormal behavior detection system according to an embodiment of the present invention. As shown in FIG. 3, the abnormal behavior detection system according to the embodiment of the present invention includes the following functional modules:

A feature extraction module 21, configured to extract behavior feature sets from a plurality of different types of data sources;

A clustering module 22, configured to cluster the behavior feature set based on an unsupervised learning algorithm;

An abnormality labeling module 23, configured to analyze the clusters of results obtained by clustering, and mark abnormality clusters corresponding to specific scenes;

A learning module 24 for performing supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene;

A detection module 25 is configured to detect an abnormal behavior based on the detection model.

Further, the feature extraction module 21 includes:

A data extraction module 211, configured to extract behavior data of an analysis subject from multiple different types of data sources;

A data processing module 212 is configured to aggregate, transform and feature extract the extracted behavior data to form a behavior feature set of the analysis subject.
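How the modules fit together can be sketched as a simple composition of callables. This is an architectural illustration only: the class name, the method names, and the toy stub functions are hypothetical, and the labeling callable stands in for what is, in the text, a judgment made by a security expert or business specialist.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class AbnormalBehaviorDetectionSystem:
    extract_features: Callable[[List[dict]], List[Any]]           # feature extraction module 21
    cluster: Callable[[List[Any]], List[int]]                     # clustering module 22
    label_clusters: Callable[[List[Any], List[int]], List[Any]]   # abnormality labeling module 23
    learn: Callable[[List[Any], List[Any]], Any]                  # learning module 24
    detect: Callable[[Any, List[Any]], List[Any]]                 # detection module 25

    def train(self, training_logs):
        features = self.extract_features(training_logs)
        assignment = self.cluster(features)
        labels = self.label_clusters(features, assignment)
        return self.learn(features, labels)

    def run(self, models, production_logs):
        # Production data goes through the same feature extraction as training data.
        return self.detect(models, self.extract_features(production_logs))


# Toy stubs: one numeric feature, a threshold "clustering", and a threshold "model".
system = AbnormalBehaviorDetectionSystem(
    extract_features=lambda logs: [log["fail_count"] for log in logs],
    cluster=lambda feats: [1 if f >= 5 else 0 for f in feats],
    label_clusters=lambda feats, assign: ["suspicious" if a == 1 else None for a in assign],
    learn=lambda feats, labels: min(f for f, lab in zip(feats, labels) if lab == "suspicious"),
    detect=lambda threshold, feats: [f for f in feats if f >= threshold],
)
model = system.train([{"fail_count": 0}, {"fail_count": 1}, {"fail_count": 7}, {"fail_count": 9}])
alerts = system.run(model, [{"fail_count": 2}, {"fail_count": 8}])
print(model, alerts)
```

Keeping each module behind a narrow callable interface lets a different clustering or learning algorithm be swapped in without touching the rest of the pipeline.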

In the embodiment of the present invention, the different types of data sources may be logs from multiple systems, and the goal is to extract the analysis subject's behavior in each system from that system's logs. An effective behavior model includes the following main components: analysis subject ID, time stamp, event name, specific behavior, object of the behavior operation, event result, and so on. Data from different data sources usually needs to satisfy requirements of completeness, standardization, and uniformity, and each piece of behavior data can be linked to an analysis subject ID through an association relationship. Here, the analysis subject may be a user, in which case the analysis subject ID may be a user ID; or it may be an important asset within the enterprise, such as an externally facing business service cluster, in which case the analysis subject ID may be a server IP address.

The feature extraction module 21 extracts data from the various data sources through a configurable data collector, and extracts the fields corresponding to the behavior model from each data source to form a simplified log, which can then be grouped by analysis subject. Let E denote a defined analysis subject. The logs are first grouped by analysis subject E, and the logs of each analysis subject E are then segmented according to the time window W. After segmentation, each log subset S contains only behavior data of the same analysis subject within the same time window. Field aggregation, transformation, and feature extraction are then performed on each log subset S according to the needs of the algorithm, finally forming a behavior feature F of the analysis subject E. Over a long analysis period, if the number of analysis subjects E is M and the number of time windows W is N, about M × N behavior features are eventually output.

The clustering module 22 clusters the behavior feature set obtained by the feature extraction module 21. Clustering is a kind of unsupervised learning algorithm, and unsupervised learning algorithms do not need the data to be labeled in advance. The clustering algorithm clusters the behavior data in the behavior feature set according to the distance between the features. Similar behaviors correspond to similar features and are grouped into the same class, while features corresponding to different behaviors lie far apart and are assigned to different classes. There are many different implementations of clustering algorithms, such as KMeans (K-means clustering), DBSCAN (density-based clustering with noise), Hierarchical Clustering (hierarchical clustering), and so on.

In a preferred embodiment, the result clusters can also be scored and then sorted by score. The ranking can be based on a performance evaluation index corresponding to the clustering algorithm, such as the cohesion of each class. The cohesion of a class is defined as (1 - intra-class distance / inter-class distance), where the intra-class distance is the average distance between the features inside the class, and the inter-class distance is the average distance between the features inside the class and the features outside it. The ranking can also draw on external threat intelligence, using the hit rate of threat intelligence in the data as the threat index.

The abnormality labeling module 23 is configured to support a security expert or a business specialist in performing security analysis or business anomaly analysis on the result clusters obtained by the clustering. In a preferred embodiment, the security analysis or business anomaly analysis proceeds in order from the cluster with the highest threat index to the cluster with the lowest.

Each cluster has a system-generated analysis basis, including behavior features and raw data, for direct analysis. The auxiliary basis includes technical indicators such as the threat intelligence hit rate, the size of the cluster, the closeness of the behavior within the cluster, and the subjects closest to and farthest from the center of the cluster. Security experts mainly judge whether the business corresponding to the behavior in the cluster is compliant and whether the behavior matches the characteristics of known security models.

The learning module 24 applies a supervised learning classification algorithm to train on each labeled cluster. The trained model is the detection model for a specific security problem in this data set. Because it is trained on actual data, the detection model is well adapted to the behavioral characteristics of the current data set and detects the corresponding security problem with high accuracy. The supervised learning algorithm can be XGBoost, GBDT, LightGBM, or another algorithm recognized by the industry as effective.

The detection module 25 detects abnormal behavior based on the detection models. First, the trained detection models are deployed to the production system. The data collection and processing in production must be consistent with the data collection and processing used for training, so that the behavior features output are consistent. The behavior feature set is input separately into each detection model for detection. Each detection model is used to detect a specific security problem. Data identified as abnormal by a detection model is encapsulated as an alarm.

The abnormal behavior detection system according to the embodiment of the present invention performs unsupervised learning clustering on behavior feature sets extracted from different data sources, and performs supervised learning on the abnormal result clusters to generate detection models with high adaptability and accuracy corresponding to specific application scenarios. The detection models obtained through this learning are used to detect abnormal behavior in data sets in a production environment. The system can integrate behavior data from different data sources, improve the accuracy of automatic identification of abnormal behavior, and significantly reduce labor costs.

According to another embodiment of the present invention, there is also provided a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the steps of the abnormal behavior detection method according to any one of the foregoing embodiments.

According to another embodiment of the present invention, a computing device is further provided, which includes a memory and a processor. The memory stores a computer program executable on the processor, and the processor executes the computer program to implement Steps of the abnormal behavior detection method according to any one of the foregoing embodiments.

The specific embodiments described above describe the objectives, technical solutions, and beneficial effects of the present invention in further detail. It should be noted that the above are only preferred embodiments of the present invention, and that those of ordinary skill in the art can make several changes and improvements without departing from the principles of the present invention. These changes and improvements should also be regarded as falling within the protection scope of the present invention.

Claims

A method for detecting abnormal behavior, comprising the following steps:

extracting a behavior feature set from a plurality of different types of data sources;

clustering the behavior feature set based on an unsupervised learning algorithm;

analyzing the result clusters obtained by clustering, and labeling abnormal clusters corresponding to specific scenes;

performing supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene; and

detecting abnormal behavior based on the detection model.
The abnormal behavior detection method according to claim 1, wherein the extracting of a behavior feature set from a plurality of different types of data sources comprises:

extracting behavior data of an analysis subject from the plurality of different types of data sources; and

aggregating, transforming, and performing feature extraction on the extracted behavior data to form the behavior feature set of the analysis subject.
The abnormal behavior detection method according to claim 2, wherein the plurality of different types of data sources are log data from a plurality of systems.
The abnormal behavior detection method according to claim 3, wherein the analysis subject comprises a user or an enterprise server.
The abnormal behavior detection method according to claim 4, wherein the aggregating, transforming, and performing feature extraction on the extracted behavior data comprises segmenting the extracted behavior data of the analysis subject according to a time window.
The abnormal behavior detection method according to claim 5, wherein a subset of the logs of the same analysis subject within the same time window correspondingly generates a behavior feature.
The abnormal behavior detection method according to claim 5, wherein the time window comprises a rolling window that divides the data stream into consecutive segments of equal length.
The abnormal behavior detection method according to claim 5, wherein the time window comprises a session window for segmenting the data stream at time intervals.
The abnormal behavior detection method according to claim 1, wherein the clustering the behavior feature set based on an unsupervised learning algorithm comprises clustering behavior data in the behavior feature set according to a distance between the features.
The abnormal behavior detection method according to claim 9, wherein the clustering of the behavior feature set based on an unsupervised learning algorithm comprises sorting the result clusters obtained by clustering.
The abnormal behavior detection method according to claim 10, wherein the sorting of the result clusters is based on an effect evaluation index corresponding to the clustering algorithm or a hit ratio of external threat intelligence in the data.
The abnormal behavior detection method according to claim 1, wherein analyzing the result clusters obtained by clustering and labeling abnormal clusters corresponding to specific scenes comprises labeling each abnormal cluster with the specific security problem to which it corresponds.
The abnormal behavior detection method according to claim 12, wherein performing supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene comprises using the subset of data corresponding to each labeled cluster as positive data, using the other data in the data set as negative data, and training with a supervised learning classification algorithm.
The abnormal behavior detection method according to claim 1, wherein detecting the abnormal behavior based on the detection model comprises deploying the detection model to a production environment, each detection model being used to detect a specific security problem.
An abnormal behavior detection system, comprising the following modules:

a feature extraction module, configured to extract a behavior feature set from a plurality of different types of data sources;

a clustering module, configured to cluster the behavior feature set based on an unsupervised learning algorithm;

an anomaly labeling module, configured to analyze the result clusters obtained by clustering, and to label abnormal clusters corresponding to specific scenes;

a learning module, configured to perform supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene; and

a detection module, configured to detect abnormal behavior based on the detection model.
The abnormal behavior detection system according to claim 15, wherein the feature extraction module comprises:

a data extraction module, configured to extract behavior data of an analysis subject from the plurality of different types of data sources; and

a data processing module, configured to aggregate, transform, and perform feature extraction on the extracted behavior data to form the behavior feature set of the analysis subject.
The abnormal behavior detection system according to claim 16, wherein the plurality of different types of data sources are log data from a plurality of systems.
The abnormal behavior detection system according to claim 17, wherein the analysis subject comprises a user or an enterprise server.
The abnormal behavior detection system according to claim 18, wherein the data processing module performing aggregation, transformation, and feature extraction on the extracted behavior data comprises segmenting the extracted behavior data of the analysis subject according to a time window.
The abnormal behavior detection system according to claim 19, wherein a subset of the logs of the same analysis subject in the same time window correspondingly generates a behavior feature.
The abnormal behavior detection system according to claim 19, wherein the time window comprises a rolling window that divides the data stream into consecutive segments of equal length.
The abnormal behavior detection system according to claim 19, wherein the time window comprises a session window for dividing a data stream at time intervals.
The abnormal behavior detection system according to claim 15, wherein the clustering module clustering the behavior feature set based on an unsupervised learning algorithm comprises clustering behavior data in the behavior feature set according to a distance between the features.
The abnormal behavior detection system according to claim 23, wherein the clustering module clustering the behavior feature set based on an unsupervised learning algorithm comprises sorting the result clusters obtained by clustering.
The abnormal behavior detection system according to claim 24, wherein the sorting of the result clusters is based on an effect evaluation index corresponding to the clustering algorithm or a hit ratio of external threat intelligence in the data.
The abnormal behavior detection system according to claim 15, wherein the anomaly labeling module analyzing the result clusters obtained by clustering and labeling abnormal clusters corresponding to specific scenes comprises labeling each abnormal cluster with the specific security problem to which it corresponds.
The abnormal behavior detection system according to claim 26, wherein the learning module performing supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene comprises using the subset of data corresponding to each labeled cluster as positive data, using the other data in the data set as negative data, and training with a supervised learning classification algorithm.
The abnormal behavior detection system according to claim 15, wherein the detection module detecting the abnormal behavior based on the detection model comprises deploying the detection model to a production environment, each detection model being used to detect a specific security problem.
A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program is executed by a processor to implement the steps of the abnormal behavior detection method according to any one of claims 1-14.
A computing device, characterized in that it includes a memory and a processor, wherein the memory stores a computer program that can be run on the processor, and the processor executes the computer program to implement the steps of the abnormal behavior detection method according to any one of claims 1-14.