WO2020038353A1 - Abnormal behavior detection method and system - Google Patents

Abnormal behavior detection method and system

Info

Publication number
WO2020038353A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
behavior
abnormal behavior
abnormal
clustering
Prior art date
Application number
PCT/CN2019/101543
Other languages
French (fr)
Chinese (zh)
Inventor
李达
吴睿
万晓川
Original Assignee
瀚思安信(北京)软件技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 瀚思安信(北京)软件技术有限公司
Publication of WO2020038353A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present invention relates to the field of information security, and in particular to a method and system for detecting abnormal behavior, used to detect abnormal behavior in an enterprise information system and to protect the security of an enterprise's services, data, business operations, and assets.
  • An existing method trains a classifier on training samples to distinguish abnormal behavior data from normal behavior data, but this method requires manually labeling a large amount of sample data.
  • In particular, as the security data from different data sources in the enterprise security architecture keeps growing, labeling massive data from different data sources incurs high labor costs and cannot meet the actual needs of current enterprise information security analysis.
  • the technical problem to be solved by the present invention is how to provide an abnormal behavior detection method and system capable of integrating data from different data sources, which can improve the accuracy of automatic identification of abnormal behavior and reduce labor costs.
  • a method for detecting abnormal behavior which includes the following steps:
  • the extracting a behavior feature set from a plurality of different types of data sources includes:
  • the plurality of different types of data sources are log data from a plurality of systems.
  • the analysis subject includes a user or an enterprise server.
  • Aggregating, transforming, and performing feature extraction on the extracted behavior data includes segmenting the extracted behavior data of the analysis subject according to a time window.
  • The log subset of the same analysis subject within the same time window generates one behavior feature.
  • The time window includes a rolling window, which divides the data stream into consecutive windows of equal length.
  • The time window includes a session window, which divides the data stream according to the time interval between events.
  • Clustering the behavior feature set based on the unsupervised learning algorithm includes clustering the behavior data in the behavior feature set according to the distance between features.
  • Clustering the behavior feature set based on the unsupervised learning algorithm includes ranking the result clusters.
  • The ranking of the result clusters is based on an evaluation index corresponding to the clustering algorithm or on the hit rate of external threat intelligence in the data.
  • Analyzing the result clusters obtained by the clustering and marking abnormal clusters corresponding to specific scenes includes labeling each abnormal cluster with a specific security problem.
  • Performing supervised learning on each labeled cluster and generating a detection model corresponding to a specific scene includes using the data subset corresponding to each labeled cluster as positive data, using the other data in the data set as negative data, and training with a supervised classification algorithm.
  • the detecting an abnormal behavior based on the detection model includes deploying the detection model to a production environment, and each detection model is used to detect a specific security problem.
  • an abnormal behavior detection system including the following modules:
  • a feature extraction module configured to extract behavior feature sets from multiple different types of data sources
  • a clustering module configured to cluster the behavior feature set based on an unsupervised learning algorithm
  • an anomaly labeling module configured to analyze the result clusters obtained by clustering and mark abnormal clusters corresponding to specific scenes
  • a learning module configured to perform supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene
  • a detection module configured to detect abnormal behavior based on the detection model
  • the feature extraction module includes:
  • a data extraction module configured to extract the behavior data of the analysis subject from multiple different types of data sources
  • a data processing module configured to aggregate, transform, and perform feature extraction on the extracted behavior data to form the behavior feature set of the analysis subject
  • the plurality of different types of data sources are log data from a plurality of systems.
  • the analysis subject includes a user or an enterprise server.
  • the data processing module performs aggregation, transformation, and feature extraction on the extracted behavior data, and includes segmenting the extracted behavior data of the analysis subject according to a time window.
  • The log subset of the same analysis subject within the same time window generates one behavior feature.
  • The time window includes a rolling window, which divides the data stream into consecutive windows of equal length.
  • The time window includes a session window, which divides the data stream according to the time interval between events.
  • the clustering module clustering the behavior feature set based on an unsupervised learning algorithm includes clustering behavior data in the behavior feature set according to a distance between the features.
  • The clustering module clustering the behavior feature set based on an unsupervised learning algorithm includes ranking the result clusters.
  • The ranking of the result clusters is based on an evaluation index corresponding to the clustering algorithm or on the hit rate of external threat intelligence in the data.
  • The anomaly labeling module analyzing the result clusters obtained by clustering and marking abnormal clusters corresponding to specific scenes includes labeling each abnormal cluster with a specific security problem.
  • The learning module performing supervised learning on each labeled cluster and generating a detection model corresponding to a specific scene includes using the data subset corresponding to each labeled cluster as positive data, using the other data in the data set as negative data, and training with a supervised classification algorithm.
  • the detection module detecting the abnormal behavior based on the detection model includes deploying the detection model to a production environment, and each detection model is used to detect a specific security problem.
  • a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the steps of the abnormal behavior detection method according to any one of the foregoing embodiments.
  • a computing device which includes a memory and a processor.
  • The memory stores a computer program executable on the processor, and the processor executes the computer program to implement the steps of the abnormal behavior detection method according to any one of the foregoing embodiments.
  • The abnormal behavior detection method and system of the embodiments of the present invention extract behavior feature sets from different data sources, cluster them by unsupervised learning, and perform supervised learning on the abnormal result clusters to generate detection models that correspond to specific application scenarios and have high adaptability and accuracy. The learned detection models are then used to detect abnormal behavior in data sets in the production environment.
  • the embodiments of the present invention can integrate behavior data of different data sources, can improve the accuracy of automatic recognition of abnormal behavior, and significantly reduce labor costs.
  • FIG. 1 is a schematic flowchart of an abnormal behavior detection method according to an embodiment of the present invention.
  • FIG. 2 is an exemplary diagram of a time window mechanism for behavior data processing according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an abnormal behavior detection system according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of an abnormal behavior detection method according to an embodiment of the present invention. As shown in FIG. 1, the abnormal behavior detection method according to the embodiment of the present invention includes the following steps:
  • Step S11 Extract a behavior feature set from a plurality of different types of data sources.
  • the data source can be logs from multiple systems.
  • the goal is to extract the analysis subject's behavior in that system from the logs of each system.
  • An effective behavior model includes the following main components: analysis subject ID, time stamp, event name, specific behavior, object of the behavior operation, event result, etc. Data from different data sources usually needs to be complete, well-formed, and uniform, and each piece of behavior data can be linked to the analysis subject ID through an association relationship.
  • The analysis subject may be a user, in which case the corresponding analysis subject ID may be a user ID; or it may be an important asset within the enterprise, such as an externally facing business service cluster, in which case the corresponding analysis subject ID may be a server IP address.
  • the fields corresponding to the behavior model are extracted.
  • the analysis subject ID corresponds to 'user_id' or 'src_ip' or 'dst_ip'
  • the time stamp corresponds to 'timestamp'
  • the event name corresponds to 'activity'
  • the object of the action operation corresponds to 'pc_name'
  • the result of the event corresponds to 'status'. If other fields cannot correspond to the behavior model, they can be discarded to save bandwidth or memory.
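  • The field mapping above can be sketched as a small normalization step. This is a minimal illustration, not the applicant's implementation: the function name `simplify_log`, the output key names, and the sample record are assumptions; only the raw field names (`user_id`, `src_ip`, `dst_ip`, `timestamp`, `activity`, `pc_name`, `status`) come from the text.

```python
# Raw log fields that may serve as the analysis subject ID (from the text).
SUBJECT_FIELDS = ("user_id", "src_ip", "dst_ip")

# Behavior-model component -> raw log field (output key names are hypothetical).
FIELD_MAP = {
    "timestamp": "timestamp",
    "event_name": "activity",
    "object": "pc_name",
    "result": "status",
}


def simplify_log(raw, subject_field="user_id"):
    """Keep only the fields that map onto the behavior model; discard the rest."""
    if subject_field not in SUBJECT_FIELDS:
        raise ValueError(f"unsupported subject field: {subject_field}")
    record = {"subject_id": raw.get(subject_field)}
    for component, field in FIELD_MAP.items():
        record[component] = raw.get(field)  # None if the source lacks the field
    return record


# Made-up sample record; "user_agent" has no place in the behavior model.
raw_log = {
    "user_id": "Abcd12", "src_ip": "203.0.113.7", "dst_ip": "10.0.0.5",
    "timestamp": "2019-08-20T09:00:00", "activity": "vpn_login",
    "pc_name": "PC-01", "status": "success", "user_agent": "ignored",
}
simplified = simplify_log(raw_log)
by_server = simplify_log(raw_log, subject_field="dst_ip")
```

  • Passing a different `subject_field` changes the key on which all later grouping is built, which is why the choice of analysis subject changes the analysis problem.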
  • the analysis subject is an important key value of behavioral data, and all data structures are built on the analysis subject.
  • The analysis subject can be defined from the log: the field corresponding to the analysis subject ID in the log is used as the key. The data can then be enriched and grouped by this key.
  • The account ID (user_id) field can be used to define the analysis subject, as can the source IP (src_ip) or the destination IP (dst_ip).
  • Different definitions of the analysis subject result in significantly different behavior features, which means that the analysis problem also differs. If the account ID is selected as the subject, user behavior is analyzed and the data is aggregated per account.
  • For example, the account 'Abcd12' logs in successfully twice, from different source IPs and PCs. If the source IP is selected as the subject, each of four external IPs logs in once. If the destination IP is selected as the subject, the internal VPN server is analyzed, and the data is aggregated into four logins from different IPs.
  • Data is extracted from the various data sources through a configurable data collector. Fields corresponding to the behavior model are extracted from the different data sources to form a simplified log. The logs can then be grouped by analysis subject. Let E denote a defined analysis subject. The logs are first grouped by analysis subject E, and the logs of each analysis subject E are then segmented according to the time window W. After segmentation, each log subset S must contain the behavior data of a single analysis subject within a single time window. Field aggregation, transformation, and feature extraction are then performed on each log subset S according to the needs of the algorithm, finally forming the behavior feature F of the analysis subject E.
  • Each behavior feature F is generated from the logs in its log subset S.
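  • The grouping described above (subject E, window W, log subset S, feature F) can be sketched as follows. This is an illustrative sketch under stated assumptions: timestamps are plain numbers in seconds, windows are fixed-length, and the three aggregated features (`event_count`, `failure_count`, `distinct_objects`) are hypothetical examples rather than features named in the text.

```python
from collections import defaultdict


def extract_features(logs, window_size):
    """Group logs by (subject E, window W) and aggregate each subset S into F."""
    subsets = defaultdict(list)
    for log in logs:
        window_index = int(log["timestamp"] // window_size)
        subsets[(log["subject_id"], window_index)].append(log)

    features = []
    for (subject, window_index), subset in sorted(subsets.items()):
        features.append({
            "subject_id": subject,
            "start_time": window_index * window_size,
            "end_time": (window_index + 1) * window_size,
            # Hypothetical aggregate features for this log subset.
            "event_count": len(subset),
            "failure_count": sum(1 for log in subset if log["result"] == "failure"),
            "distinct_objects": len({log["object"] for log in subset}),
        })
    return features


# Made-up simplified logs; timestamps are seconds since an arbitrary origin.
logs = [
    {"subject_id": "Abcd12", "timestamp": 10, "object": "PC-01", "result": "success"},
    {"subject_id": "Abcd12", "timestamp": 50, "object": "PC-02", "result": "failure"},
    {"subject_id": "Abcd12", "timestamp": 130, "object": "PC-01", "result": "success"},
    {"subject_id": "Efgh34", "timestamp": 20, "object": "PC-09", "result": "success"},
]
features = extract_features(logs, window_size=60)
```

  • Each output row carries the window bounds alongside the feature values, so a feature can always be traced back to the log subset that produced it.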
  • An example of the behavior feature F is shown in Table 2 below, where the required fields include the start time (start_time), end time (end_time) of the window, and the value corresponding to each feature.
  • the definition of the time window is shown in Figure 2.
  • Two different window mechanisms can be applied to the embodiments of the present invention.
  • the first is a rolling window, that is, the data stream is divided into consecutive equal-length time windows, and the parameter is the window size window_size.
  • The second is the session window, that is, the data stream is divided by time interval, and the parameter is time_interval. If the time interval between two consecutive events is less than time_interval, the two events fall into the same window. If the time interval between two consecutive events is greater than time_interval, the previous session window ends and the new event starts a new session window.
  • the division of the time window of these two mechanisms is independent of the number of logs in a single window.
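  • The two window mechanisms can be sketched on a list of event timestamps. The parameter names `window_size` and `time_interval` come from the text; the function names and sample timestamps are assumptions. The text does not say what happens when a gap is exactly equal to time_interval; this sketch assumes such an event starts a new session.

```python
def rolling_windows(timestamps, window_size):
    """Split events into consecutive equal-length windows (empty windows omitted)."""
    windows = {}
    for t in sorted(timestamps):
        windows.setdefault(int(t // window_size), []).append(t)
    return [windows[index] for index in sorted(windows)]


def session_windows(timestamps, time_interval):
    """Start a new window whenever the gap to the previous event is too large."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] < time_interval:
            sessions[-1].append(t)  # gap below time_interval: same session
        else:
            sessions.append([t])    # first event, or gap >= time_interval (assumed)
    return sessions


events = [0, 5, 12, 58, 61, 200]
rolling = rolling_windows(events, window_size=60)
sessions = session_windows(events, time_interval=10)
```

  • In both cases the number of logs in a window plays no role in where the window boundaries fall, matching the statement above.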
  • Step S12 Cluster the behavior feature set based on an unsupervised learning algorithm.
  • This step clusters the behavior feature set composed of all the behavior features F obtained in step S11.
  • Clustering is one of the unsupervised learning algorithms. Unsupervised learning algorithms do not need to pre-label the data.
  • The clustering algorithm clusters the behavior data in the behavior feature set according to the distance between features. Similar behaviors correspond to similar features and are grouped into the same class. The features corresponding to different behaviors are far apart and are assigned to different classes.
  • Available clustering algorithms include KMeans (K-means clustering), DBSCAN (density-based spatial clustering of applications with noise), hierarchical clustering, and so on. Taking KMeans (K-means clustering) as an example, the standard pseudo-code is as follows:
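  • The pseudo-code listing itself is not present in this text. As a stand-in, the following is a minimal pure-Python sketch of the textbook K-means (Lloyd's) iteration, not the applicant's own listing. The function names, the Euclidean distance, the seeded random initialization, and the sample points are all assumptions made for illustration.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iterations=100, seed=0):
    """Alternate assignment and centroid update until assignments stop changing."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Two well-separated groups of made-up two-dimensional behavior features.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignments, centroids = kmeans(points, k=2)
```

  • In practice a library implementation would normally be preferred; the sketch only shows the distance-based grouping that the paragraph describes.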
  • The result clusters are scored and then sorted according to their scores.
  • the ranking can be based on the performance evaluation index corresponding to the clustering algorithm, such as the cohesion of the corresponding class.
  • The cohesion of a class is defined as (1 - intra-class distance / inter-class distance), where the intra-class distance is the average distance between the features inside the class, and the inter-class distance is the average distance between the features inside the class and the features outside the class.
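  • Written out for a class C_k within the full feature set X, the definition can be read as below. The use of the Euclidean norm and of plain pairwise averages is an assumption; the text only says "average distance".

```latex
\[
d_{\mathrm{intra}}(C_k) = \frac{1}{|C_k|\,(|C_k|-1)} \sum_{\substack{x_i, x_j \in C_k \\ i \neq j}} \lVert x_i - x_j \rVert ,
\qquad
d_{\mathrm{inter}}(C_k) = \frac{1}{|C_k|\,|X \setminus C_k|} \sum_{x_i \in C_k} \; \sum_{x_j \in X \setminus C_k} \lVert x_i - x_j \rVert ,
\]
\[
\mathrm{cohesion}(C_k) = 1 - \frac{d_{\mathrm{intra}}(C_k)}{d_{\mathrm{inter}}(C_k)} .
\]
```

  • Under this reading, a tight class that lies far from all other features has a cohesion close to 1.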
  • Ranking can also introduce external threat intelligence, and use the hit rate of threat intelligence in the data as the threat index.
  • the evaluation index pseudo code is as follows:
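  • The evaluation-index listing is likewise absent from this text. The sketch below is a stand-in that computes the cohesion as defined above and a threat intelligence hit rate, then ranks clusters by threat index. The function names, the Euclidean distance, the use of IP addresses as indicators, and all sample data are assumptions.

```python
import math
from itertools import combinations


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cohesion(members, others):
    """1 - intra/inter; needs non-empty inputs and a non-zero inter-class distance."""
    pairs = list(combinations(members, 2))
    intra = sum(euclidean(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0
    inter = sum(euclidean(p, q) for p in members for q in others) / (
        len(members) * len(others)
    )
    return 1.0 - intra / inter


def threat_hit_rate(indicators, threat_intel):
    """Fraction of a cluster's indicators that appear in the threat feed."""
    return sum(1 for i in indicators if i in threat_intel) / len(indicators)


# Made-up clusters: one-dimensional features plus the source IPs seen in each.
clusters = {
    "A": {"features": [(0.0,), (2.0,)], "ips": ["203.0.113.7", "198.51.100.2"]},
    "B": {"features": [(10.0,), (16.0,)], "ips": ["192.0.2.9", "192.0.2.10"]},
}
threat_intel = {"203.0.113.7", "198.51.100.2"}  # hypothetical external feed

scores = {}
for name, cluster in clusters.items():
    others = [f for n, c in clusters.items() if n != name for f in c["features"]]
    scores[name] = {
        "cohesion": cohesion(cluster["features"], others),
        "threat_index": threat_hit_rate(cluster["ips"], threat_intel),
    }

# Highest threat index first, so analysts review the riskiest cluster first.
ranked = sorted(scores, key=lambda n: scores[n]["threat_index"], reverse=True)
```

  • Either score can serve as the sort key; the text presents the clustering evaluation index and the threat intelligence hit rate as alternatives.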
  • Step S13 Analyze the result clusters obtained by the clustering, and mark abnormal clusters corresponding to specific scenes.
  • a security expert or a business specialist may perform a security analysis or a business anomaly analysis on the result cluster obtained by the clustering.
  • security analysis or business abnormality analysis may be performed from a cluster with a higher threat index to a cluster with a lower threat index.
  • Each cluster has a systematically generated analysis basis, including behavioral characteristics and raw data, for direct analysis.
  • the auxiliary basis includes technical indicators such as the threat intelligence hit rate, the size of the cluster, the closeness of the behavior within the cluster, the closest subject in the cluster and the farthest subject.
  • Security experts mainly judge from the perspective of whether the business corresponding to the behavior in the cluster is compliant and whether it meets the characteristics of known security models.
  • An example of the analysis basis of a cluster is shown in Table 3 below:
  • Abnormal clusters are labeled by security experts or business specialists as a specific security threat or business security issue.
  • the subset of data corresponding to the behavior in the cluster is also given the same label.
  • Step S14 Perform supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene.
  • the corresponding data subset is used as the labeled positive data.
  • the other data in the data set is regarded as negative data, and the classification algorithm with supervised learning is used for training.
  • The trained model is the detection model for the specific security problem in this data set. Because it is trained on and derived from actual data, the detection model is fully adapted to the behavior features of the current data set and detects the corresponding security problem with high accuracy.
  • the supervised learning algorithm can choose XGBoost, GBDT, LightGBM, etc., which are widely recognized in the industry.
  • the standard pseudo code is as follows:
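  • The training listing is not present in this text. The sketch below illustrates only the labeling scheme described in step S14: for each labeled cluster, its rows are positive and all other rows are negative, and one model is trained per label. The classifier is a deliberately simple nearest-centroid stand-in, NOT XGBoost, GBDT, or LightGBM; all names, labels, and data are hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def build_training_set(features, labels, target_label):
    """Rows of the labeled cluster are positive (1); every other row is negative (0)."""
    return [(x, 1 if label == target_label else 0) for x, label in zip(features, labels)]


class NearestCentroidModel:
    """Stand-in binary classifier; a gradient-boosting model would replace it."""

    def fit(self, training_set):
        self.centroids = {}
        for cls in (0, 1):
            rows = [x for x, y in training_set if y == cls]  # must be non-empty
            self.centroids[cls] = [sum(dim) / len(rows) for dim in zip(*rows)]
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda cls: euclidean(x, self.centroids[cls]))


def train_detection_models(features, labels):
    """Train one detection model per labeled security problem."""
    models = {}
    for target in sorted({label for label in labels if label is not None}):
        training_set = build_training_set(features, labels, target)
        models[target] = NearestCentroidModel().fit(training_set)
    return models


# Made-up behavior features; None marks rows from clusters not labeled abnormal.
features = [(1, 0), (2, 1), (1, 1), (20, 30), (22, 29), (40, 2), (42, 1)]
labels = [None, None, None, "brute_force", "brute_force",
          "data_exfiltration", "data_exfiltration"]
models = train_detection_models(features, labels)
```

  • Note that when training the model for one security problem, rows labeled with a different problem count as negative data, as the text specifies.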
  • Step S15 Detect abnormal behavior based on the detection model.
  • First, the trained detection models are deployed in the production system.
  • the process of data collection and processing needs to be consistent with the process of training data collection and processing, and to ensure that the behavioral characteristics of the output are consistent.
  • the behavior feature set is separately entered into different detection models for detection. Each detection model is used to detect specific security problems. Data identified as abnormal by the detection model will be encapsulated as alarms.
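  • The detection step can be sketched as a loop that feeds every behavior feature row to every deployed model and wraps positive results as alarms. The threshold rules below are hypothetical stand-ins for trained models, and the feature names, alarm fields, and sample rows are assumptions.

```python
def detect(feature_rows, detection_models):
    """Run each behavior feature row through every model; wrap hits as alarms."""
    alarms = []
    for row in feature_rows:
        for security_problem, model in detection_models.items():
            if model(row):
                alarms.append({
                    "subject_id": row["subject_id"],
                    "start_time": row["start_time"],
                    "end_time": row["end_time"],
                    "security_problem": security_problem,
                })
    return alarms


# Hypothetical stand-ins: each callable plays the role of one trained model.
detection_models = {
    "brute_force": lambda row: row["failure_count"] >= 5,
    "off_hours_access": lambda row: row["night_event_count"] > 0,
}

# Made-up production feature rows, produced by the same pipeline as in training.
feature_rows = [
    {"subject_id": "Abcd12", "start_time": 0, "end_time": 60,
     "failure_count": 7, "night_event_count": 0},
    {"subject_id": "Efgh34", "start_time": 0, "end_time": 60,
     "failure_count": 0, "night_event_count": 2},
    {"subject_id": "Ijkl56", "start_time": 0, "end_time": 60,
     "failure_count": 0, "night_event_count": 0},
]
alarms = detect(feature_rows, detection_models)
```

  • Because each model targets one security problem, a single feature row can raise several alarms, one per model that flags it.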
  • The abnormal behavior detection method clusters, by unsupervised learning, the behavior feature sets extracted from different data sources, and performs supervised learning on the abnormal result clusters to generate detection models that correspond to specific application scenarios and have high adaptability and accuracy. The learned detection models are used to detect abnormal behavior in data sets in a production environment. This method can integrate behavior data from different data sources, improve the accuracy of automatic recognition of abnormal behavior, and significantly reduce labor costs.
  • FIG. 3 is a schematic structural diagram of an abnormal behavior detection system according to an embodiment of the present invention. As shown in FIG. 3, the abnormal behavior detection system according to the embodiment of the present invention includes the following functional modules:
  • a feature extraction module 21 configured to extract behavior feature sets from a plurality of different types of data sources
  • a clustering module 22 configured to cluster the behavior feature set based on an unsupervised learning algorithm
  • An abnormality labeling module 23 configured to analyze the clusters of results obtained by clustering, and mark abnormality clusters corresponding to specific scenes;
  • a learning module 24 for performing supervised learning on each labeled cluster to generate a detection model corresponding to a specific scene
  • a detection module 25 is configured to detect an abnormal behavior based on the detection model.
  • the feature extraction module further includes:
  • a data extraction module 211 configured to extract behavior data of an analysis subject from multiple different types of data sources
  • a data processing module 212 is configured to aggregate, transform and feature extract the extracted behavior data to form a behavior feature set of the analysis subject.
  • different types of data sources may be logs from multiple systems.
  • the goal is to extract the analysis subject's behavior in that system from the logs of each system.
  • An effective behavior model includes the following main components: analysis subject ID, time stamp, event name, specific behavior, object of behavior operation, event result, etc.
  • Data from different data sources usually need to meet completeness, normativity, and uniformity, and each behavioral data can be connected to the analysis subject ID through an association relationship.
  • The analysis subject may be a user, in which case the corresponding analysis subject ID may be a user ID; or it may be an important asset within the enterprise, such as an externally facing business service cluster, in which case the corresponding analysis subject ID may be a server IP address.
  • The feature extraction module 21 extracts data from the various data sources through a configurable data collector, and extracts the fields corresponding to the behavior model from the different data sources to form a simplified log. The logs can then be grouped by analysis subject. Let E denote a defined analysis subject. The logs are first grouped by analysis subject E, and the logs of each analysis subject E are then segmented according to the time window W. After segmentation, each log subset S must contain the behavior data of a single analysis subject within a single time window. Field aggregation, transformation, and feature extraction are then performed on each log subset S according to the needs of the algorithm, finally forming the behavior feature F of the analysis subject E. Over a long analysis period, if the number of analysis subjects E is M and the number of time windows W is N, then M * N behavior features are eventually output.
  • the clustering module 22 clusters the behavior feature set obtained by the feature extraction module 21.
  • Clustering is one of the unsupervised learning algorithms. Unsupervised learning algorithms do not need to pre-label the data.
  • The clustering algorithm clusters the behavior data in the behavior feature set according to the distance between features. Similar behaviors correspond to similar features and are grouped into the same class. The features corresponding to different behaviors are far apart and are assigned to different classes.
  • Available clustering algorithms include KMeans (K-means clustering), DBSCAN (density-based spatial clustering of applications with noise), hierarchical clustering, and so on.
  • the result clusters of the clustering can also be scored and sorted according to the scored results.
  • the ranking can be based on the performance evaluation index corresponding to the clustering algorithm, such as the cohesion of the corresponding class.
  • The cohesion of a class is defined as (1 - intra-class distance / inter-class distance), where the intra-class distance is the average distance between the features inside the class, and the inter-class distance is the average distance between the features inside the class and the features outside the class.
  • Ranking can also introduce external threat intelligence, and use the hit rate of threat intelligence in the data as the threat index.
  • The abnormality labeling module 23 is used by a security expert or a business specialist to perform security analysis or business anomaly analysis on the result clusters obtained by the clustering.
  • security analysis or business abnormality analysis may be performed from a cluster with a higher threat index to a cluster with a lower threat index.
  • Each cluster has a systematically generated analysis basis, including behavioral characteristics and raw data, for direct analysis.
  • the auxiliary basis includes technical indicators such as the threat intelligence hit rate, the size of the cluster, the closeness of the behavior within the cluster, the closest subject in the cluster and the farthest subject.
  • Security experts mainly judge from the perspective of whether the business corresponding to the behavior in the cluster is compliant and whether it meets the characteristics of known security models.
  • Abnormal clusters are labeled by security experts or business specialists as a specific security threat or business security issue.
  • the subset of data corresponding to the behavior in the cluster is also given the same label.
  • the learning module 24 applies a supervised learning classification algorithm to train each labeled cluster.
  • The trained model is the detection model for the specific security problem in this data set. Because it is trained on and derived from actual data, the detection model is fully adapted to the behavior features of the current data set and detects the corresponding security problem with high accuracy.
  • Supervised learning algorithms that are widely recognized in the industry as effective, such as XGBoost, GBDT, and LightGBM, can be used.
  • the detection module 25 detects abnormal behavior based on the detection model.
  • The trained detection models are deployed in the production system.
  • the process of data collection and processing needs to be consistent with the process of training data collection and processing, to ensure that the behavioral characteristics of the output are consistent.
  • the behavior feature set is separately entered into different detection models for detection. Each detection model is used to detect specific security problems. Data identified as abnormal by the detection model will be encapsulated as alarms.
  • The abnormal behavior detection system clusters, by unsupervised learning, the behavior feature sets extracted from different data sources, and performs supervised learning on the abnormal result clusters to generate detection models that correspond to specific application scenarios and have high adaptability and accuracy. The learned detection models are used to detect abnormal behavior in data sets in a production environment.
  • the system can integrate behavioral data from different data sources, improve the accuracy of automatic identification of abnormal behaviors, and significantly reduce labor costs.
  • a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the steps of the abnormal behavior detection method according to any one of the foregoing embodiments.
  • a computing device which includes a memory and a processor.
  • The memory stores a computer program executable on the processor, and the processor executes the computer program to implement the steps of the abnormal behavior detection method according to any one of the foregoing embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are an abnormal behavior detection method and system. Unsupervised learning clustering is carried out on behavior feature sets extracted from different data sources, a result cluster obtained by means of clustering is analyzed to mark abnormal clusters, supervised learning is carried out on the abnormal clusters to generate a detection model having a high degree of adaptation and accuracy and corresponding to a specific application scenario, and abnormal behavior detection of a data set in a production environment is realized by means of the detection model obtained through said learning. The embodiments of the present invention can integrate behavior data from different data sources, and can improve the accuracy of automatic recognition of abnormal behaviors and significantly reduce the labor cost.

Description

Method and system for detecting abnormal behavior
Technical field
The present invention relates to the field of information security, and in particular to a method and system for detecting abnormal behavior, used to detect abnormal behavior in an enterprise information system and to protect the security of an enterprise's services, data, business operations, and assets.
Background
The field of information security currently faces multiple challenges. On the one hand, enterprise security architectures are becoming increasingly complex, the types and volume of security devices and security data keep growing, and traditional analysis capabilities are clearly inadequate. On the other hand, with the rise of new threats represented by advanced persistent threats (APT) and insider attacks, and with the deepening of internal control and compliance, there is a growing need to store and analyze more security information and to make judgments and respond more quickly.
In the past, understanding hard-to-detect security threats took days or even months, because a large number of unrelated data streams is difficult to assemble into a concise, organized "puzzle" of events. The larger the amount of data collected and analyzed, the more chaotic it appears and the longer it takes to reconstruct an event. This usually requires professionals to perform extensive manual analysis using expert knowledge, which is time-consuming and labor-intensive. With the research and application of machine learning and classification algorithms, the industry has begun to study how to use them in place of manual analysis, so that machines can analyze and detect abnormal behavior automatically. An existing method trains a classifier on training samples to distinguish abnormal behavior data from normal behavior data, but this method requires manually labeling a large amount of sample data. In particular, as the security data from different data sources in the enterprise security architecture keeps growing, labeling massive data from different data sources incurs high labor costs and cannot meet the actual needs of current enterprise information security analysis.
Therefore, enterprises with large information systems urgently need a solution that integrates data from different data sources, applies state-of-the-art machine learning algorithms and an artificial intelligence detection engine, incorporates the knowledge of security experts to identify abnormal behavior automatically, and automatically consolidates numerous abnormal units into anomaly scenarios that operations and maintenance personnel can understand and explain.
Summary of the Invention
The technical problem to be solved by the present invention is how to provide an abnormal behavior detection method and system that can integrate data from different data sources, improve the accuracy of automatic identification of abnormal behavior, and reduce labor costs.
To solve the above technical problem, according to one aspect of the present invention, an abnormal behavior detection method is provided, comprising the following steps:
extracting a behavior feature set from a plurality of different types of data sources;
clustering the behavior feature set based on an unsupervised learning algorithm;
analyzing the result clusters obtained by the clustering, and labeling the abnormal clusters according to specific scenarios;
performing supervised learning on each labeled cluster to generate a detection model corresponding to the specific scenario;
detecting abnormal behavior based on the detection model.
According to a preferred embodiment of the present invention, extracting a behavior feature set from a plurality of different types of data sources comprises:
extracting behavior data of an analysis subject from the plurality of different types of data sources;
aggregating, transforming, and extracting features from the extracted behavior data to form the behavior feature set of the analysis subject.
According to a preferred embodiment of the present invention, the plurality of different types of data sources are log data from a plurality of systems.
According to a preferred embodiment of the present invention, the analysis subject includes a user or an enterprise server.
According to a preferred embodiment of the present invention, aggregating, transforming, and extracting features from the extracted behavior data includes segmenting the extracted behavior data of the analysis subject by time window.
According to a preferred embodiment of the present invention, the subset of logs of one analysis subject within one time window yields one behavior feature.
According to a preferred embodiment of the present invention, the time window includes a tumbling window, which divides the data stream into consecutive windows of equal length.
According to a preferred embodiment of the present invention, the time window includes a session window, which divides the data stream according to the time gaps between events.
According to a preferred embodiment of the present invention, clustering the behavior feature set based on an unsupervised learning algorithm includes clustering the behavior data in the behavior feature set according to the distances between their features.
According to a preferred embodiment of the present invention, clustering the behavior feature set based on an unsupervised learning algorithm includes ranking the result clusters of the clustering.
According to a preferred embodiment of the present invention, the result clusters are ranked according to an effect evaluation index of the clustering algorithm or according to the hit rate of external threat intelligence in the data.
According to a preferred embodiment of the present invention, analyzing the result clusters obtained by the clustering and labeling the abnormal clusters according to specific scenarios includes labeling each abnormal cluster with a specific security problem.
According to a preferred embodiment of the present invention, performing supervised learning on each labeled cluster to generate a detection model corresponding to the specific scenario includes using the data subset corresponding to each labeled cluster as positive data and the other data in the data set as negative data, and training with a supervised classification algorithm.
According to a preferred embodiment of the present invention, detecting abnormal behavior based on the detection model includes deploying the detection models to a production environment, each detection model being used to detect a specific security problem.
According to another aspect of the present invention, an abnormal behavior detection system is provided, comprising the following modules:
a feature extraction module, configured to extract a behavior feature set from a plurality of different types of data sources;
a clustering module, configured to cluster the behavior feature set based on an unsupervised learning algorithm;
an anomaly labeling module, configured to analyze the result clusters obtained by the clustering and to label the abnormal clusters according to specific scenarios;
a learning module, configured to perform supervised learning on each labeled cluster to generate a detection model corresponding to the specific scenario;
a detection module, configured to detect abnormal behavior based on the detection model.
According to a preferred embodiment of the present invention, the feature extraction module comprises:
a data extraction module, configured to extract behavior data of an analysis subject from the plurality of different types of data sources;
a data processing module, configured to aggregate, transform, and extract features from the extracted behavior data to form the behavior feature set of the analysis subject.
According to a preferred embodiment of the present invention, the plurality of different types of data sources are log data from a plurality of systems.
According to a preferred embodiment of the present invention, the analysis subject includes a user or an enterprise server.
According to a preferred embodiment of the present invention, the aggregation, transformation, and feature extraction performed by the data processing module on the extracted behavior data includes segmenting the extracted behavior data of the analysis subject by time window.
According to a preferred embodiment of the present invention, the subset of logs of one analysis subject within one time window yields one behavior feature.
According to a preferred embodiment of the present invention, the time window includes a tumbling window, which divides the data stream into consecutive windows of equal length.
According to a preferred embodiment of the present invention, the time window includes a session window, which divides the data stream according to the time gaps between events.
According to a preferred embodiment of the present invention, the clustering of the behavior feature set by the clustering module based on an unsupervised learning algorithm includes clustering the behavior data in the behavior feature set according to the distances between their features.
According to a preferred embodiment of the present invention, the clustering of the behavior feature set by the clustering module based on an unsupervised learning algorithm includes ranking the result clusters of the clustering.
According to a preferred embodiment of the present invention, the result clusters are ranked according to an effect evaluation index of the clustering algorithm or according to the hit rate of external threat intelligence in the data.
According to a preferred embodiment of the present invention, the analysis of the result clusters by the anomaly labeling module and the labeling of abnormal clusters according to specific scenarios includes labeling each abnormal cluster with a specific security problem.
According to a preferred embodiment of the present invention, the supervised learning performed by the learning module on each labeled cluster to generate a detection model corresponding to the specific scenario includes using the data subset corresponding to each labeled cluster as positive data and the other data in the data set as negative data, and training with a supervised classification algorithm.
According to a preferred embodiment of the present invention, the detection of abnormal behavior by the detection module based on the detection model includes deploying the detection models to a production environment, each detection model being used to detect a specific security problem.
According to another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, the computer program being executed by a processor to implement the steps of the abnormal behavior detection method according to any one of the foregoing embodiments.
According to another aspect of the present invention, a computing device is provided, comprising a memory and a processor, the memory storing a computer program executable on the processor, and the processor executing the computer program to implement the steps of the abnormal behavior detection method according to any one of the foregoing embodiments.
Compared with the prior art, the abnormal behavior detection method and system of the embodiments of the present invention extract behavior feature sets from different data sources, cluster them by unsupervised learning, and perform supervised learning on the abnormal result clusters to produce detection models that are well adapted to, and accurate for, specific application scenarios. The detection models obtained by this learning are then used to detect abnormal behavior in data sets in a production environment. The embodiments of the present invention can integrate behavior data from different data sources, improve the accuracy of automatic identification of abnormal behavior, and significantly reduce labor costs.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an abnormal behavior detection method according to an embodiment of the present invention;
FIG. 2 is an example diagram of the time window mechanisms used for behavior data processing in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an abnormal behavior detection system according to an embodiment of the present invention.
Detailed Description
To explain the technical solutions of the embodiments of the present invention more clearly, specific embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of an abnormal behavior detection method according to an embodiment of the present invention. As shown in FIG. 1, the abnormal behavior detection method of the embodiment comprises the following steps:
Step S11: extract a behavior feature set from a plurality of different types of data sources.
The data sources are defined first. The data sources may be logs from multiple systems. The goal is to extract, from the logs of each system, the behavior of the analysis subject in that system. An effective behavior model includes the following main components: an analysis subject ID, a timestamp, an event name, the specific behavior, the object on which the behavior operates, the result of the event, and so on. Data from the different data sources generally needs to be complete, normalized, and uniform, and every behavior record must be linkable to an analysis subject ID through an association. The analysis subject may be a user, in which case the analysis subject ID may be a user ID; it may also be an important internal asset of the enterprise, such as an externally facing business service cluster, in which case the analysis subject ID may be a server IP address.
Taking the VPN data source log in Table 1 as an example, the fields corresponding to the behavior model are extracted as follows: the analysis subject ID corresponds to 'user_id', 'src_ip', or 'dst_ip'; the timestamp corresponds to 'timestamp'; the event name corresponds to 'activity'; the object on which the behavior operates corresponds to 'pc_name'; and the result of the event corresponds to 'status'. Other fields that do not map to the behavior model can be discarded to save bandwidth or memory.
Table 1 Sample VPN data source log
timestamp | src_ip | dst_ip | user_id | activity | pc_name | status
2018-05-21T04:00:00.000Z | 111.163.192.68 | 23.123.22.22 | Abcd12 | connect | PC_1002 | success
2018-05-21T04:00:00.000Z | 117.14.161.205 | 23.123.22.22 | Abcd12 | connect | PC_1021 | success
2018-05-21T04:00:00.000Z | 117.14.161.207 | 23.123.22.22 | Efgk21 | connect | PC_2192 | fail
2018-05-21T04:00:00.000Z | 117.14.161.229 | 23.123.22.22 | Hijk90 | disconnect | PC_1202 | success
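The field mapping described for Table 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `FIELD_MAP` and `to_behavior_record` and the record keys (`subject_id`, `event_name`, `target`, `result`) are hypothetical, and `user_id` is chosen as the analysis subject ID only for the sake of the example.

```python
from datetime import datetime, timezone

# Behavior-model component -> Table 1 column (hypothetical key names).
# "subject_id" uses user_id here; src_ip or dst_ip would be equally valid.
FIELD_MAP = {
    "subject_id": "user_id",
    "timestamp": "timestamp",
    "event_name": "activity",
    "target": "pc_name",
    "result": "status",
}

def to_behavior_record(raw, field_map=FIELD_MAP):
    """Keep only the columns the behavior model needs and parse the timestamp."""
    record = {name: raw[column] for name, column in field_map.items()}
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return record

# First row of Table 1.
raw_row = {
    "timestamp": "2018-05-21T04:00:00.000Z",
    "src_ip": "111.163.192.68",
    "dst_ip": "23.123.22.22",
    "user_id": "Abcd12",
    "activity": "connect",
    "pc_name": "PC_1002",
    "status": "success",
}
record = to_behavior_record(raw_row)
```

Columns that are not listed in the mapping (here `src_ip` and `dst_ip`) are dropped, which corresponds to discarding fields that do not map to the behavior model.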
The analysis subject is the primary key of the behavior data, and all data structures are built around it. An analysis subject can be defined from the logs by using the log field that corresponds to the analysis subject ID as the key; the key can also be used for enrichment and grouping. In the VPN login log of Table 1, for example, the analysis subject can be defined by the account ID field (user_id), or by the source IP (src_ip) or the destination IP (dst_ip). Different definitions of the analysis subject lead to markedly different behavior features, and therefore to different analysis questions. If the account ID is chosen as the subject, the analysis is a user behavior analysis aggregated per account: 'Abcd12' logged in successfully twice, from different source IPs and PCs. If the source IP is chosen as the subject, each of the four external IPs accounts for one log entry. If the destination IP is chosen as the subject, the analysis concerns the internal VPN server, and the aggregate is four log entries, each from a different source IP.
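The effect of choosing a different analysis subject can be reproduced on the four rows of Table 1. The sketch below is illustrative only; `group_by_subject` is a hypothetical helper, and the timestamp column is omitted because it is identical in all four rows.

```python
from collections import defaultdict

COLUMNS = ("src_ip", "dst_ip", "user_id", "activity", "pc_name", "status")
ROWS = [
    ("111.163.192.68", "23.123.22.22", "Abcd12", "connect", "PC_1002", "success"),
    ("117.14.161.205", "23.123.22.22", "Abcd12", "connect", "PC_1021", "success"),
    ("117.14.161.207", "23.123.22.22", "Efgk21", "connect", "PC_2192", "fail"),
    ("117.14.161.229", "23.123.22.22", "Hijk90", "disconnect", "PC_1202", "success"),
]

def group_by_subject(rows, subject_field):
    """Group log rows by the column chosen as the analysis subject ID."""
    groups = defaultdict(list)
    for row in rows:
        log = dict(zip(COLUMNS, row))
        groups[log[subject_field]].append(log)
    return dict(groups)

by_user = group_by_subject(ROWS, "user_id")  # user behavior analysis
by_src = group_by_subject(ROWS, "src_ip")    # one group per external IP
by_dst = group_by_subject(ROWS, "dst_ip")    # analysis of the VPN server
```

Grouping by `user_id` puts both successful connects of 'Abcd12' in one group, grouping by `src_ip` gives four groups of one entry each, and grouping by `dst_ip` gives a single group of four entries.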
Data is extracted from the various data sources by a configurable data collector, which extracts the fields corresponding to the behavior model from each source to form simplified logs. The logs can then be grouped by analysis subject. Let E denote a defined analysis subject: the logs are first grouped by analysis subject E, and the logs of each analysis subject E are then segmented by time window W. After segmentation, every log subset S contains only behavior data of a single analysis subject within a single time window. Each log subset S then undergoes field aggregation, transformation, and feature extraction as required by the algorithm, which yields a behavior feature F of analysis subject E. Over a long analysis period, if the number of analysis subjects E is M and the number of time windows W is N, this step outputs a behavior feature set of about M × N entries. Each behavior feature F corresponds to the logs in the log subset S from which it was generated. An example of a behavior feature F is shown in Table 2 below; the required fields are the start time (start_time) and end time (end_time) of the window and the value of each feature.
Table 2 Example behavior features
Figure PCTCN2019101543-appb-000001
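The grouping by analysis subject E and time window W into log subsets S, and the reduction of each subset to one behavior feature F, might look like the sketch below. Because Table 2 is only available as an image, the feature names `event_count`, `fail_count`, and `distinct_src_ip` are assumptions made for illustration, as are the function name `extract_features` and the use of epoch seconds in a `ts` field. Only `start_time` and `end_time` come from the text.

```python
from collections import defaultdict

def extract_features(records, window_size):
    """Return one feature row per (analysis subject, tumbling window)."""
    subsets = defaultdict(list)  # log subsets S, keyed by (E, window index)
    for r in records:
        window_index = int(r["ts"] // window_size)
        subsets[(r["subject_id"], window_index)].append(r)

    features = []
    for (subject, window_index), subset in sorted(subsets.items()):
        start = window_index * window_size
        features.append({
            "subject_id": subject,
            "start_time": start,
            "end_time": start + window_size,
            # Hypothetical aggregate features computed over the subset S.
            "event_count": len(subset),
            "fail_count": sum(1 for r in subset if r["status"] == "fail"),
            "distinct_src_ip": len({r["src_ip"] for r in subset}),
        })
    return features

RECORDS = [
    {"subject_id": "Abcd12", "ts": 0, "status": "success", "src_ip": "111.163.192.68"},
    {"subject_id": "Abcd12", "ts": 10, "status": "success", "src_ip": "117.14.161.205"},
    {"subject_id": "Abcd12", "ts": 3700, "status": "fail", "src_ip": "117.14.161.205"},
    {"subject_id": "Efgk21", "ts": 20, "status": "fail", "src_ip": "117.14.161.207"},
]
features = extract_features(RECORDS, window_size=3600)
```

In this sketch only non-empty (subject, window) pairs produce a row. That is one plausible reading of why the text gives the output size as "about" M × N rather than exactly M × N.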
The time windows are defined as shown in FIG. 2. Two window mechanisms can be used in embodiments of the present invention. The first is the tumbling window: the data stream is divided into consecutive time windows of equal length, with the window size window_size as the parameter. The second is the session window: the data stream is divided according to the time gap between events, with time_interval as the parameter. If two consecutive events occur less than time_interval apart, they fall into the same window; if they occur more than time_interval apart, the previous session window is closed and the new event starts a new session window. In both mechanisms, the division into time windows is independent of the number of logs in any single window.
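The two window mechanisms can be compared on the same event stream with a short sketch. The function names are hypothetical, timestamps are plain numbers, and the case where the gap equals time_interval exactly, which the text leaves open, is assumed here to stay in the current session.

```python
def tumbling_windows(timestamps, window_size):
    """Assign each timestamp to a fixed-length, non-overlapping window.

    Returns {window_start: [timestamps in that window]}.
    """
    windows = {}
    for ts in sorted(timestamps):
        start = (ts // window_size) * window_size
        windows.setdefault(start, []).append(ts)
    return windows

def session_windows(timestamps, time_interval):
    """Split timestamps into sessions; a gap above time_interval starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= time_interval:
            sessions[-1].append(ts)   # close enough to the previous event
        else:
            sessions.append([ts])     # gap too large: open a new session
    return sessions

EVENTS = [0, 5, 28, 31, 33, 70]
tumbling = tumbling_windows(EVENTS, window_size=30)
sessions = session_windows(EVENTS, time_interval=10)
```

With these inputs the tumbling windows cut the stream at the fixed boundaries 30 and 60, whereas the session windows cut it wherever more than 10 time units pass without an event, so the events at 28, 31, and 33 end up in one session although they straddle a tumbling boundary.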
Step S12: cluster the behavior feature set based on an unsupervised learning algorithm.
This step clusters the behavior feature set made up of all behavior features F obtained in step S11. Clustering is one kind of unsupervised learning algorithm; unsupervised learning algorithms do not require the data to be labeled in advance. The clustering algorithm groups the behavior data in the behavior feature set according to the distances between their features. Similar behaviors have similar features and are gathered into the same class. Features of different behaviors lie far apart and are assigned to different classes. Clustering algorithms have many implementations, such as KMeans (K-means clustering), DBSCAN (density-based spatial clustering of applications with noise), and hierarchical clustering. Taking KMeans as an example, its standard pseudocode is as follows:
Figure PCTCN2019101543-appb-000002
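Since the pseudocode is only available as an image, the following is a minimal textbook version of K-means (Lloyd's algorithm) in plain Python, offered as an illustration rather than a transcription of the figure. The function name, the random initialization from the input points, and the parameters `max_iter` and `seed` are assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels

POINTS = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(POINTS, k=2)
```

In practice the behavior features would usually be scaled before clustering, because Euclidean distance is dominated by the feature with the largest numeric range.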
In a preferred embodiment, the result clusters of the clustering are scored and ranked by score. The ranking can be based on an effect evaluation index of the clustering algorithm, such as the cohesion of each class. The cohesion of a class is defined as (1 − intra-class distance / inter-class distance), where the intra-class distance is the average distance between the features inside the class, and the inter-class distance is the average distance between the features inside the class and the features outside it. The ranking can also draw on external threat intelligence, using the hit rate of the threat intelligence in the data as a threat index. The pseudocode for the evaluation index is as follows:
Figure PCTCN2019101543-appb-000003
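The cohesion formula and the threat-intelligence hit rate can be written out directly from the definitions in the text. The sketch below is illustrative: the function names are hypothetical, a cluster with no outside points is assumed to score 0, and the threat intelligence is modeled as a plain set of known-bad source IPs, which is an assumption about its form.

```python
import math

def _mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def cohesion(points, labels, cluster):
    """1 - (intra-class distance / inter-class distance) for one cluster."""
    inside = [i for i, label in enumerate(labels) if label == cluster]
    outside = [i for i, label in enumerate(labels) if label != cluster]
    intra = _mean(math.dist(points[i], points[j])
                  for i in inside for j in inside if i < j)
    inter = _mean(math.dist(points[i], points[j])
                  for i in inside for j in outside)
    return 1.0 - intra / inter if inter > 0 else 0.0

def threat_hit_rate(cluster_records, bad_ips):
    """Share of records in a cluster whose src_ip appears in the intelligence set."""
    if not cluster_records:
        return 0.0
    hits = sum(1 for r in cluster_records if r["src_ip"] in bad_ips)
    return hits / len(cluster_records)

def rank_clusters(scores):
    """Cluster IDs ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

POINTS = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
LABELS = [0, 0, 1, 1]
score_0 = cohesion(POINTS, LABELS, cluster=0)
hit_rate = threat_hit_rate(
    [{"src_ip": "203.0.113.5"}, {"src_ip": "198.51.100.7"}],
    bad_ips={"198.51.100.7"},
)
ranking = rank_clusters({0: 0.9, 1: 0.4, 2: 0.7})
```

A cohesion close to 1 means the members of a cluster are much closer to each other than to the rest of the data; either score can then be passed to `rank_clusters` to decide which clusters an analyst reviews first.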
Step S13: analyze the result clusters obtained by the clustering, and label the abnormal clusters according to specific scenarios.
In this step, a security expert or a business specialist can analyze the result clusters for security problems or business anomalies. In a preferred embodiment, the analysis proceeds from the clusters with a higher threat index to those with a lower threat index.
Each cluster comes with a system-generated analysis basis, including the behavior features and the raw data, which can be analyzed directly. Auxiliary evidence includes technical indicators such as the threat intelligence hit rate, the size of the cluster, how tightly the behaviors within the cluster are grouped, and the subjects nearest to and farthest from the cluster center. Security experts mainly judge whether the business activity corresponding to the behaviors in the cluster is compliant and whether it matches the characteristics of known security models. An example of the analysis basis for one cluster is shown in Table 3 below:
Table 3 Example analysis basis for a cluster
Figure PCTCN2019101543-appb-000004
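Some of the auxiliary indicators named above (cluster size, tightness, nearest and farthest subject) can be computed from the feature vectors of one cluster. The sketch below is an assumption about how such a summary might be produced; the function name, the output keys, and the use of the mean distance to the center as the tightness measure are not taken from Table 3, which is only available as an image.

```python
import math

def cluster_summary(subjects, points):
    """Summarize one cluster; subjects[i] owns the feature vector points[i]."""
    center = [sum(dim) / len(points) for dim in zip(*points)]
    distances = [math.dist(p, center) for p in points]
    nearest = min(range(len(points)), key=distances.__getitem__)
    farthest = max(range(len(points)), key=distances.__getitem__)
    return {
        "size": len(points),
        "mean_distance_to_center": sum(distances) / len(distances),
        "nearest_subject": subjects[nearest],
        "farthest_subject": subjects[farthest],
    }

summary = cluster_summary(
    ["Abcd12", "Efgk21", "Hijk90"],
    [(0.0, 0.0), (2.0, 0.0), (7.0, 0.0)],
)
```

The nearest subject gives the reviewer a typical member of the cluster, and the farthest subject shows how far the cluster stretches before its members stop resembling each other.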
Based on the results of the cluster analysis, the security expert or business specialist labels the abnormal clusters. Each label names a specific security threat or business security problem. When a cluster is labeled, the data subset corresponding to the behaviors in that cluster receives the same label.
Step S14: perform supervised learning on each labeled cluster to generate a detection model corresponding to the specific scenario.
In this step, for each labeled cluster, the corresponding data subset is used as the labeled positive data, and the other data in the data set is used as negative data; a supervised classification algorithm is then applied for training. The trained model is the detection model for a specific security problem under this data set. Because it is trained on and derived from actual data, the detection model is closely adapted to the behavior features of the current data set and detects the targeted security problem with high accuracy. The supervised learning algorithm can be one that is well regarded in the industry, such as XGBoost, GBDT, or LightGBM. The standard pseudocode is as follows:
Figure PCTCN2019101543-appb-000005
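The positive/negative construction and the per-cluster training can be sketched as below. To keep the example dependency-free, a small logistic regression trained by stochastic gradient descent stands in for XGBoost, GBDT, or LightGBM; it is a swapped-in technique, not the classifier the text recommends. The two-dimensional features and the label "brute_force_login" are hypothetical.

```python
import math

def build_training_set(features, cluster_ids, labeled_cluster):
    """Members of the labeled cluster are positives (1); all other rows are negatives (0)."""
    return [(x, 1 if c == labeled_cluster else 0)
            for x, c in zip(features, cluster_ids)]

def _sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, epochs=500, lr=0.1):
    """Stochastic gradient descent on the logistic loss; returns (weights, bias)."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = _sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(model, x):
    """True if the model classifies feature vector x as the labeled scenario."""
    weights, bias = model
    return _sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5

# Hypothetical features: (failed logins, distinct source IPs) per window.
FEATURES = [(0.0, 1.0), (1.0, 1.0), (0.0, 2.0), (8.0, 5.0), (9.0, 6.0), (7.0, 6.0)]
CLUSTER_IDS = [0, 0, 0, 1, 1, 1]
CLUSTER_LABELS = {1: "brute_force_login"}  # assigned by the analyst in step S13

MODELS = {}
for cluster_id, scenario in CLUSTER_LABELS.items():
    samples = build_training_set(FEATURES, CLUSTER_IDS, cluster_id)
    MODELS[scenario] = train_logistic(samples)
```

One model is trained per labeled cluster, so each model answers a single question: does this behavior feature belong to that scenario or not.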
Step S15: detect abnormal behavior based on the detection model.
In this step, the detection models obtained are first deployed: the trained detection models are installed in the production system. The data collection and processing pipeline must be kept identical to the one used for the training data, so that the behavior features it outputs are consistent with those used in training. The behavior feature set is passed to each detection model in turn; each detection model detects one specific security problem, and data that a detection model classifies as abnormal is packaged as an alert.
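The fan-out of one behavior feature to several scenario models, with alerts for the positive results, can be sketched as follows. The threshold rules below are hypothetical stand-ins for trained detection models, and the scenario names, feature names, and alert fields are assumptions made for illustration.

```python
def detect(feature_row, models):
    """Run one behavior feature row through every scenario model; return alerts."""
    alerts = []
    for scenario, is_abnormal in models.items():
        if is_abnormal(feature_row):
            alerts.append({
                "scenario": scenario,
                "subject_id": feature_row["subject_id"],
                "start_time": feature_row["start_time"],
                "end_time": feature_row["end_time"],
            })
    return alerts

# Hypothetical stand-ins for trained models: one callable per security problem.
MODELS = {
    "brute_force_login": lambda f: f["fail_count"] >= 5,
    "credential_sharing": lambda f: f["distinct_src_ip"] >= 3,
}

suspicious = {"subject_id": "Abcd12", "start_time": 0, "end_time": 3600,
              "fail_count": 7, "distinct_src_ip": 1}
quiet = {"subject_id": "Hijk90", "start_time": 0, "end_time": 3600,
         "fail_count": 0, "distinct_src_ip": 1}

alerts = detect(suspicious, MODELS)
no_alerts = detect(quiet, MODELS)
```

Because every model receives the same feature row, the feature extraction code used in production has to be the same code that produced the training features, which is the consistency requirement stated above.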
The abnormal behavior detection method of the embodiment of the present invention clusters, by unsupervised learning, the behavior feature sets extracted from different data sources, and performs supervised learning on the abnormal result clusters to produce detection models that are well adapted to, and accurate for, specific application scenarios. The detection models obtained by this learning are then used to detect abnormal behavior in data sets in a production environment. The method can integrate behavior data from different data sources, improve the accuracy of automatic identification of abnormal behavior, and significantly reduce labor costs.
FIG. 3 is a schematic structural diagram of an abnormal behavior detection system according to an embodiment of the present invention. As shown in FIG. 3, the abnormal behavior detection system of the embodiment includes the following functional modules:
a feature extraction module 21, configured to extract a behavior feature set from a plurality of different types of data sources;
a clustering module 22, configured to cluster the behavior feature set based on an unsupervised learning algorithm;
an anomaly labeling module 23, configured to analyze the result clusters obtained by the clustering and to label the abnormal clusters according to specific scenarios;
a learning module 24, configured to perform supervised learning on each labeled cluster to generate a detection model corresponding to the specific scenario;
a detection module 25, configured to detect abnormal behavior based on the detection model.
Further, the feature extraction module includes:
a data extraction module 211, configured to extract behavior data of an analysis subject from the plurality of different types of data sources;
a data processing module 212, configured to aggregate, transform, and extract features from the extracted behavior data to form the behavior feature set of the analysis subject.
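The way the five modules hand data to one another can be sketched as a small class whose fields are the modules. This is only a structural illustration: the class name, the method names `train` and `run`, and the toy stand-in functions are hypothetical, and the stand-ins operate on a single number per log so that the data flow stays visible.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AbnormalBehaviorDetectionSystem:
    extract_features: Callable  # feature extraction module 21 (211 and 212)
    cluster: Callable           # clustering module 22
    label_clusters: Callable    # anomaly labeling module 23 (analyst input)
    learn: Callable             # learning module 24
    detect: Callable            # detection module 25

    def train(self, raw_logs):
        """Steps S11 to S14: features, clusters, labels, detection models."""
        features = self.extract_features(raw_logs)
        cluster_ids = self.cluster(features)
        labels = self.label_clusters(features, cluster_ids)
        return self.learn(features, cluster_ids, labels)

    def run(self, raw_logs, models):
        """Step S15: same feature extraction, then detection."""
        return self.detect(self.extract_features(raw_logs), models)

# Toy stand-ins: each "log" is a failure count, each "model" a threshold.
system = AbnormalBehaviorDetectionSystem(
    extract_features=lambda logs: [float(x) for x in logs],
    cluster=lambda feats: [1 if f >= 5 else 0 for f in feats],
    label_clusters=lambda feats, ids: {1: "many_failures"},
    learn=lambda feats, ids, labels: {
        name: min(f for f, c in zip(feats, ids) if c == cid)
        for cid, name in labels.items()
    },
    detect=lambda feats, models: [
        (f, name) for f in feats for name, threshold in models.items()
        if f >= threshold
    ],
)
models = system.train([0, 1, 7, 9])
alerts = system.run([2, 8], models)
```

Routing both `train` and `run` through the same `extract_features` callable is a simple way to keep training and production features consistent.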
本发明实施例中,不同类型的数据源可以是来自多个系统的日志。目标是从每个系统的日志中抽取出分析主体在该系统中的行为。有效的行为模型包括以下主要构件:分析主体ID,时间戳,事件名称,具体的行为,行为操作的对象,事件的结果等。不同数据源的数据通常需要满足完整性、规范性和齐整性,各行为数据能通过关联关系连接至分析主体ID。这里分析主体可以是用户,对应的分析主体ID可以用户ID,也可以是企业内部的重要资产,如对外的业务服务集群,其对应的分析主体ID可以服务器IP地址。In the embodiment of the present invention, different types of data sources may be logs from multiple systems. The goal is to extract the analysis subject's behavior in that system from the logs of each system. An effective behavior model includes the following main components: analysis subject ID, time stamp, event name, specific behavior, object of behavior operation, event result, etc. Data from different data sources usually need to meet completeness, normativity, and uniformity, and each behavioral data can be connected to the analysis subject ID through an association relationship. Here, the analysis subject may be a user, and the corresponding analysis subject ID may be a user ID, or it may be an important asset within the enterprise, such as an external business service cluster, and the corresponding analysis subject ID may be a server IP address.
特征提取模块21对多种不同的数据源通过可配置的数据采集器进行数据抽取,在不同的数据源中抽取行为模型对应的字段,形成简化的日志。而后,可根据不同的分析主体进行分组。假设E表示定义的分析主体,先对分析主体E进行分组,每个分析主体E对应的日志再根据时间窗口W进行切分。切分后每个日志子集S一定是同一个分析主体,同一个时间窗口内的行为数据。每个日志子集S再根据算法需要进行字段的聚合、变换、特征抽取,最终形成分析主体E的行为特征F。最后,在一个长周期的分析过程中,假设分析主体E的数量为M,时间窗口W的数量为N,那么最终会输出约M*N条的行为特征集。The feature extraction module 21 extracts data from a variety of different data sources through a configurable data collector, and extracts fields corresponding to the behavior model from different data sources to form a simplified log. Then, you can group according to different analysis subjects. Assume that E represents a defined analysis subject. The analysis subject E is first grouped, and the log corresponding to each analysis subject E is then segmented according to the time window W. After segmentation, each log subset S must be the same analysis subject and behavior data within the same time window. Each log subset S then performs field aggregation, transformation, and feature extraction according to the needs of the algorithm, and finally forms the behavior feature F of the analysis subject E. Finally, in a long-period analysis process, assuming that the number of analysis subjects E is M and the number of time windows W is N, then eventually M * N behavior feature sets will be output.
聚类模块22对特征提取模块21获得的行为特征集做聚类。聚类为无监督学 习算法中的一种,无监督学习算法不需要对数据进行预先标记。聚类算法将行为特征集中的行为数据按照其特征间的距离进行聚类。相似行为对应相似特征,会被聚集在一个类中。不同行为对应的特征聚类较远,会被分到不用的类中。聚类算法有多种不同的实现,如KMeans(K均值聚类),DBSCAN(具有噪声的基于密度的聚类),Hierarchical Clustering(层次聚类)等。The clustering module 22 clusters the behavior feature set obtained by the feature extraction module 21. Clustering is one of the unsupervised learning algorithms. Unsupervised learning algorithms do not need to pre-label the data. The clustering algorithm clusters the behavior data in the behavior feature set according to the distance between the features. Similar behaviors correspond to similar features and are grouped together in a class. The feature clusters corresponding to different behaviors are far away and will be classified into unused classes. There are many different implementations of clustering algorithms, such as KMeans (K-means clustering), DBSCAN (density-based clustering with noise), Hierarchical Clustering (hierarchical clustering), and so on.
在优选的实施方式中,聚类的结果簇还可以被打分,并根据打分结果进行排序。排序的依据可以是聚类算法对应的效果评价指数,如对应类的内聚度,类的内聚度定义为(1–类内距离/类间距离),其中类内距离为该类内部特征间的平均距离,类间距离为该类内部特征与类外部其他特征的平均距离;排序也可以引入外部的威胁情报,将威胁情报在数据中的命中率作为威胁指数。In a preferred embodiment, the result clusters of the clustering can also be scored and sorted according to the scored results. The ranking can be based on the performance evaluation index corresponding to the clustering algorithm, such as the cohesion of the corresponding class. The cohesion of the class is defined as (1-intra-class distance / inter-class distance), where the intra-class distance is the internal characteristics of the class. The average distance between two classes is the average distance between the internal features of the class and other features outside the class. Ranking can also introduce external threat intelligence, and use the hit rate of threat intelligence in the data as the threat index.
The abnormality labeling module 23 allows a security expert or business specialist to perform security analysis or business anomaly analysis on the result clusters obtained by clustering. In a preferred embodiment, the analysis proceeds from clusters with a higher threat index to clusters with a lower threat index.
Each cluster comes with system-generated analysis evidence, including behavior features and raw data, for direct analysis. Supporting evidence includes technical indicators such as the threat intelligence hit rate, the size of the cluster, how tightly the behaviors within the cluster are grouped, and the subjects closest to and farthest from the cluster center. Security experts judge mainly from angles such as whether the business activity corresponding to the behaviors in the cluster is compliant and whether it matches the characteristics of known security models.
Based on the results of the cluster analysis, the security expert or business specialist labels the abnormal clusters. Each label names a specific security threat or business security issue. When a cluster is labeled, the data subset corresponding to the behaviors in that cluster receives the same label.
The learning module 24 applies a supervised classification algorithm to train on each labeled cluster. Each trained model is the detection model for a specific security issue under this data set. Because it is trained on and derived from actual data, the detection model is fully adapted to the behavior features of the current data set and has high detection accuracy for the security issue it detects. The supervised learning algorithm can be one that the industry recognizes as effective, such as XGBoost, GBDT, or LightGBM.
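The per-cluster training setup can be sketched as follows, with the labeled cluster's data as positive samples and the remaining data as negative samples, as the claims describe. The classifier here is a deliberately simple nearest-centroid stand-in, not XGBoost, GBDT, or LightGBM; it only keeps the example dependency-free. The function name, the tag `data_exfiltration`, and the feature vectors are hypothetical.

```python
import math

def build_training_sets(features, cluster_labels, cluster_tags):
    """For each analyst-tagged cluster, mark its rows positive (1) and all other rows negative (0)."""
    datasets = {}
    for cluster_id, tag in cluster_tags.items():
        y = [1 if c == cluster_id else 0 for c in cluster_labels]
        datasets[tag] = (list(features), y)
    return datasets

class NearestCentroid:
    """Stand-in binary classifier (assumption): predicts the class whose centroid is closest."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))

features = [(1.0, 1.0), (1.2, 0.9), (9.0, 9.0), (8.8, 9.2), (5.0, 1.0)]
cluster_labels = [0, 0, 1, 1, 2]
cluster_tags = {1: "data_exfiltration"}  # hypothetical label assigned by an analyst

datasets = build_training_sets(features, cluster_labels, cluster_tags)
models = {tag: NearestCentroid().fit(X, y) for tag, (X, y) in datasets.items()}
```

One model is produced per tagged cluster, so each model answers a single question: does this behavior feature look like that specific issue or not.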
The detection module 25 detects abnormal behavior based on the detection models. The trained detection models are first deployed in the production system. The data collection and processing pipeline must be identical to the one used for the training data, so that the output behavior features are consistent. The behavior feature set is fed into each detection model separately; each detection model detects one specific security issue, and data that a detection model classifies as abnormal is packaged as an alert.
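The fan-out of feature records to several detection models, with positives packaged as alerts, might look like the sketch below. The alert fields, the issue names, and the threshold lambdas standing in for trained models are all hypothetical; in practice each entry would be a trained classifier fed by the same feature pipeline used in training.

```python
def detect(feature_rows, models):
    """Run every behavior feature record through every detection model
    and wrap each positive result as an alert."""
    alerts = []
    for (subject, window), vector in feature_rows.items():
        for issue, is_abnormal in models.items():
            if is_abnormal(vector):
                alerts.append({
                    "subject": subject,
                    "window": window,
                    "issue": issue,       # the specific security issue this model detects
                    "features": vector,   # evidence attached for the analyst
                })
    return alerts

# Stand-ins for trained models (assumption): one callable per security issue.
models = {
    "bulk_download": lambda v: v["bytes_total"] > 1_000_000,
    "action_spray": lambda v: v["distinct_actions"] > 20,
}
feature_rows = {
    ("alice", 0): {"event_count": 12, "distinct_actions": 3, "bytes_total": 5_000_000},
    ("bob", 0): {"event_count": 4, "distinct_actions": 2, "bytes_total": 900},
}
alerts = detect(feature_rows, models)
```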
The abnormal behavior detection system of this embodiment of the present invention clusters the behavior feature sets extracted from different data sources by unsupervised learning, then applies supervised learning to the abnormal result clusters to produce detection models that are well adapted to, and accurate for, the specific application scenario. These learned detection models are then used to detect abnormal behavior in data sets in the production environment. The system can integrate behavior data from different data sources, improve the accuracy of automatic identification of abnormal behavior, and significantly reduce labor costs.
According to another embodiment of the present invention, a computer-readable storage medium is also provided, on which a computer program is stored; the computer program is executed by a processor to implement the steps of the abnormal behavior detection method of any one of the foregoing embodiments.
According to another embodiment of the present invention, a computing device is also provided, comprising a memory and a processor. The memory stores a computer program that can run on the processor, and the processor executes the computer program to implement the steps of the abnormal behavior detection method of any one of the foregoing embodiments.
The specific embodiments described above explain the objectives, technical solutions, and beneficial effects of the present invention in further detail. It should be noted that the above are only preferred embodiments of the present invention. A person of ordinary skill in the art can make various changes and improvements without departing from the principles of the present invention, and these changes and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (30)

  1. An abnormal behavior detection method, characterized by comprising the following steps:
    extracting a behavior feature set from multiple different types of data sources;
    clustering the behavior feature set based on an unsupervised learning algorithm;
    analyzing the result clusters obtained by clustering, and labeling the abnormal clusters according to specific scenarios;
    performing supervised learning on each labeled cluster to generate a detection model for the corresponding specific scenario;
    detecting abnormal behavior based on the detection model.
  2. The abnormal behavior detection method according to claim 1, wherein extracting a behavior feature set from multiple different types of data sources comprises:
    extracting behavior data of analysis subjects from multiple different types of data sources;
    aggregating, transforming, and extracting features from the extracted behavior data to form the behavior feature set of the analysis subjects.
  3. The abnormal behavior detection method according to claim 2, wherein the multiple different types of data sources are log data from multiple systems.
  4. The abnormal behavior detection method according to claim 3, wherein the analysis subject comprises a user or an enterprise server.
  5. The abnormal behavior detection method according to claim 4, wherein aggregating, transforming, and extracting features from the extracted behavior data comprises splitting the extracted behavior data of the analysis subject according to a time window.
  6. The abnormal behavior detection method according to claim 5, wherein the log subset of a single analysis subject within a single time window generates one corresponding behavior feature.
  7. The abnormal behavior detection method according to claim 5, wherein the time window comprises a tumbling window that divides the data stream into consecutive segments of equal length.
  8. The abnormal behavior detection method according to claim 5, wherein the time window comprises a session window that divides the data stream according to time gaps.
  9. The abnormal behavior detection method according to claim 1, wherein clustering the behavior feature set based on an unsupervised learning algorithm comprises clustering the behavior data in the behavior feature set according to the distance between their features.
  10. The abnormal behavior detection method according to claim 9, wherein clustering the behavior feature set based on an unsupervised learning algorithm comprises sorting the result clusters of the clustering.
  11. The abnormal behavior detection method according to claim 10, wherein the result clusters are sorted according to an effect evaluation index of the clustering algorithm or the hit rate of external threat intelligence in the data.
  12. The abnormal behavior detection method according to claim 1, wherein analyzing the result clusters obtained by clustering and labeling the abnormal clusters according to specific scenarios comprises labeling each abnormal cluster with a specific security issue.
  13. The abnormal behavior detection method according to claim 12, wherein performing supervised learning on each labeled cluster to generate a detection model for the corresponding specific scenario comprises using the data subset corresponding to each labeled cluster as positive data, using the other data in the data set as negative data, and training with a supervised classification algorithm.
  14. The abnormal behavior detection method according to claim 1, wherein detecting abnormal behavior based on the detection model comprises deploying the detection models to a production environment, each detection model being used to detect a specific security issue.
  15. An abnormal behavior detection system, characterized by comprising the following modules:
    a feature extraction module, configured to extract a behavior feature set from multiple different types of data sources;
    a clustering module, configured to cluster the behavior feature set based on an unsupervised learning algorithm;
    an abnormality labeling module, configured to analyze the result clusters obtained by clustering and label the abnormal clusters according to specific scenarios;
    a learning module, configured to perform supervised learning on each labeled cluster to generate a detection model for the corresponding specific scenario;
    a detection module, configured to detect abnormal behavior based on the detection model.
  16. The abnormal behavior detection system according to claim 15, wherein the feature extraction module comprises:
    a data extraction module, configured to extract behavior data of analysis subjects from multiple different types of data sources;
    a data processing module, configured to aggregate, transform, and extract features from the extracted behavior data to form the behavior feature set of the analysis subjects.
  17. The abnormal behavior detection system according to claim 16, wherein the multiple different types of data sources are log data from multiple systems.
  18. The abnormal behavior detection system according to claim 17, wherein the analysis subject comprises a user or an enterprise server.
  19. The abnormal behavior detection system according to claim 18, wherein the aggregation, transformation, and feature extraction performed by the data processing module on the extracted behavior data comprises splitting the extracted behavior data of the analysis subject according to a time window.
  20. The abnormal behavior detection system according to claim 19, wherein the log subset of a single analysis subject within a single time window generates one corresponding behavior feature.
  21. The abnormal behavior detection system according to claim 19, wherein the time window comprises a tumbling window that divides the data stream into consecutive segments of equal length.
  22. The abnormal behavior detection system according to claim 19, wherein the time window comprises a session window that divides the data stream according to time gaps.
  23. The abnormal behavior detection system according to claim 15, wherein the clustering of the behavior feature set by the clustering module based on an unsupervised learning algorithm comprises clustering the behavior data in the behavior feature set according to the distance between their features.
  24. The abnormal behavior detection system according to claim 23, wherein the clustering of the behavior feature set by the clustering module based on an unsupervised learning algorithm comprises sorting the result clusters of the clustering.
  25. The abnormal behavior detection system according to claim 24, wherein the result clusters are sorted according to an effect evaluation index of the clustering algorithm or the hit rate of external threat intelligence in the data.
  26. The abnormal behavior detection system according to claim 15, wherein the analysis of the result clusters and the labeling of abnormal clusters according to specific scenarios by the abnormality labeling module comprises labeling each abnormal cluster with a specific security issue.
  27. The abnormal behavior detection system according to claim 26, wherein the supervised learning performed by the learning module on each labeled cluster to generate a detection model for the corresponding specific scenario comprises using the data subset corresponding to each labeled cluster as positive data, using the other data in the data set as negative data, and training with a supervised classification algorithm.
  28. The abnormal behavior detection system according to claim 15, wherein the detection of abnormal behavior by the detection module based on the detection model comprises deploying the detection models to a production environment, each detection model being used to detect a specific security issue.
  29. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program is executed by a processor to implement the steps of the abnormal behavior detection method according to any one of claims 1-14.
  30. A computing device, characterized by comprising a memory and a processor, wherein the memory stores a computer program that can run on the processor, and the processor executes the computer program to implement the steps of the abnormal behavior detection method according to any one of claims 1-14.
PCT/CN2019/101543 2018-08-21 2019-08-20 Abnormal behavior detection method and system WO2020038353A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810956807 2018-08-21
CN201810956807.0 2018-08-21

Publications (1)

Publication Number Publication Date
WO2020038353A1 (en) 2020-02-27

Family

ID=69592287

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/101543 WO2020038353A1 (en) 2018-08-21 2019-08-20 Abnormal behavior detection method and system

Country Status (1)

Country Link
WO (1) WO2020038353A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101561878A (en) * 2009-05-31 2009-10-21 河海大学 Unsupervised anomaly detection method and system based on improved CURE clustering algorithm
US20120134532A1 (en) * 2010-06-08 2012-05-31 Gorilla Technology Inc. Abnormal behavior detection system and method using automatic classification of multiple features
CN104156438A (en) * 2014-08-12 2014-11-19 德州学院 Unlabeled sample selection method based on confidence coefficients and clustering
CN105915555A (en) * 2016-06-29 2016-08-31 北京奇虎科技有限公司 Method and system for detecting network anomalous behavior
US20170104773A1 (en) * 2015-10-08 2017-04-13 Cisco Technology, Inc. Cold start mechanism to prevent compromise of automatic anomaly detection systems

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111352971A (en) * 2020-02-28 2020-06-30 中国工商银行股份有限公司 Bank system monitoring data anomaly detection method and system
CN111400157A (en) * 2020-03-23 2020-07-10 北京亿赛通科技发展有限责任公司 System for automatically detecting computer user risk behaviors
CN111966515A (en) * 2020-07-16 2020-11-20 招联消费金融有限公司 Business abnormal data processing method and device, computer equipment and storage medium
CN111814908A (en) * 2020-07-30 2020-10-23 浪潮通用软件有限公司 Abnormal data detection model updating method and device based on data flow
CN111814908B (en) * 2020-07-30 2023-06-27 浪潮通用软件有限公司 Abnormal data detection model updating method and device based on data flow
CN112070225B (en) * 2020-09-01 2023-10-10 多点(深圳)数字科技有限公司 Entity card abnormal binding alarm method based on unsupervised learning
CN112070225A (en) * 2020-09-01 2020-12-11 多点(深圳)数字科技有限公司 Entity card abnormal binding alarm method based on unsupervised learning
CN112488507A (en) * 2020-11-30 2021-03-12 广东电网有限责任公司 Expert classification portrait method and device based on clustering and storage medium
CN112966259A (en) * 2021-03-03 2021-06-15 北京科东电力控制系统有限责任公司 Power monitoring system operation and maintenance behavior security threat assessment method and equipment
CN113591909A (en) * 2021-06-23 2021-11-02 北京智芯微电子科技有限公司 Abnormality detection method, abnormality detection device, and storage medium for power system
CN113409025B (en) * 2021-07-06 2024-03-26 中国工商银行股份有限公司 Service data extraction method, device and storage medium
CN113409025A (en) * 2021-07-06 2021-09-17 中国工商银行股份有限公司 Service data extraction method, device and storage medium
CN113521750B (en) * 2021-07-15 2023-10-24 珠海金山数字网络科技有限公司 Abnormal account detection model training method and abnormal account detection method
CN113521750A (en) * 2021-07-15 2021-10-22 珠海金山网络游戏科技有限公司 Abnormal account detection model training method and abnormal account detection method
CN113630419A (en) * 2021-08-16 2021-11-09 中移互联网有限公司 Data classification and data safety monitoring method and system based on API flow
CN113656254A (en) * 2021-08-25 2021-11-16 上海明略人工智能(集团)有限公司 Abnormity detection method and system based on log information and computer equipment
CN113779568A (en) * 2021-09-18 2021-12-10 中国平安人寿保险股份有限公司 Abnormal behavior user identification method, device, equipment and storage medium
CN113923037A (en) * 2021-10-18 2022-01-11 北京八分量信息科技有限公司 Credible computing-based anomaly detection optimization device, method and system
CN113923037B (en) * 2021-10-18 2024-03-26 北京八分量信息科技有限公司 Anomaly detection optimization device, method and system based on trusted computing
CN114050937A (en) * 2021-11-18 2022-02-15 北京天融信网络安全技术有限公司 Processing method and device for mailbox service unavailability, electronic equipment and storage medium
CN114050937B (en) * 2021-11-18 2024-02-09 天融信雄安网络安全技术有限公司 Mailbox service unavailability processing method and device, electronic equipment and storage medium
CN114239855A (en) * 2021-12-20 2022-03-25 北京瑞莱智慧科技有限公司 Method, apparatus, medium, and computing device for analyzing abnormality diagnostic information
CN114239855B (en) * 2021-12-20 2023-08-04 北京瑞莱智慧科技有限公司 Method, device, medium and computing equipment for analyzing abnormality diagnosis information
CN114997276A (en) * 2022-05-07 2022-09-02 北京航空航天大学 Heterogeneous multi-source time sequence data abnormity identification method for compression molding equipment
CN114997276B (en) * 2022-05-07 2024-05-28 北京航空航天大学 Heterogeneous multi-source time sequence data anomaly identification method for compression molding equipment
CN116415688B (en) * 2023-03-27 2023-11-03 中国科学院空间应用工程与技术中心 Online learning method and system for fluid loop state monitoring baseline model
CN116415688A (en) * 2023-03-27 2023-07-11 中国科学院空间应用工程与技术中心 Online learning method and system for fluid loop state monitoring baseline model
CN117221241B (en) * 2023-11-08 2024-01-26 杭州鸿世电器股份有限公司 Intelligent switch control process data transmission method and system
CN117221241A (en) * 2023-11-08 2023-12-12 杭州鸿世电器股份有限公司 Intelligent switch control process data transmission method and system
CN117807545A (en) * 2024-02-28 2024-04-02 广东优信无限网络股份有限公司 Abnormality detection method and system based on data mining
CN117807545B (en) * 2024-02-28 2024-05-31 广东优信无限网络股份有限公司 Abnormality detection method and system based on data mining
CN117877736A (en) * 2024-03-12 2024-04-12 深圳市魔样科技有限公司 Intelligent ring abnormal health data early warning method based on machine learning
CN117877736B (en) * 2024-03-12 2024-05-24 深圳市魔样科技股份有限公司 Intelligent ring abnormal health data early warning method based on machine learning

Similar Documents

Publication Publication Date Title
WO2020038353A1 (en) Abnormal behavior detection method and system
WO2020119662A1 (en) Network traffic classification method
US20220368703A1 (en) Method and device for detecting security based on machine learning in combination with rule matching
Fu et al. Service usage classification with encrypted internet traffic in mobile messaging apps
US8868474B2 (en) Anomaly detection for cloud monitoring
Erman et al. Semi-supervised network traffic classification
CN109309630A (en) A kind of net flow assorted method, system and electronic equipment
CN113645232B (en) Intelligent flow monitoring method, system and storage medium for industrial Internet
US20090210364A1 (en) Apparatus for and Method of Generating Complex Event Processing System Rules
US20110093785A1 (en) Apparatus for network traffic classification benchmark
CN103761173A (en) Log based computer system fault diagnosis method and device
CN109525508B (en) Encrypted stream identification method and device based on flow similarity comparison and storage medium
CN110175158A (en) A kind of log template extraction method and system based on vectorization
CN117081858B (en) Intrusion behavior detection method, system, equipment and medium based on multi-decision tree
CN113556358A (en) Abnormal flow data detection method, device, equipment and storage medium
RU148692U1 (en) COMPUTER SECURITY EVENTS MONITORING SYSTEM
RU180789U1 (en) DEVICE OF INFORMATION SECURITY AUDIT IN AUTOMATED SYSTEMS
Bailis et al. Macrobase: Analytic monitoring for the internet of things
Xiao et al. Operation and maintenance (O&M) for data center: An intelligent anomaly detection approach
WO2022047659A1 (en) Multi-source heterogeneous log analysis method
CN111984515B (en) Multi-source heterogeneous log analysis method
CN115842645A (en) UMAP-RF-based network attack traffic detection method and device and readable storage medium
CN114666273A (en) Application layer unknown network protocol oriented traffic classification method
CN109614893B (en) Intelligent abnormal behavior track identification method and device based on situation reasoning
US20230344842A1 (en) Detection of user anomalies for software as a service application traffic with high and low variance feature modeling

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19851166; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19851166; Country of ref document: EP; Kind code of ref document: A1)