CN110633569A - Hidden Markov model-based user behavior and entity behavior analysis method - Google Patents
- Publication number
- CN110633569A
- Authority
- CN
- China
- Prior art keywords
- entity
- user
- behavior
- probability matrix
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/552—Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Analysis (AREA)
- Algebra (AREA)
- Computer Hardware Design (AREA)
- Computational Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Alarm Systems (AREA)
- Emergency Alarm Devices (AREA)
Abstract
The invention discloses a hidden Markov model-based user behavior and entity behavior analysis method comprising the following steps: s1) collecting data from multiple sources as training data; s2) normalizing the heterogeneous data with One-hot coding; s3) incrementally numbering the normalized results; s4) setting a time window variable T and dividing the behavior sequence of a user or entity into rolling windows of length T; s5) supplying initial training parameters to obtain the transition probability matrix and emission probability matrix of the user or entity behavior; s6) deploying the HMM model to the real-time production environment; s7) collecting data from multiple sources in real time and obtaining the corresponding observation variables; s8) taking a user or entity behavior sequence of length T and using the HMM model to predict it and obtain the emission probability. The method addresses the bias inherent in conventional single-dimensional security baselines and manually set thresholds.
Description
Technical Field
The invention relates to user behavior and entity behavior analysis methods, and in particular to a hidden Markov model-based user behavior and entity behavior analysis method.
Background
With the spread of networked office equipment and cloud services, the volume of security logs generated every day has grown explosively. To observe the overall security situation from a macroscopic perspective, a security baseline is often used. Conventional techniques build such a macro-level baseline from cumulative values or from year-over-year and period-over-period ratios, and issue a security alarm when the current security condition falls below the baseline threshold. However, such a baseline cannot capture correlations across multi-party data: thresholds can only be set manually, one dimension at a time, so the data is represented in a single dimension and the threshold definitions carry human bias.
Disclosure of Invention
The invention aims to provide a hidden Markov model-based user behavior and entity behavior analysis method that overcomes the bias of conventional single-dimensional security baselines and manually set thresholds.
To solve the above technical problem, the invention provides a hidden Markov model-based user behavior and entity behavior analysis method comprising the following steps: s1) collecting data from multiple sources as training data; s2) normalizing the heterogeneous data with One-hot coding; s3) incrementally numbering the normalized results of each user group or entity type, the numbers serving as the observation variables of the HMM; s4) setting a time window variable T and dividing the behavior sequence of a user or entity into rolling windows of length T; s5) supplying initial training parameters, namely an initial transition probability matrix A, an initial emission probability matrix B, the number of hidden states S and an initial state probability matrix π, and performing HMM modeling to obtain the transition probability matrix and emission probability matrix of the user or entity behavior; s6) deploying the HMM model to the real-time production environment; s7) collecting data from multiple sources in real time and obtaining the observation variable of each record from the previously obtained One-hot coding table; s8) taking a user or entity behavior sequence of length T, using the HMM model to predict the hidden state, looking up the corresponding entry of the emission probability matrix to obtain an emission probability, and issuing a security alarm if that probability is lower than a set threshold.
Compared with the prior art, the invention has the following beneficial effects: the method describes the security baseline with a large amount of collected historical data and judges, from the real-time behavior of a user or entity, whether that behavior is malicious. It thereby makes effective use of both historical and real-time data and eliminates the error introduced by manually defined thresholds.
Drawings
FIG. 1 is a block diagram illustrating user behavior and entity behavior analysis in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of the user behavior and entity behavior analysis phase according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
Referring to FIG. 1, a system implementing the hidden Markov model-based user behavior and entity behavior analysis method of the present invention includes a data collection module, an HMM modeling module, a user or entity behavior analysis module, and a security alarm module.
The data collection module collects data from multiple sources and cleans and transforms it.
The HMM modeling module normalizes the historical data with One-hot coding, incrementally numbers the normalized results to form the observation variables, and builds the HMM model from the specified initialization parameters.
The user or entity behavior analysis module encodes the real-time data from the data collection module with the One-hot coding table and analyzes the behavior of the user or entity with the HMM model.
The security alarm module monitors the user or entity behavior analysis module and issues a security alarm in real time whenever that module detects malicious behavior by a user or entity.
The invention provides a hidden Markov model (HMM) based user and entity behavior analysis (UEBA) method with the following characteristics:
1) The security baseline is derived from a large amount of historical data of the user or entity.
2) Security alarm criteria are not defined manually; decisions are made by an HMM model trained on the historical data of the user or entity.
3) Logs are gathered from multiple data sources in real time, and security alarms are distilled from them.
With continued reference to FIG. 2, the flow of the HMM-based UEBA method of the present invention is as follows:
Step 1: Collect data from multiple sources as training data; the data sources include host logs, bastion host logs, DLP (data loss prevention) logs, and so on.
Step 2: Normalize the heterogeneous data with One-hot coding. The encoded fields include, for example: whether the operation occurs during working hours, whether a file is downloaded, whether a file is uploaded, whether the source host of a downloaded file is a production host, whether the destination computer of an uploaded file is a production host, whether the operation is high-risk, whether it is medium-risk, and whether it is low-risk; and the One-hot code …
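The field list above can be illustrated with a minimal Python sketch that turns one normalized log record into a 0/1 vector. The record keys (`timestamp`, `action`, `src_is_production`, `dst_is_production`, `risk`), the field names in `FIELDS`, and the 09:00 to 18:00 working-hours window are hypothetical assumptions, not taken from the patent. Because several flags can be 1 at once, the result is strictly a multi-hot binary vector; only the three risk flags behave as a one-hot group.

```python
from datetime import datetime

# Hypothetical field order, following the fields listed in Step 2 / claim 3.
FIELDS = [
    "in_work_hours",
    "downloads_file",
    "uploads_file",
    "download_source_is_production",
    "upload_dest_is_production",
    "high_risk_op",
    "medium_risk_op",
    "low_risk_op",
]

WORK_START, WORK_END = 9, 18  # assumed working hours: 09:00 to 18:00


def encode_record(record: dict) -> tuple:
    """Map one normalized log record (hypothetical schema) to a 0/1 tuple in FIELDS order."""
    hour = datetime.fromisoformat(record["timestamp"]).hour
    action = record.get("action")  # assumed values: "download", "upload", or other
    risk = record.get("risk")      # assumed values: "high", "medium", "low"
    flags = {
        "in_work_hours": WORK_START <= hour < WORK_END,
        "downloads_file": action == "download",
        "uploads_file": action == "upload",
        "download_source_is_production": action == "download" and bool(record.get("src_is_production")),
        "upload_dest_is_production": action == "upload" and bool(record.get("dst_is_production")),
        "high_risk_op": risk == "high",
        "medium_risk_op": risk == "medium",
        "low_risk_op": risk == "low",
    }
    return tuple(int(flags[name]) for name in FIELDS)


rec = {"timestamp": "2019-09-27T10:15:00", "action": "download",
       "src_is_production": True, "risk": "high"}
print(encode_record(rec))  # (1, 1, 0, 1, 0, 1, 0, 0)
```

Encoding to a tuple keeps the vector hashable, so it can be used directly as a key when the normalized results are numbered in Step 3.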
Step 3: Incrementally number the normalized results of each user group or entity type; these numbers are the observation variables of the HMM.
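A minimal sketch of Step 3, assuming each encoded vector from Step 2 is a hashable tuple: every distinct vector seen in a user group or entity type receives the next free integer, and that integer is the HMM observation symbol. The class name `ObservationCodebook`, the 0-based numbering and the `None` result for unseen patterns are illustrative assumptions; the patent does not say how patterns absent from training are handled.

```python
from collections import defaultdict


class ObservationCodebook:
    """Hypothetical per-group table mapping encoded vectors to incremental integer IDs."""

    def __init__(self):
        self._tables = defaultdict(dict)  # group -> {vector: observation ID}

    def fit(self, group, vectors):
        """Assign the next free ID (0, 1, 2, ...) to each vector not yet seen in this group."""
        table = self._tables[group]
        for vec in vectors:
            if vec not in table:
                table[vec] = len(table)
        return self

    def lookup(self, group, vec):
        """Return the observation ID, or None for a pattern never seen in training."""
        return self._tables.get(group, {}).get(vec)

    def size(self, group):
        """Number of distinct observation symbols (M) for the group."""
        return len(self._tables.get(group, {}))


cb = ObservationCodebook().fit("ops", [(1, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)])
print(cb.lookup("ops", (1, 1, 0)), cb.size("ops"))  # 2 3
```

Keeping one table per group matches the per-group numbering described above; the same table is reused in real time to map incoming records to observation IDs.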
Step 4: Set the time window variable T to 3 and the number of hidden states S to 10. The initial state probability matrix π assigns each hidden state an initial probability of 1/S, and the initial transition probability matrix A and initial emission probability matrix B are generated from random numbers.
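Step 4, together with the rolling division of step s4), can be sketched as follows. This is a hypothetical sketch: it assumes a stride of one record for the rolling window, treats M (the number of distinct observation IDs from Step 3) as the column count of B, and normalizes random rows so that A and B are valid row-stochastic matrices. The small offset that keeps every entry positive is an added assumption.

```python
import random


def rolling_windows(seq, T):
    """All length-T windows of an observation sequence, assuming a stride of 1."""
    return [seq[i:i + T] for i in range(len(seq) - T + 1)]


def random_stochastic(rows, cols, rng):
    """Random matrix whose rows are positive and each sum to 1."""
    matrix = []
    for _ in range(rows):
        row = [rng.random() + 1e-3 for _ in range(cols)]  # offset keeps entries > 0
        total = sum(row)
        matrix.append([x / total for x in row])
    return matrix


def init_params(S, M, seed=0):
    """Step 4 initialization: uniform pi = 1/S, random A (S x S) and B (S x M)."""
    rng = random.Random(seed)
    pi = [1.0 / S] * S
    A = random_stochastic(S, S, rng)
    B = random_stochastic(S, M, rng)
    return pi, A, B


T, S = 3, 10  # values from Step 4
print(rolling_windows([10, 50, 51, 7], T))  # [[10, 50, 51], [50, 51, 7]]
pi, A, B = init_params(S, M=64)  # M = 64 observation symbols is an arbitrary example
```

A fixed seed makes the random initialization reproducible, which helps when comparing models trained on different historical periods.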
Step 5: Perform HMM modeling to obtain the transition probability matrix and emission probability matrix of the user or entity behavior.
Step 6: Collect data from multiple sources online in real time.
Step 7: Take a behavior sequence of length T for a specific user and encode each record with the previously obtained One-hot coding table to obtain the corresponding observation variables; for example, the user's observation sequence is 10, 50, 51.
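The patent does not name the algorithm used for HMM modeling in Step 5. The standard choice for fitting A, B and π to unlabeled observation sequences is Baum-Welch (expectation-maximization), shown below as a pure-Python sketch with per-step scaling to avoid numerical underflow. It assumes the training data is a list of length-T windows of integer observation IDs, and it also re-estimates π, which the patent does not specify.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta, scaling factors and log-likelihood."""
    S, T = len(pi), len(obs)
    alpha = [[0.0] * S for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(S):
            prev = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(S))
            alpha[t][j] = prev * B[j][obs[t]]
        scale[t] = sum(alpha[t])  # P(o_t | o_1..o_{t-1})
        alpha[t] = [a / scale[t] for a in alpha[t]]
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(S):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(S)) / scale[t + 1]
    return alpha, beta, scale, sum(math.log(c) for c in scale)


def baum_welch(sequences, pi, A, B, n_iter=20):
    """Re-estimate pi, A, B from a list of integer observation windows (EM)."""
    S, M = len(pi), len(B[0])
    for _ in range(n_iter):
        pi_acc = [0.0] * S
        a_num = [[0.0] * S for _ in range(S)]
        a_den = [0.0] * S
        b_num = [[0.0] * M for _ in range(S)]
        b_den = [0.0] * S
        for obs in sequences:
            alpha, beta, scale, _ = forward_backward(obs, pi, A, B)
            T = len(obs)
            for t in range(T):
                gamma = [alpha[t][i] * beta[t][i] for i in range(S)]  # P(state i at t | obs)
                for i in range(S):
                    if t == 0:
                        pi_acc[i] += gamma[i]
                    b_num[i][obs[t]] += gamma[i]
                    b_den[i] += gamma[i]
                    if t < T - 1:
                        a_den[i] += gamma[i]
                        for j in range(S):  # expected i -> j transitions (xi)
                            a_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                            * beta[t + 1][j] / scale[t + 1])
        pi = [p / len(sequences) for p in pi_acc]
        A = [[a_num[i][j] / a_den[i] if a_den[i] > 0 else A[i][j] for j in range(S)]
             for i in range(S)]
        B = [[b_num[i][k] / b_den[i] if b_den[i] > 0 else B[i][k] for k in range(M)]
             for i in range(S)]
    return pi, A, B


windows = [[0, 1, 2], [1, 2, 0], [0, 1, 2], [2, 0, 1]]  # toy observation windows, T = 3
pi, A, B = baum_welch(windows, [0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]],
                      [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]], n_iter=10)
```

Each EM iteration never decreases the total log-likelihood of the training windows, so a common refinement is to stop once that value stops improving instead of running a fixed `n_iter`. A maintained HMM library would normally replace this hand-written version in production.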
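For the real-time side of Steps 6 and 7, one hypothetical way to keep a length-T behavior sequence per user is a bounded deque that yields a window once T observation IDs have arrived. The `WindowBuffer` name and the choice to start scoring only after T observations are assumptions; a record whose encoded vector is missing from the Step 3 table has no observation ID, and how to treat it is not covered by the patent.

```python
from collections import defaultdict, deque


class WindowBuffer:
    """Hypothetical per-user buffer holding the latest T observation IDs."""

    def __init__(self, T=3):
        self.T = T
        self._buffers = defaultdict(lambda: deque(maxlen=T))

    def push(self, user, obs_id):
        """Add one observation; return the current length-T window, or None until T have arrived."""
        buf = self._buffers[user]
        buf.append(obs_id)  # deque(maxlen=T) drops the oldest ID automatically
        return list(buf) if len(buf) == self.T else None


wb = WindowBuffer(T=3)
for obs_id in (10, 50, 51):
    window = wb.push("alice", obs_id)
print(window)  # [10, 50, 51]
```

Because each push returns the newest window, every incoming record can be scored as soon as it arrives, matching the rolling windows used during training.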
Step 8: Predict the current hidden state with the HMM model, look up the emission probability matrix, and decide whether to issue a security alarm by comparing the result with a set threshold. For example, if the HMM predicts hidden state 3 at the current time, and the emission probability matrix gives a probability of 0.03% for hidden state 3 emitting observation 10, which is below the 0.1% threshold, a security alarm is issued.
The invention has the following advantages: multi-party data sources are collected as training data and a hidden Markov model (HMM) is trained on them, eliminating security baseline errors caused by human bias; the model generates a user or entity behavior baseline, anomalies in user or entity behavior are discovered and alarmed automatically by comparing real-time data with this baseline, and security events are effectively distilled from large volumes of data and alarmed.
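Step 8 says only that the HMM predicts the current hidden state. The sketch below swaps in log-space Viterbi decoding for that prediction, takes the final state of the most likely path, and compares that state's emission probability for the newest observation with the threshold (0.1%, taken from the example above). Treating the last element of the window as the current observation is an assumption: the patent's example pairs the predicted state with observation 10, the first element of 10, 50, 51. The toy parameters are invented for illustration.

```python
import math


def _log(x):
    """Natural log that maps 0 to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi_last_state(obs, pi, A, B):
    """Final hidden state of the most likely (Viterbi) state path for obs."""
    S = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(S)]
    for o in obs[1:]:
        delta = [max(delta[i] + _log(A[i][j]) for i in range(S)) + _log(B[j][o])
                 for j in range(S)]
    return max(range(S), key=lambda i: delta[i])


def check_window(obs, pi, A, B, threshold=0.001):
    """Return (state, emission probability, alarm flag) for the newest observation."""
    state = viterbi_last_state(obs, pi, A, B)
    p = B[state][obs[-1]]
    return state, p, p < threshold  # 0.001 = the 0.1% threshold from the example


# Toy 2-state, 3-symbol model (hypothetical values).
toy_pi = [0.5, 0.5]
toy_A = [[0.9, 0.1], [0.1, 0.9]]
toy_B = [[0.7, 0.2999, 0.0001], [0.1, 0.8997, 0.0003]]
print(check_window([1, 1, 2], toy_pi, toy_A, toy_B))  # (1, 0.0003, True)
```

Only the final Viterbi state is needed here, so the usual backtracking step is omitted; the argmax of the last `delta` vector is the last state of the best path.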
Although the present invention has been described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (3)
1. A hidden Markov model-based user behavior and entity behavior analysis method is characterized by comprising the following steps:
s1) collecting data from multiple sources as training data;
s2) normalizing the heterogeneous data with One-hot coding;
s3) incrementally numbering the normalized results of each user group or entity type, the numbers representing the observation variables of the HMM;
s4) setting a time window variable T and dividing the behavior sequence of a user or entity into rolling windows of length T;
s5) supplying initial training parameters, including an initial transition probability matrix A, an initial emission probability matrix B, the number of hidden states S and an initial state probability matrix π, and performing HMM modeling to obtain the transition probability matrix and emission probability matrix of the user or entity behavior;
s6) deploying the HMM model to the real-time production environment;
s7) collecting data from multiple sources in real time, and obtaining the corresponding observation variable for each record from the previously obtained One-hot coding table;
s8) taking a user or entity behavior sequence of length T, using the HMM model to predict the hidden state and looking up the corresponding emission probability matrix entry to obtain an emission probability, and issuing a security alarm if the emission probability is lower than a set threshold.
2. The hidden Markov model-based user behavior and entity behavior analysis method of claim 1, wherein the multi-party source data comprises host logs, bastion host logs and DLP logs.
3. The hidden Markov model-based user behavior and entity behavior analysis method of claim 1, wherein step S2 uses the following One-hot encoded fields: whether the operation occurs during working hours, whether a file is downloaded, whether a file is uploaded, whether the source host of a downloaded file is a production host, whether the destination computer of an uploaded file is a production host, whether the operation is high-risk, whether the operation is medium-risk, and whether the operation is low-risk.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910922253.7A CN110633569A (en) | 2019-09-27 | 2019-09-27 | Hidden Markov model-based user behavior and entity behavior analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910922253.7A CN110633569A (en) | 2019-09-27 | 2019-09-27 | Hidden Markov model-based user behavior and entity behavior analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110633569A true CN110633569A (en) | 2019-12-31 |
Family
ID=68973229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910922253.7A Pending CN110633569A (en) | 2019-09-27 | 2019-09-27 | Hidden Markov model-based user behavior and entity behavior analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110633569A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112733015A (en) * | 2020-12-30 | 2021-04-30 | 绿盟科技集团股份有限公司 | User behavior analysis method, device, equipment and medium |
WO2022047659A1 (en) * | 2020-09-02 | 2022-03-10 | 大连大学 | Multi-source heterogeneous log analysis method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101615186A (en) * | 2009-07-28 | 2009-12-30 | 东北大学 | A kind of BBS user's abnormal behaviour auditing method based on Hidden Markov theory |
CN104699606A (en) * | 2015-03-06 | 2015-06-10 | 国网四川省电力公司电力科学研究院 | Method for predicting state of software system based on hidden Markov model |
CN106682503A (en) * | 2017-01-06 | 2017-05-17 | 浙江中都信息技术有限公司 | Application of genetic algorithm based hidden Markov model to mainframe risk assessment |
CN107402921A (en) * | 2016-05-18 | 2017-11-28 | 阿里巴巴集团控股有限公司 | Identify event-order serie data processing method, the apparatus and system of user behavior |
CN107944643A (en) * | 2017-12-22 | 2018-04-20 | 上海斐讯数据通信技术有限公司 | A kind of purchase conversion ratio Forecasting Methodology and system based on Hidden Markov Model |
Non-Patent Citations (2)
Title |
---|
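睢丹 (Sui Dan) et al.: "Hidden Markov-based system intrusion detection method", 《微计算机信息》 (Microcomputer Information) * |
黄建强 (Huang Jianqiang): "HMM-based database anomaly detection method", 《计算机安全》 (Computer Security) * |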
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110766192B (en) | Drilling well leakage prediction system and method based on deep learning | |
CN111798312B (en) | Financial transaction system anomaly identification method based on isolated forest algorithm | |
CN111475804A (en) | Alarm prediction method and system | |
CN106375339A (en) | Attack mode detection method based on event slide window | |
WO2003063032A1 (en) | Performance monitoring system and method | |
CN110119787B (en) | Working condition detection method and equipment for rotary mechanical equipment | |
CN104503434B (en) | Fault diagnosis method based on active fault symptom pushing | |
CN110633569A (en) | Hidden Markov model-based user behavior and entity behavior analysis method | |
CN110334105B (en) | Stream data abnormity detection method based on Storm | |
CN116827350B (en) | Flexible work platform intelligent supervision method and system based on cloud edge cooperation | |
CN110636066A (en) | Network security threat situation assessment method based on unsupervised generative reasoning | |
CN113239042B (en) | Method for storing underground structure state information by block chain | |
CN112101969A (en) | Environmental protection data false-making detection method based on time sequence sliding window discrete coefficient | |
CN109635008B (en) | Equipment fault detection method based on machine learning | |
CN116881535A (en) | Public opinion comprehensive supervision system with timely early warning function | |
CN115987692A (en) | Safety protection system and method based on flow backtracking analysis | |
CN116560946A (en) | Soil pollution data pushing system based on cloud computing | |
CN116467592A (en) | Production equipment fault intelligent monitoring method and system based on deep learning | |
CN115713044A (en) | Method and device for analyzing residual service life of electromechanical equipment under multi-working-condition switching | |
CN115186935A (en) | Electromechanical device nonlinear fault prediction method and system | |
CN109657404B (en) | Automatic fault diagnosis system for coal mining machine based on chaos correction group intelligent optimization | |
CN113328986A (en) | Network flow abnormity detection method based on combination of convolutional neural network and LSTM | |
CN117078232B (en) | Processing equipment fault prevention system and method based on big data | |
CN117273670B (en) | Engineering data management system with learning function | |
CN109977021A (en) | A kind of software quality management method and system based on Association Rule Analysis |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191231 |