CN110633569A - Hidden Markov model-based user behavior and entity behavior analysis method

Info

Publication number
CN110633569A
Authority
CN
China
Prior art keywords
entity
user
behavior
probability matrix
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910922253.7A
Other languages
Chinese (zh)
Inventor
唐誌欣
黄宗纬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Secco Travel Technology Service Co Ltd
Original Assignee
Shanghai Secco Travel Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-27
Filing date
2019-09-27
Publication date
2019-12-31

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Computer Hardware Design (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Alarm Systems (AREA)
  • Emergency Alarm Devices (AREA)

Abstract

The invention discloses a hidden Markov model (HMM)-based method for analyzing user behavior and entity behavior, comprising the following steps: S1) collecting data from multiple sources as training data; S2) normalizing the heterogeneous data using One-hot coding; S3) numbering the normalized results incrementally; S4) setting a time window variable T and dividing the behavior sequence of a user or entity into rolling windows of that length; S5) supplying initial training parameters and training to obtain a transition probability matrix and an emission probability matrix of user or entity behavior; S6) deploying the HMM model to the real-time production environment; S7) collecting data from multiple sources in real time and obtaining the corresponding observation variables; S8) taking a user or entity behavior sequence of length T and using the HMM model to predict and obtain the emission probability. The method addresses the bias inherent in traditional single-dimensional security baselines and manually set thresholds.

Description

Hidden Markov model-based user behavior and entity behavior analysis method
Technical Field
The invention relates to a user behavior and entity behavior analysis method, in particular to a hidden Markov model-based user behavior and entity behavior analysis method.
Background
With the spread of networked office equipment and cloud services, the volume of security logs generated each day has grown explosively. To observe the overall security situation from a macroscopic perspective, a security baseline is often used. Conventional techniques build such a baseline from cumulative values or from year-over-year and period-over-period ratios, and raise a security alarm if the current security condition falls below the baseline threshold. A baseline of this kind, however, cannot express the correlation among data from multiple sources: a threshold can only be set manually for each single dimension of data, and the choice of threshold carries human bias.
Disclosure of Invention
The invention aims to provide a hidden Markov model-based user behavior and entity behavior analysis method that addresses the bias of traditional single-dimensional security baselines and manually set thresholds.
To solve the above technical problem, the invention provides a hidden Markov model-based user behavior and entity behavior analysis method comprising the following steps: S1) collecting data from multiple sources as training data; S2) normalizing the heterogeneous data using One-hot coding; S3) incrementally numbering the normalized results for each user group or entity type, the numbers representing the observation variables of the HMM; S4) setting a time window variable T and dividing the behavior sequence of a user or entity into rolling windows of that length; S5) supplying initial training parameters, including an initial transition probability matrix A, an initial emission probability matrix B, a number of hidden variables S, and an initial state probability matrix π, and performing HMM modeling to obtain the transition probability matrix and emission probability matrix of user or entity behavior; S6) deploying the HMM model to the real-time production environment; S7) collecting data from multiple sources in real time and, for each record, obtaining the corresponding observation variable from the previously obtained One-hot coding table; S8) taking a user or entity behavior sequence of length T, using the HMM model to predict the hidden state, looking up the corresponding emission probability in the emission probability matrix, and issuing a security alarm if that probability is below a set threshold.
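The rolling division in step S4 can be read as a sliding window over one user's or entity's observation sequence. The following is a minimal sketch under stated assumptions: the window is counted in events rather than clock time, it advances one event at a time, and the helper name `rolling_windows` is hypothetical. The source specifies only that the sequence is divided in a rolling manner according to T.

```python
from typing import List, Sequence


def rolling_windows(observations: Sequence[int], T: int) -> List[List[int]]:
    """Split one observation sequence into overlapping windows of length T.

    Assumption: the window moves forward one event per step; sequences
    shorter than T yield no window.
    """
    if T <= 0:
        raise ValueError("T must be positive")
    return [list(observations[i:i + T]) for i in range(len(observations) - T + 1)]


# Observation IDs for one user, window length T = 3 (illustrative values).
windows = rolling_windows([10, 50, 51, 7], T=3)
print(windows)  # [[10, 50, 51], [50, 51, 7]]
```

Each window then serves as one training sequence for the HMM, and the most recent window is the one examined in production.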
Compared with the prior art, the invention has the following beneficial effects: the hidden Markov model-based user behavior and entity behavior analysis method collects a large amount of historical data to describe the security baseline and judges from the real-time behavior of a user or entity whether that behavior is malicious. Historical data and real-time data are both put to effective use, and the errors introduced by manually defined thresholds are avoided.
Drawings
FIG. 1 is a block diagram illustrating user behavior and entity behavior analysis in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention during a user behavior and entity behavior analysis phase.
Detailed Description
The invention is further described below with reference to the figures and examples.
Referring to FIG. 1, the hidden Markov model-based user behavior and entity behavior analysis method provided by the invention involves a data collection module, an HMM modeling module, a user or entity behavior analysis module, and a security alarm module.
The data collection module collects data from multiple sources and cleans and converts the data.
The HMM modeling module normalizes historical data using One-hot coding and incrementally numbers the normalized results to represent observation variables; it then builds an HMM model from the specified initialization parameters.
The user or entity behavior analysis module encodes the real-time data from the data collection module with the One-hot coding table and uses the HMM (hidden Markov model) to analyze and judge the behavior of the user or entity.
The security alarm module monitors the user or entity behavior analysis module and issues a security alarm in real time if that module finds malicious behavior by a user or entity.
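The encoding performed by the HMM modeling module and reused by the behavior analysis module can be sketched as follows. This is an illustration, not the patented implementation: the field keys, the helper names `encode_record`, `build_code_table`, and `to_observations`, numbering from 0, and mapping unseen vectors to -1 are all assumptions that the source does not specify. The source also calls for one table per user group or entity type; the sketch builds a single table.

```python
from typing import Dict, Iterable, List, Tuple

Vector = Tuple[int, ...]

# Hypothetical keys for the yes/no fields named in the embodiment.
FIELDS = (
    "in_working_hours",
    "file_downloaded",
    "file_uploaded",
    "download_source_is_production",
    "upload_destination_is_production",
    "high_risk",
    "medium_risk",
    "low_risk",
)


def encode_record(record: Dict[str, bool]) -> Vector:
    """Turn one cleaned log record into a fixed-order 0/1 vector."""
    return tuple(int(bool(record.get(name, False))) for name in FIELDS)


def build_code_table(history: Iterable[Vector]) -> Dict[Vector, int]:
    """Number distinct vectors 0, 1, 2, ... in order of first appearance."""
    table: Dict[Vector, int] = {}
    for vec in history:
        table.setdefault(vec, len(table))
    return table


def to_observations(records: Iterable[Dict[str, bool]],
                    table: Dict[Vector, int],
                    unknown: int = -1) -> List[int]:
    """Map records to observation IDs; vectors unseen in training get `unknown`."""
    return [table.get(encode_record(r), unknown) for r in records]


# Historical records for one user group (illustrative data only).
history = [
    {"in_working_hours": True, "file_downloaded": True},
    {"in_working_hours": True, "file_uploaded": True},
    {"in_working_hours": True, "file_downloaded": True},
]
table = build_code_table(encode_record(r) for r in history)
print(to_observations(history, table))  # [0, 1, 0]

night_download = {"file_downloaded": True, "high_risk": True}
print(to_observations([night_download], table))  # [-1]
```

Keeping the table as a plain dictionary means the same mapping can be saved after training and loaded by the real-time analysis module, so both sides agree on the observation IDs.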
The invention provides a user behavior and entity behavior analysis (UEBA) method based on a hidden Markov model (HMM), with the following characteristics:
1) The security baseline is derived from a large amount of historical data of the user or entity.
2) Security alarm criteria are not defined manually; instead, an HMM model trained on the historical data of the user or entity makes the decision.
3) Logs are gathered from multiple data sources in real time, and security alarms are distilled from them.
With continued reference to FIG. 2, the flow of the HMM-based UEBA method of the invention is as follows:
Step 1: Collect data from multiple sources as training data; the data sources include host logs, bastion host logs, DLP logs, and the like.
Step 2: Normalize the heterogeneous data using One-hot coding. For example, the coded fields include: whether the operation occurs during working hours; whether a file is downloaded; whether a file is uploaded; whether the source host of a downloaded file is a production host; whether the destination computer of an uploaded file is a production host; and whether the operation is a high-risk, medium-risk, or low-risk operation.
Step 3: Incrementally number the normalized results for each user group or entity type; the numbers represent the observation variables of the HMM.
Step 4: Set the time window variable T to 3 and the number of hidden states S to 10; the initial state probability matrix π assigns each hidden state the initial probability 1/S; the initial transition probability matrix A and the initial emission probability matrix B are generated from random numbers.
Step 5: Perform HMM modeling to obtain the transition probability matrix and emission probability matrix of user or entity behavior.
Step 6: Collect data from multiple sources online in real time.
Step 7: Take a behavior sequence of time window length T for a specific user and encode each record with the previously obtained One-hot coding table to obtain the corresponding observation variables; for example, the user's sequence of observation variables is 10, 50, 51.
Step 8: Use the HMM model to predict the current hidden state, look up the emission probability matrix, and decide against a set threshold whether to issue a security alarm. For example, if the HMM model predicts hidden variable 3 at the current time, and the emission probability matrix gives a probability of 0.03% that hidden variable 3 produces observation variable 10, which is below the threshold of 0.1%, a security alarm is issued.
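The initialization and modeling steps of this flow (uniform π, random A and B, then HMM modeling) can be sketched as below. The source does not name the training algorithm; this sketch assumes Baum-Welch (expectation-maximization), a standard way to estimate A and B from unlabeled observation sequences. The function names, the row normalization of the random matrices, the iteration count, and the toy data are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalize(row: List[float]) -> List[float]:
    """Scale a row to sum to 1; fall back to uniform if the row is all zeros."""
    total = sum(row)
    return [x / total for x in row] if total > 0 else [1.0 / len(row)] * len(row)


def init_hmm(S: int, M: int, seed: int = 0) -> Tuple[List[float], Matrix, Matrix]:
    """Uniform pi (1/S each); random row-stochastic A (S x S) and B (S x M)."""
    rng = random.Random(seed)
    pi = [1.0 / S] * S
    A = [normalize([rng.random() + 1e-3 for _ in range(S)]) for _ in range(S)]
    B = [normalize([rng.random() + 1e-3 for _ in range(M)]) for _ in range(S)]
    return pi, A, B


def forward_backward(obs: Sequence[int], pi, A, B):
    """Scaled forward-backward pass; returns (alpha, beta, scale)."""
    S, T = len(pi), len(obs)
    alpha = [[0.0] * S for _ in range(T)]
    beta = [[1.0] * S for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(S):
            prev = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(S))
            alpha[t][j] = prev * B[j][obs[t]]
        scale[t] = sum(alpha[t]) or 1e-300  # guard against division by zero
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(S):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(S)) / scale[t + 1]
    return alpha, beta, scale


def baum_welch(sequences: Sequence[Sequence[int]], pi, A, B, iterations: int = 20):
    """Re-estimate (pi, A, B) from observation windows by expectation-maximization."""
    S, M = len(pi), len(B[0])
    for _ in range(iterations):
        pi_num = [0.0] * S
        A_num = [[0.0] * S for _ in range(S)]
        B_num = [[0.0] * M for _ in range(S)]
        for obs in sequences:
            alpha, beta, scale = forward_backward(obs, pi, A, B)
            for t in range(len(obs)):
                for i in range(S):
                    gamma = alpha[t][i] * beta[t][i]  # P(state i at t | obs)
                    B_num[i][obs[t]] += gamma
                    if t == 0:
                        pi_num[i] += gamma
                    if t < len(obs) - 1:
                        for j in range(S):  # expected i -> j transitions at t
                            A_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                            * beta[t + 1][j] / scale[t + 1])
        pi = normalize(pi_num)
        A = [normalize(row) for row in A_num]
        B = [normalize(row) for row in B_num]
    return pi, A, B


# Toy run: windows of length T = 3 over observation IDs 0..3, S = 2 hidden states.
windows = [[0, 1, 2], [1, 2, 0], [0, 1, 2], [3, 3, 3]]
pi, A, B = baum_welch(windows, *init_hmm(S=2, M=4))
print([round(sum(row), 6) for row in A + B])  # every row sums to 1.0
```

With the embodiment's settings the call would be `init_hmm(S=10, M=...)`, where M is the number of distinct observation IDs produced by the numbering step. Training on many short windows, as here, fits the rolling division by T.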
The invention has the following advantages: data from multiple sources is collected as training data and used to train a hidden Markov model (HMM), which removes the security baseline errors caused by manually defined thresholds; the model generates a behavior baseline for each user or entity, so that abnormal user or entity behavior is discovered automatically and alarmed by comparing real-time data with the baseline, and security events are effectively distilled from a large volume of data.
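The real-time check described in the embodiment (predict the hidden state for the latest window, look up the emission probability, compare with the threshold) can be sketched as below. Viterbi decoding is used here as one standard way to predict the hidden state; the source does not name the decoding algorithm. Checking the most recent observation in the window, treating never-seen observation IDs as anomalous, and the toy matrices are assumptions; the 0.1% threshold is taken from the embodiment's example.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _log(x: float) -> float:
    return math.log(x) if x > 0 else float("-inf")


def viterbi(obs: Sequence[int], pi: List[float], A: Matrix, B: Matrix) -> List[int]:
    """Most likely hidden-state path for an observation window (log domain)."""
    S = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(S)]
    back: List[List[int]] = []
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(S):
            best_i = max(range(S), key=lambda i: delta[i] + _log(A[i][j]))
            step.append(best_i)
            new_delta.append(delta[best_i] + _log(A[best_i][j]) + _log(B[j][o]))
        back.append(step)
        delta = new_delta
    state = max(range(S), key=lambda i: delta[i])
    path = [state]
    for step in reversed(back):  # follow back-pointers to recover the path
        state = step[state]
        path.append(state)
    return path[::-1]


def should_alert(obs: Sequence[int], pi: List[float], A: Matrix, B: Matrix,
                 threshold: float = 0.001) -> bool:
    """True if the latest observation is improbable under the predicted state."""
    M = len(B[0])
    if any(not 0 <= o < M for o in obs):
        return True  # assumption: behavior never seen in training is anomalous
    current_state = viterbi(obs, pi, A, B)[-1]
    return B[current_state][obs[-1]] < threshold


# Toy model: 2 hidden states, 3 observation IDs; ID 2 is rare in both states.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.6, 0.3995, 0.0005], [0.5, 0.4997, 0.0003]]

print(should_alert([0, 1, 2], pi, A, B))  # True
print(should_alert([0, 1, 0], pi, A, B))  # False
```

In a deployment, `pi`, `A`, and `B` would be the matrices obtained from training on historical data, and the window passed to `should_alert` would be the latest T observation IDs for one user or entity.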
Although the present invention has been described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. A hidden Markov model-based user behavior and entity behavior analysis method is characterized by comprising the following steps:
S1) collecting data from multiple sources as training data;
S2) normalizing the heterogeneous data using One-hot coding;
S3) incrementally numbering the normalized results for each user group or entity type, the numbers representing the observation variables of the HMM;
S4) setting a time window variable T, and dividing the behavior sequence of a user or entity into rolling windows of that length;
S5) supplying initial training parameters, including an initial transition probability matrix A, an initial emission probability matrix B, a number of hidden variables S, and an initial state probability matrix π, and performing HMM modeling to obtain the transition probability matrix and emission probability matrix of user or entity behavior;
S6) deploying the HMM model to the real-time production environment;
S7) collecting data from multiple sources in real time and, for each record, obtaining the corresponding observation variable from the previously obtained One-hot coding table;
S8) taking a user or entity behavior sequence of length T, using the HMM model to predict the hidden state, looking up the corresponding emission probability in the emission probability matrix, and issuing a security alarm if the emission probability is below a set threshold.
2. The hidden Markov model-based user behavior and entity behavior analysis method of claim 1, wherein the multi-source data comprises host logs, bastion host logs, and DLP logs.
3. The hidden Markov model-based user behavior and entity behavior analysis method of claim 1, wherein step S2 uses the following One-hot coded fields: whether the operation occurs during working hours, whether a file is downloaded, whether a file is uploaded, whether the source host of a downloaded file is a production host, whether the destination computer of an uploaded file is a production host, whether the operation is a high-risk operation, whether the operation is a medium-risk operation, and whether the operation is a low-risk operation.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910922253.7A CN110633569A (en) 2019-09-27 2019-09-27 Hidden Markov model-based user behavior and entity behavior analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910922253.7A CN110633569A (en) 2019-09-27 2019-09-27 Hidden Markov model-based user behavior and entity behavior analysis method

Publications (1)

Publication Number Publication Date
CN110633569A (en) 2019-12-31

Family

ID=68973229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910922253.7A Pending CN110633569A (en) 2019-09-27 2019-09-27 Hidden Markov model-based user behavior and entity behavior analysis method

Country Status (1)

Country Link
CN (1) CN110633569A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733015A (en) * 2020-12-30 2021-04-30 绿盟科技集团股份有限公司 User behavior analysis method, device, equipment and medium
WO2022047659A1 (en) * 2020-09-02 2022-03-10 大连大学 Multi-source heterogeneous log analysis method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615186A (en) * 2009-07-28 2009-12-30 东北大学 A kind of BBS user's abnormal behaviour auditing method based on Hidden Markov theory
CN104699606A (en) * 2015-03-06 2015-06-10 国网四川省电力公司电力科学研究院 Method for predicting state of software system based on hidden Markov model
CN106682503A (en) * 2017-01-06 2017-05-17 浙江中都信息技术有限公司 Application of genetic algorithm based hidden Markov model to mainframe risk assessment
CN107402921A (en) * 2016-05-18 2017-11-28 阿里巴巴集团控股有限公司 Identify event-order serie data processing method, the apparatus and system of user behavior
CN107944643A (en) * 2017-12-22 2018-04-20 上海斐讯数据通信技术有限公司 A kind of purchase conversion ratio Forecasting Methodology and system based on Hidden Markov Model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
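睢丹等 (Sui Dan et al.): "基于隐马尔可夫的系统入侵检测方法" [System intrusion detection method based on hidden Markov models], 《微计算机信息》 [Microcomputer Information] *
黄建强 (Huang Jianqiang): "基于HMM的数据库异常检测方法" [HMM-based database anomaly detection method], 《计算机安全》 [Computer Security] *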

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191231
