CN113971119A - Unsupervised model-based user behavior anomaly analysis and evaluation method and system

Publication number
CN113971119A
Authority
CN
China
Prior art keywords
user
abnormal
features
behavior
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111225942.6A
Other languages
Chinese (zh)
Other versions
CN113971119B (en
Inventor
王诗涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunfen Shanghai Information Technology Co ltd
Original Assignee
Yunfen Shanghai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunfen Shanghai Information Technology Co ltd
Priority to CN202111225942.6A
Publication of CN113971119A
Application granted
Publication of CN113971119B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment; monitoring of user actions
    • G06F11/3447 Performance evaluation by modeling
    • G06F11/3452 Performance evaluation by statistical analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention is applicable to the fields of computer network security and artificial intelligence, and provides an unsupervised model-based user behavior anomaly analysis and evaluation method and system. The evaluation method performs feature engineering on log data features using an unsupervised learning method, converts categorical features into the frequency with which each feature value appears within its feature, and removes noisy and useless features by comparing the degree of abnormality among the values within a feature and the degree of abnormality among features, thereby obtaining target features. A time series prediction model learns the user's behavior patterns from the target features to obtain the user's abnormal behaviors. Based on these abnormal behaviors, and in combination with the anomaly alarm module of the Yunfen SIEM platform, a statistical method evaluates the user behavior risk. Abnormal user behaviors can thus be identified accurately in log data containing a large volume of user behavior, and the inefficient pattern learning and anomaly detection caused by excessive noisy and meaningless features are avoided.

Description

Unsupervised model-based user behavior anomaly analysis and evaluation method and system
Technical Field
The invention belongs to the technical field of computer network security and artificial intelligence, and particularly relates to an unsupervised model-based user behavior anomaly analysis and evaluation method and system.
Background
Insider threats have recently become a security concern for many businesses, and the losses they cause are enormous. Because scenarios are diverse and user behavior is variable, manual analysis and monitoring can hardly issue security warnings in real time. Using machine learning to learn the patterns of user behavior makes it possible to detect abnormal users and behaviors while greatly reducing labor and time costs. Because real user behavior data carry no accurate labels marking whether a behavior is abnormal, an unsupervised learning method is proposed for learning user behavior.
In many unsupervised anomaly detection scenarios, several challenges are typical: complex interactions between features, relevant features mixed with redundant or noisy features, and an extreme imbalance between the amounts of normal and anomalous data. In such complex situations, anomalous objects are often judged to be normal, especially by pattern-based algorithms. Some anomalous objects are considered normal because they exhibit high-frequency patterns in the noisy features. Conversely, some normal objects are considered anomalous because they exhibit only low-frequency patterns in the noisy features. Data with many categorical features contain many meaningless features. Existing log data record a large amount of user behavior, but the data are huge in volume and have a wide variety of features.
Disclosure of Invention
The embodiments of the invention aim to provide an unsupervised model-based user behavior anomaly analysis and evaluation method and system that solve the problems described in the background. To achieve this purpose, the embodiments of the present invention provide the following technical solutions:
in a first aspect, an embodiment of the present invention provides an unsupervised model-based user behavior anomaly analysis and evaluation method, the evaluation method including the following steps:
performing feature engineering on the log data features using an unsupervised learning method, converting categorical features into the frequency with which each feature value appears within its feature, and removing noisy and useless features by comparing the degree of abnormality among the values within a feature and the degree of abnormality among features, to obtain target features;
learning the user's behavior patterns from the target features using a time series prediction model, to obtain the user's abnormal behaviors;
evaluating the user behavior risk using a statistical method, based on the user's abnormal behaviors and in combination with the anomaly alarm module of the Yunfen SIEM platform, to obtain a user risk value;
and issuing an early warning based on the user risk value.
In some embodiments provided by the present invention, the step of performing feature engineering on the log data features using an unsupervised learning method includes:
adopting the unsupervised feature selection framework CUFS for the feature engineering, and performing feature denoising and selection on the basis of the CUFS framework;
creating two levels of coupling using the CUFS framework, namely feature-value coupling and feature coupling;
learning the value couplings within each feature and between features to obtain value-level outlier scores, and generating a value graph whose edge weights are these outlier scores;
and inputting the obtained value graph into the feature coupling analysis to generate a feature graph that aggregates the value-level outlier scores.
In some embodiments provided by the present invention, a Holt-Winters exponential smoothing model is used in the construction of the time series prediction model to learn the user's behavior patterns.
In some embodiments provided by the present invention, a grid search is used in the construction of the time series prediction model to find the parameters of the Holt-Winters exponential smoothing model for different user behaviors.
In some embodiments provided by the present invention, the parameters are evaluated in the construction of the time series prediction model using the root mean square error between the true values and the predicted values.
In some embodiments provided by the present invention, in the construction of the time series prediction model, the model predicts the user's next behavior value and generates a confidence interval; if the true value exceeds the upper bound of the confidence interval, the behavior in that time period is abnormal.
In some embodiments provided by the present invention, the step of evaluating the user behavior risk using a statistical method, based on the user's abnormal behaviors and in combination with the anomaly alarm module of the Yunfen SIEM platform, to obtain a user risk value includes:
calling all security event rules in the anomaly alarm module of the Yunfen SIEM platform, where the rule set $G = \{g_1, g_2, \ldots, g_n\}$ contains $n$ rules;
randomly sampling from the population to obtain a sample set $X$;
obtaining by counting the rule occurrence set $C = \{c_1, c_2, \ldots, c_n\}$;
obtaining the proportion of occurrences of each rule from the rule occurrence set $C$
[Formula image BDA0003314027240000031, not reproduced in this text]
converting the proportion into its reciprocal
[Formula image BDA0003314027240000041, not reproduced in this text]
calling the anomaly alarm module of the Yunfen SIEM platform to obtain the user risk value, which is the sum of the reciprocals of all events multiplied by 100:
$\mathrm{score} = (sc_1 + sc_2 + \cdots + sc_i) \times 100$.
In some embodiments provided by the present invention, the step of issuing an early warning based on the user risk value includes: determining whether the user's behavior is abnormal based on the user risk value; when it is abnormal, generating a monitoring panel in combination with the Yunfen UEBA platform and sending a high-risk alarm in combination with the Yunfen IXAlert alarm platform.
In a second aspect, another embodiment of the present invention provides an unsupervised model-based user behavior anomaly analysis and evaluation system, the evaluation system comprising:
a data processing module, configured to perform feature engineering on the log data features using an unsupervised learning method, convert categorical features into the frequency with which each feature value appears within its feature, and remove noisy and useless features by comparing the degree of abnormality among the values within a feature and the degree of abnormality among features, to obtain target features;
a behavior acquisition module, configured to learn the user's behavior patterns from the target features using a time series prediction model, to obtain the user's abnormal behaviors;
a behavior evaluation module, configured to evaluate the user behavior risk using a statistical method, based on the user's abnormal behaviors and in combination with the anomaly alarm module of the Yunfen SIEM platform, to obtain a user risk value;
and a behavior early warning module, configured to issue an early warning based on the user risk value.
Compared with the prior art, the unsupervised model-based user behavior anomaly analysis and evaluation method and system provided by the embodiments of the invention have the following technical advantages: feature engineering is performed on the log data features using an unsupervised learning method, categorical features are converted into the frequency with which each feature value appears within its feature, and noisy and useless features are removed by comparing the degree of abnormality among the values within a feature and the degree of abnormality among features, to obtain target features; the user's behavior patterns are learned from the target features using a time series prediction model, to obtain the user's abnormal behaviors; the user behavior risk is evaluated using a statistical method, based on the user's abnormal behaviors and in combination with the anomaly alarm module of the Yunfen SIEM platform, to obtain a user risk value; and an early warning is issued based on the user risk value. Abnormal user behaviors can thus be identified accurately in log data containing a large volume of user behavior, and the inefficient pattern learning and anomaly detection caused by excessive noisy and meaningless features are avoided.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
FIG. 1 is a flow chart illustrating an implementation of a method for analyzing and evaluating user behavior anomalies based on an unsupervised model according to an embodiment of the present invention;
FIG. 2 is a sub-flowchart of a method for analyzing and evaluating user behavior anomalies based on an unsupervised model according to an embodiment of the present invention;
FIG. 3 is a global flow chart of a method for analyzing and evaluating user behavior anomalies based on an unsupervised model according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating a system for analyzing and evaluating user behavior anomalies based on an unsupervised model according to an embodiment of the present invention;
FIG. 5 shows a block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. Specific implementations of the present invention are described in detail below with reference to specific embodiments.
In many unsupervised anomaly detection scenarios, several challenges are typical: complex interactions between features, relevant features mixed with redundant or noisy features, and an extreme imbalance between the amounts of normal and anomalous data. In such complex situations, anomalous objects are often judged to be normal, especially by pattern-based algorithms. Some anomalous objects are considered normal because they exhibit high-frequency patterns in the noisy features. Conversely, some normal objects are considered anomalous because they exhibit only low-frequency patterns in the noisy features. Data with many categorical features contain many meaningless features. Existing log data record a large amount of user behavior, but the data are huge in volume and have a wide variety of features.
In log data, the time and order in which behaviors occur are important features of a user's behavior patterns, and user behavior exhibits both trend and periodicity. The method therefore uses an existing time series prediction algorithm, the Holt-Winters exponential smoothing model, to learn the user's behavior patterns.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions.
FIG. 1 shows, by way of example, an implementation flowchart of an unsupervised model-based user behavior anomaly analysis and evaluation method according to an embodiment of the present invention.
As shown in FIG. 1 and FIG. 3, an embodiment of the present invention provides an unsupervised model-based user behavior anomaly analysis and evaluation method, including the following steps:
step S101: performing feature engineering on the log data features using an unsupervised learning method, converting categorical features into the frequency with which each feature value appears within its feature, and removing noisy and useless features by comparing the degree of abnormality among the values within a feature and the degree of abnormality among features, to obtain target features;
step S102: learning the user's behavior patterns, including seasonal patterns and trend patterns, from the target features using a time series prediction model, and then obtaining the user's abnormal behaviors;
step S103: evaluating the user behavior risk using a statistical method, based on the user's abnormal behaviors and in combination with the anomaly alarm module of the Yunfen SIEM platform, to obtain a user risk value;
step S104: issuing an early warning based on the user risk value.
In some embodiments provided by the present invention, the step of performing feature engineering on the log data features using an unsupervised learning method includes:
adopting the unsupervised feature selection framework CUFS for the feature engineering, and performing feature denoising and selection on the basis of the CUFS framework;
creating two levels of coupling using the CUFS framework, namely feature-value coupling and feature coupling;
learning the value couplings within each feature and between features to obtain value-level outlier scores, and generating a value graph whose edge weights are these outlier scores;
and inputting the obtained value graph into the feature coupling analysis to generate a feature graph that aggregates the value-level outlier scores.
In the embodiment of the invention, the existing unsupervised feature selection framework CUFS is used for feature denoising and selection; CUFS creates two levels of coupling, namely coupling between feature values and coupling across features.
CUFS takes into account the interaction between each pair of features and addresses the following three problems: (1) noisy, meaningless and relevant features are deeply intertwined, making it difficult to distinguish normal from abnormal objects; (2) excessive noisy features cause the algorithm to learn wrong patterns, producing a large number of false positives; (3) meaningless features lead to inefficient pattern learning and anomaly detection.
Further, in the embodiment of the present invention, the time series prediction model is constructed by aggregating minute-level and second-level time series data into 5-minute time periods.
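The 5-minute aggregation can be illustrated with a minimal sketch. It assumes that the behavior value of a period is simply the number of log events in that period, which the text does not state; the function names and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=5)

def bucket_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute period."""
    return ts - timedelta(minutes=ts.minute % 5,
                          seconds=ts.second,
                          microseconds=ts.microsecond)

def aggregate(timestamps):
    """Count events per 5-minute period, filling empty periods with 0."""
    counts = Counter(bucket_start(t) for t in timestamps)
    if not counts:
        return []
    current, end = min(counts), max(counts)
    series = []
    while current <= end:
        series.append((current, counts.get(current, 0)))
        current += BUCKET
    return series

# Hypothetical second-level event timestamps for one user.
events = [
    datetime(2021, 10, 21, 9, 0, 12),
    datetime(2021, 10, 21, 9, 3, 59),
    datetime(2021, 10, 21, 9, 4, 0),
    datetime(2021, 10, 21, 9, 12, 30),
]
series = aggregate(events)
for start, count in series:
    print(start.isoformat(), count)
```

Filling empty periods with zero keeps the series evenly spaced, which a seasonal smoothing model needs.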
Further, in the embodiment provided by the invention, a Holt-Winters exponential smoothing model is used in the construction of the time series prediction model to learn the user's behavior patterns.
Further, in the embodiment provided by the present invention, a grid search is used in the construction of the time series prediction model to find the parameters of the Holt-Winters exponential smoothing model for different user behaviors.
Further, in the embodiment provided by the present invention, the parameters are evaluated in the construction of the time series prediction model using the root mean square error (RMSE) between the true values and the predicted values.
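The combination of Holt-Winters smoothing, grid search and RMSE can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it uses the textbook additive form only, a simple initialisation from the first two seasons, a hold-out split for scoring, and a hypothetical parameter grid and synthetic series.

```python
import itertools
import math

def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Textbook additive Holt-Winters; returns (one-step fits, forecasts)."""
    # Initialisation from the first two seasons (an assumption of this sketch).
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    season = [series[i] - level for i in range(m)]
    fitted = []
    for i, y in enumerate(series):
        s = season[i % m]
        fitted.append(level + trend + s)
        new_level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[i % m] = gamma * (y - new_level) + (1 - gamma) * s
        level = new_level
    n = len(series)
    forecasts = [level + k * trend + season[(n + k - 1) % m]
                 for k in range(1, horizon + 1)]
    return fitted, forecasts

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def grid_search(train, valid, m, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return (best RMSE, (alpha, beta, gamma)) over the hypothetical grid."""
    best = None
    for alpha, beta, gamma in itertools.product(grid, repeat=3):
        _, forecast = holt_winters_additive(train, m, alpha, beta, gamma,
                                            len(valid))
        err = rmse(valid, forecast)
        if best is None or err < best[0]:
            best = (err, (alpha, beta, gamma))
    return best

# Synthetic series: linear trend plus a season of length 4.
pattern = [0.0, 3.0, -2.0, 1.0]
history = [10 + 0.5 * i + pattern[i % 4] for i in range(40)]
train, valid = history[:32], history[32:]
best_err, best_params = grid_search(train, valid, m=4)
print(best_params, round(best_err, 4))
```

Running one search per user gives each user its own smoothing parameters, in line with the per-user search the text describes. A fuller grid would also include the choice between the additive and multiplicative forms.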
Further, in the embodiment provided by the present invention, in the construction of the time series prediction model, the model predicts the user's next behavior value and generates a confidence interval; if the true value exceeds the upper bound of the confidence interval, the behavior in that time period is abnormal.
Further, in the embodiment provided by the invention, the security events in the anomaly alarm module of the Yunfen SIEM platform are called.
Further, in the embodiment provided by the present invention, in the construction of the risk assessment system, high-risk security events in the data are assumed to be low-frequency events among all security events; risk values are assigned to security events in proportion using a statistical method.
Further, in the embodiment provided by the present invention, 500000 records are randomly extracted from the massive log data as sample data, the proportion of each security event among all security events is counted, and the reciprocal of the proportion is used as the event risk value, so that events that occur less frequently have higher risk values.
Further, in the embodiment provided by the invention, the risk indices of the security events triggered by the user's behavior within each 5 minutes are aggregated into the user's risk value for those 5 minutes.
Further, in the embodiment provided by the invention, the evaluation method combines big data processing, machine learning, security monitoring and security response.
Further, in a specific implementation of step S101 provided in the embodiment of the present invention:
the feature engineering uses the existing unsupervised feature selection framework CUFS to perform feature denoising and selection. CUFS extracts feature relevance from two aspects: the values within a feature, and the pairwise relationships between those values and the values of other features.
Specifically, CUFS creates a two-level coupling: feature-value coupling and feature coupling. In the evaluation method of the embodiment of the present invention, a value is not a number but a category value.
As shown in FIG. 2, the value couplings within each feature and between features are learned to obtain value-level outlier scores, and a value graph is generated whose edge weights are these outlier scores. More abnormal features contribute more to the patterns that distinguish normal from abnormal behavior; in other words, the higher a feature's outlier score, the more relevant the feature is to the anomaly detection task. The value graph is then input into the feature-level coupling analysis to generate a feature graph that aggregates the value-level outlier scores.
In the embodiment of the present invention, $V$ is the set of feature values in the data, $F$ is the set of features, and $N$ is the number of data records.
In an embodiment of the present invention, the value coupling is $VC = (f, \delta, \eta)$, where $f$ is a feature, $\delta$ is the interaction function of a value $v$ within feature $f$, and $\eta$ is the interaction function of a value $v$ with other features.
In the embodiment of the present invention, the value graph is $G = \langle V, A, g(\delta, \eta) \rangle$, where each value $v \in V$ is a node and the weight matrix $A(v, v')$ is based on the function $g$, a joint function of $\delta(v, v')$ and $\eta(v, v')$.
In an embodiment of the invention, the feature coupling is $FC = (\mathrm{dom}(f), \delta^*, \eta^*)$, where the feature domain $\mathrm{dom}(f)$ is the set of values contained in feature $f$; $\delta^*$ generates the outlier score of feature $f$ based on the value-level function $\delta$ of the feature; and $\eta^*$ captures the outlier score of feature $f$ that arises from the interaction of the values of $f$ with other features.
In an embodiment of the present invention, the feature graph is $G^* = \langle F, A^*, h(\delta^*, \eta^*) \rangle$, where each feature $f \in F$ is a node and the weight matrix $A^*(f, f')$ is based on the function $h$, a joint function of $\delta^*(f, f')$ and $\eta^*(f, f')$.
In an embodiment of the invention, the function
[Formula image BDA0003314027240000111, not reproduced in this text]
where $v \in \mathrm{dom}(f)$, $m = \mathrm{mode}(f)$, and $\mathrm{freq}(\cdot)$ is the frequency calculation function,
[Formula image BDA0003314027240000112, not reproduced in this text]
The further a value $v$ deviates from $\mathrm{mode}(f)$, the higher its degree of abnormality.
In an embodiment of the present invention, the function $\eta(v, v') = \delta(v)\,\mathrm{conf}(v, v')\,\delta(v')$, where $v \in \mathrm{dom}(f)$, $v' \in \mathrm{dom}(f')$,
[Formula image BDA0003314027240000113, not reproduced in this text]
If a value $v$ is strongly coupled with a highly abnormal value $v'$, the degree of abnormality of $v$ is also high.
The feature-value weight matrix
[Formula image BDA0003314027240000114, not reproduced in this text]
The entries of $A$ represent the degrees of abnormality of the values: the larger an entry, the more abnormal the value.
$G$ is a directed graph with self-loops, so $A(v, v') \neq A(v', v)$ and $A(v, v) \neq 0$.
Further, in the present embodiment, the degrees of abnormality within feature values and across feature values are assumed to be linearly related, so the degree of abnormality at the feature level is estimated as the sum of the degrees of abnormality of the feature values:
$\delta^*(f) = \sum_{v \in \mathrm{dom}(f)} \delta(v)$, $\quad \eta^*(f, f') = \sum_{v \in \mathrm{dom}(f),\, v' \in \mathrm{dom}(f')} \eta(v, v')$.
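The feature-level aggregation can be illustrated with a small sketch. The value-level scores `delta` and the coupling strengths `conf` are hypothetical inputs chosen for illustration; their defining formulas appear only as images in the source and are not reconstructed here. Only the product form of $\eta$ and the two sums are taken from the text.

```python
from itertools import product

# Hypothetical feature domains and value-level outlier scores delta(v).
dom = {
    "device": ["laptop", "kiosk"],
    "action": ["login", "export"],
}
delta = {"laptop": 0.2, "kiosk": 0.6, "login": 0.1, "export": 0.7}

# Hypothetical coupling strengths conf(v, v') between values of two features.
conf = {
    ("laptop", "login"): 0.8,
    ("laptop", "export"): 0.2,
    ("kiosk", "login"): 0.3,
    ("kiosk", "export"): 0.7,
}

def eta(v, w):
    """eta(v, v') = delta(v) * conf(v, v') * delta(v'), as in the text."""
    return delta[v] * conf.get((v, w), 0.0) * delta[w]

def delta_star(f):
    """Feature-level score: sum of delta(v) over the values of feature f."""
    return sum(delta[v] for v in dom[f])

def eta_star(f, g):
    """Feature-pair score: sum of eta(v, v') over all value pairs."""
    return sum(eta(v, w) for v, w in product(dom[f], dom[g]))

print(delta_star("device"), delta_star("action"))
print(eta_star("device", "action"))
```

In this toy data the pair (kiosk, export) dominates the feature-pair score, because both values are individually unusual and strongly coupled.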
Further, the feature weight matrix
[Formula image BDA0003314027240000115, not reproduced in this text]
where $\delta^*$ and $\eta^*$ are normalized to the interval $[0, 1]$ to facilitate the subsequent comparison of features. The entries of $A^*$ represent the degrees of abnormality of the features: the larger an entry, the more abnormal the feature.
In the embodiment of the present invention, $G^*$ is an undirected graph with self-loops, because $A^*(f, f') = A^*(f', f)$.
Further, in the embodiment of the present invention, finding the feature most relevant to anomaly detection can be regarded as finding the largest edge in the graph $G^*$. Likewise, finding the subset of features relevant to anomaly detection can be regarded as finding the subset with the largest mean edge weight. The Count Encoding method is used to convert the features in the retained subset (i.e., the set of features that are meaningful for anomaly detection) from categories to numbers. Count Encoding counts the number of times each category appears and uses that count as the value of the category. The multidimensional features are then reduced to one dimension using the dimensionality reduction algorithm PCA.
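Count Encoding as described above can be sketched in a few lines. The column names and rows are hypothetical, and the subsequent PCA step is not shown.

```python
from collections import Counter

def count_encode(rows, columns):
    """Replace each categorical value by how often it occurs in its column."""
    counts = {c: Counter(row[c] for row in rows) for c in columns}
    return [[counts[c][row[c]] for c in columns] for row in rows]

# Hypothetical log records restricted to the retained features.
rows = [
    {"action": "login", "device": "laptop"},
    {"action": "login", "device": "laptop"},
    {"action": "export", "device": "kiosk"},
    {"action": "login", "device": "kiosk"},
]
encoded = count_encode(rows, ["action", "device"])
for original, numeric in zip(rows, encoded):
    print(original, numeric)
```

Rare categories receive small counts, so the encoded value itself carries the frequency signal that the later steps rely on.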
In the specific implementation of step S102 provided in the embodiment of the present invention:
generating an array of parameter combinations representing the seasonal pattern, the trend pattern and the relationship between them, and using grid search to find, for each user, the parameter combination with the minimum root mean square error;
using the above optimal parameters, applying Holt-Winters exponential smoothing to the historical data to predict 12 values for the last hour (one value per 5 minutes, 12 values per hour);
using the above predicted values, denoted $\hat{y}$, to calculate the confidence region: a true value within the upper boundary upper_bound of the confidence region is normal, and a true value beyond upper_bound is abnormal. The confidence region is the 95% confidence interval, i.e. the z-score equals 1.96. The upper boundary of the confidence region is:
$\mathrm{upper\_bound} = \mathrm{mean} + \big(\mathrm{mean\_absolute\_error}(\hat{y} - \mathrm{mean}) + z\_score \times \mathrm{standard\_deviation}(\hat{y} - \mathrm{mean})\big)$.
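The upper-bound rule can be sketched as follows. The formula leaves several details open, so this is one possible reading rather than the patented computation: `mean` is taken to be the mean of the 12 predicted values, `mean_absolute_error` the mean absolute deviation of the predictions from that mean, and `standard_deviation` the population standard deviation. The sample numbers are hypothetical.

```python
import statistics

Z_SCORE = 1.96  # 95% confidence interval, as stated in the text

def upper_bound(predicted):
    """Upper boundary of the confidence region under the assumptions above."""
    mean = statistics.fmean(predicted)
    residuals = [p - mean for p in predicted]
    mae = statistics.fmean([abs(r) for r in residuals])
    std = statistics.pstdev(residuals)
    return mean + (mae + Z_SCORE * std)

def flag_anomalies(actual, bound):
    """Mark each 5-minute period whose true value exceeds the upper bound."""
    return [value > bound for value in actual]

# Hypothetical predictions and observations for one hour (12 periods).
predicted = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 11]
actual = [11, 12, 10, 13, 12, 30, 10, 11, 13, 12, 12, 11]

bound = upper_bound(predicted)
flags = flag_anomalies(actual, bound)
print(round(bound, 2), flags)
```

Only the upper bound is checked, as in the text, so unusually low activity is not flagged by this rule.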
The mathematical expression of Holt-Winters exponential smoothing is as follows:
a. The linear relationship: $T_{i+1} = (L_i + k B_i) + S_{i+1-m} + N_{i+1}$
b. The non-linear relationship: $T_{i+1} = (L_i + k B_i) \cdot S_{i+1-m} \cdot N_{i+1}$
c. $T_{i+1}$ is the predicted value at the $(i+1)$-th time step;
d. $(L_i + k B_i)$ is the predicted trend at the $(i+1)$-th time step,
where $B_i = \beta\,[L_i - L_{i-1}] + (1 - \beta)\,B_{i-1}$ and $\beta$ is a parameter;
[Formula image BDA0003314027240000131, not reproduced in this text]
$\alpha$ is a parameter;
e. $S_{i+1-m}$ is the seasonal variation of length $m$ predicted at the $(i+1)$-th time step, where
[Formula image BDA0003314027240000132, not reproduced in this text]
f. $N_{i+1}$ is noise.
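For orientation, the additive Holt-Winters recursions are commonly written in the textbook form below, using the symbols of this section. This is an assumption about the general technique, not a reconstruction of the formula images; in particular the seasonal smoothing parameter $\gamma$ is a name introduced here, and some references use a slightly different seasonal update.

```latex
\begin{aligned}
L_i &= \alpha\,(T_i - S_{i-m}) + (1-\alpha)\,(L_{i-1} + B_{i-1}),\\
B_i &= \beta\,(L_i - L_{i-1}) + (1-\beta)\,B_{i-1},\\
S_i &= \gamma\,(T_i - L_i) + (1-\gamma)\,S_{i-m},\\
\hat{T}_{i+k} &= L_i + k\,B_i + S_{i+k-m}, \qquad 1 \le k \le m.
\end{aligned}
```

The trend update in the second line matches the expression for $B_i$ given in the text; the multiplicative variant replaces the subtraction and addition of the seasonal term by division and multiplication.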
In some embodiments provided by the present invention, the step of evaluating the user behavior risk using a statistical method, based on the user's abnormal behaviors and in combination with the anomaly alarm module of the Yunfen SIEM platform, to obtain a user risk value includes:
calling all security event rules in the anomaly alarm module of the Yunfen SIEM platform, where the rule set $G = \{g_1, g_2, \ldots, g_n\}$ contains $n$ rules;
randomly sampling from the population to obtain a sample set $X$;
obtaining by counting the rule occurrence set $C = \{c_1, c_2, \ldots, c_n\}$;
obtaining the proportion of occurrences of each rule from the rule occurrence set $C$
[Formula image BDA0003314027240000133, not reproduced in this text]
This proportion may also be regarded as a risk index;
converting the proportion into its reciprocal
[Formula image BDA0003314027240000134, not reproduced in this text]
Events with a smaller proportion have a larger reciprocal, and events with a larger proportion have a smaller reciprocal;
calling the anomaly alarm module of the Yunfen SIEM platform to obtain the user risk value, which is the sum of the reciprocals of all events multiplied by 100: $\mathrm{score} = (sc_1 + sc_2 + \cdots + sc_i) \times 100$.
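The scoring steps above can be sketched as follows. The sketch assumes that the proportion of rule $g_i$ is its count $c_i$ divided by the total number of rule occurrences in the sample, which is consistent with the description of counting the proportion of each security event among all security events; the rule names and the sample are hypothetical.

```python
from collections import Counter

def event_risk_values(sampled_events):
    """Map each rule to the reciprocal of its share of all sampled events."""
    counts = Counter(sampled_events)   # c_i for each rule g_i
    total = sum(counts.values())
    # The reciprocal of (c_i / total) is total / c_i.
    return {rule: total / c for rule, c in counts.items()}

def user_score(triggered_rules, risk_values):
    """User risk value: sum of the event reciprocals multiplied by 100."""
    return sum(risk_values[rule] for rule in triggered_rules) * 100

# Hypothetical sample: a frequent rule and a rare one.
sample = ["login_failure"] * 8 + ["port_scan"] * 2
risk = event_risk_values(sample)

# Rules triggered by one user within a 5-minute period (hypothetical).
score = user_score(["port_scan", "login_failure"], risk)
print(risk, score)
```

Because the rare rule has the larger reciprocal, it contributes most of the score, which reflects the stated assumption that high-risk events are low-frequency events.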
In this example, the existing data are assumed to be the population, which includes all normal events that may occur in daily production, and the risk indices of all abnormal events add up to 100%.
In some embodiments provided by the present invention, the step of issuing an early warning based on the user risk value includes: determining whether the user's behavior is abnormal based on the user risk value; when it is abnormal, generating a monitoring panel in combination with the Yunfen UEBA platform and sending a high-risk alarm in combination with the Yunfen IXAlert alarm platform.
In another embodiment provided by the present invention, as shown in fig. 4, an unsupervised model based user behavior anomaly analysis and evaluation system is provided.
The evaluation system includes:
the data processing module 201 is used for performing feature engineering on the log data features by using an unsupervised learning method, converting the category features into the frequency of occurrence of feature values in the features, and removing noise features and useless features by comparing the abnormality degree between the feature values in the features and the abnormality degree between the features to obtain target features;
a behavior obtaining module 202, configured to learn a behavior rule of the user for the target feature by using a time series prediction model, so as to obtain an abnormal behavior of the user;
the behavior evaluation module 203 is used for evaluating the user behavior risk by using a statistical method based on the abnormal behavior of the user and combining with the cloud dispute SIEM platform abnormal alarm module to obtain a user risk value;
and the behavior early warning module 204 is used for early warning based on the user risk value.
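The conversion of category features into frequencies, performed by the data processing module 201, can be sketched as follows. This is a minimal illustration only: it assumes that "frequency" means the relative frequency of a value within its feature, and the function name `frequency_encode` and the field names are hypothetical.

```python
from collections import Counter


def frequency_encode(records, categorical_fields):
    """Replace each categorical value with its relative frequency in that field.

    Assumption: frequency = (occurrences of the value) / (number of records).
    The input records are left unchanged; new dictionaries are returned.
    """
    n = len(records)
    counts = {f: Counter(r[f] for r in records) for f in categorical_fields}
    encoded = []
    for r in records:
        row = dict(r)  # copy so the original log record is not modified
        for f in categorical_fields:
            row[f] = counts[f][r[f]] / n
        encoded.append(row)
    return encoded


# Hypothetical log records with one categorical feature, "action".
logs = [
    {"user": "a", "action": "login"},
    {"user": "a", "action": "login"},
    {"user": "b", "action": "login"},
    {"user": "b", "action": "delete"},
]
print([row["action"] for row in frequency_encode(logs, ["action"])])
```

Rare values receive small frequencies, which gives the later abnormality comparison a numeric quantity to work with.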
Compared with the prior art, the unsupervised model-based user behavior anomaly analysis and evaluation method and system provided by the embodiments of the present invention have the following technical advantages. Feature engineering is performed on the log data features using an unsupervised learning method: category features are converted into the frequency of occurrence of feature values within the features, and noise features and useless features are removed by comparing the abnormality degree between feature values within a feature and the abnormality degree between features, yielding the target features. A time series prediction model learns the user's behavior rules from the target features to obtain the user's abnormal behaviors. Based on those abnormal behaviors, and in combination with the cloud dispute SIEM platform abnormal alarm module, the user behavior risk is evaluated with a statistical method to obtain a user risk value, and early warning is carried out based on that value. As a result, abnormal user behaviors can be accurately identified in log data containing a large number of user behaviors, and the inefficient rule learning and anomaly detection caused by excessive noise features and meaningless features are avoided.
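The time series step summarized above can be sketched in Python. The claims name a Holt-Winters exponential smoothing model, a grid search over its parameters, evaluation by root mean square error, and an upper confidence bound; everything else below is an assumption of this sketch rather than the patented implementation. In particular, the additive form of the model, the initialisation from the first season, the parameter grid, the use of the RMSE as the residual spread, the multiplier `z = 1.96`, and the function names are all hypothetical choices.

```python
import itertools
import math


def holt_winters_one_step(series, m, alpha, beta, gamma):
    """Additive Holt-Winters smoothing with season length m.

    Returns the one-step-ahead predictions for series[m:] and the
    prediction for the next, not yet observed, point.
    """
    # Assumption of this sketch: initialise from the first season only.
    level = sum(series[:m]) / m
    trend = 0.0
    season = [x - level for x in series[:m]]
    fitted = []
    for t in range(m, len(series)):
        s = season[t % m]
        fitted.append(level + trend + s)  # prediction made before seeing series[t]
        prev_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (series[t] - level) + (1 - gamma) * s
    return fitted, level + trend + season[len(series) % m]


def rmse(actual, predicted):
    """Root mean square error between real values and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def grid_search(series, m, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Try every (alpha, beta, gamma) on the grid and keep the lowest RMSE."""
    best = None
    for alpha, beta, gamma in itertools.product(grid, repeat=3):
        fitted, _ = holt_winters_one_step(series, m, alpha, beta, gamma)
        err = rmse(series[m:], fitted)
        if best is None or err < best[0]:
            best = (err, alpha, beta, gamma)
    return best


def is_abnormal(series, m, observed, z=1.96):
    """Flag the observed value if it exceeds forecast + z * RMSE.

    Assumption: the RMSE of the best model stands in for the residual
    standard deviation when building the upper confidence bound.
    """
    err, alpha, beta, gamma = grid_search(series, m)
    _, forecast = holt_winters_one_step(series, m, alpha, beta, gamma)
    upper = forecast + z * err
    return observed > upper, upper


# Hypothetical per-period behavior counts with a season length of 4.
counts = [10, 20, 30, 20, 11, 21, 29, 20, 10, 19, 31, 21,
          9, 20, 30, 19, 10, 21, 30, 20, 11, 20, 29, 20]
print(is_abnormal(counts, 4, observed=100))
```

The series must contain more than one full season, since the first season is consumed by the initialisation. Only the upper bound is checked, following the claim that behavior is abnormal when the true data is greater than the upper bound of the confidence interval.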
Fig. 5 shows a block diagram of a computer device provided by an embodiment of the present invention.
Specifically, in a preferred implementation manner provided by the present invention, an embodiment of the present invention further provides a computer device 300, where the computer device includes a memory 301 and a processor 302, the memory 301 stores a computer program, and when the computer program is executed by the processor 302, the processor 302 executes the steps of the unsupervised model-based user behavior anomaly analysis and evaluation method.
In addition, the computer device 300 provided by the embodiment of the present invention may further have a communication interface 303 for receiving a control instruction.
An embodiment of the present invention further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by the processor 302, the processor 302 executes the steps of the unsupervised model-based user behavior anomaly analysis and evaluation method.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only a logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
In a typical configuration of the present invention, the terminal, the device serving the network, and the computing device include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, in the form of Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data.
Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present invention in its corresponding aspects.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. The embodiments of the disclosure are intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. The user behavior abnormity analysis and evaluation method based on the unsupervised model is characterized by comprising the following steps of:
performing feature engineering on the log data features by using an unsupervised learning method, converting category features into the frequency of appearance of feature values in the features, and removing noise features and useless features by comparing the abnormality degree between the feature values in the features and the abnormality degree between the features to obtain target features;
learning a user behavior rule for the target characteristics by using a time series prediction model to obtain abnormal behaviors of the user;
based on the abnormal behavior of the user, and in combination with a cloud dispute SIEM platform abnormal alarm module, evaluating the user behavior risk by using a statistical method to obtain a user risk value;
and carrying out early warning based on the user risk value.
2. The unsupervised model-based user behavior anomaly analysis and evaluation method according to claim 1, wherein the step of feature engineering the log data features using an unsupervised learning method comprises:
the feature engineering adopts an unsupervised feature selection CUFS framework, and feature denoising and selection are carried out on the basis of the CUFS framework;
creating two levels of coupling using the CUFS framework, namely feature value coupling and feature coupling;
learning the value couplings within each feature and between features to obtain the abnormality degree at the feature value level, and generating a value graph whose edge weights are these abnormality degrees;
and inputting the resulting value graph into the feature coupling analysis to generate a feature graph that aggregates the value-level abnormality degrees.
3. The unsupervised model-based user behavior anomaly analysis and evaluation method according to claim 1 or 2, wherein a Holt-Winters exponential smoothing model is adopted in the construction of the time series prediction model to learn the behavior rules of the user.
4. The unsupervised model-based user behavior anomaly analysis and evaluation method according to claim 3, wherein a grid search method is adopted in the construction of the time series prediction model to find the parameters of the Holt-Winters exponential smoothing model for different user behaviors.
5. The unsupervised model-based user behavior anomaly analysis and evaluation method according to claim 4, wherein the time series prediction model is constructed by using root mean square error of real values and predicted values to evaluate the parameters.
6. The unsupervised model-based user behavior anomaly analysis and evaluation method according to claim 5, wherein, in the construction of the time series prediction model, the model is used to predict the next behavior value of the user and to generate a confidence interval; if the true data is greater than the upper bound of the confidence interval, the behavior in that time period is abnormal.
7. The unsupervised model-based user behavior anomaly analysis and evaluation method as claimed in claim 6, wherein the step of evaluating the user behavior risk by using a statistical method based on the user's anomaly behavior in combination with a cloud dispute SIEM platform anomaly alarm module to obtain a user risk value comprises:
calling all security event rules in the cloud dispute SIEM platform abnormal alarm module, where the rule set $G = \{g_1, g_2, \ldots, g_n\}$ contains n rules;
randomly sampling from the population to obtain a sample set X;
obtaining the rule occurrence set $C = \{c_1, c_2, \ldots, c_n\}$ by counting;
obtaining the proportion of rule occurrences from the rule occurrence set C
Figure FDA0003314027230000031
converting the proportion to its reciprocal
Figure FDA0003314027230000032
calling the cloud dispute SIEM platform abnormal alarm module to obtain the user risk value, which is the sum of the reciprocals of all events multiplied by 100:
$\mathrm{score} = (sc_1 + sc_2 + \cdots + sc_i) \times 100$.
8. The unsupervised model-based user behavior anomaly analysis and evaluation method according to claim 1 or 2, wherein the step of performing early warning based on the user risk value comprises:
determining whether the user behavior is abnormal or not based on the user risk value;
and when the user behavior is abnormal, generating a monitoring panel by combining with a cloud dispute UEBA platform, and sending a high-risk alarm by combining with a cloud dispute IXAlert alarm platform.
9. An unsupervised model-based user behavior anomaly analysis and evaluation system, the evaluation system comprising:
the data processing module is used for performing feature engineering on the log data features by using an unsupervised learning method, converting the category features into the frequency of occurrence of feature values in the features, and removing noise features and useless features by comparing the abnormality degree between the feature values in the features and the abnormality degree between the features to obtain target features;
the behavior acquisition module is used for learning the behavior rule of the user on the target characteristics by using a time series prediction model to obtain the abnormal behavior of the user;
the behavior evaluation module is used for evaluating the user behavior risk by using a statistical method based on the abnormal behavior of the user and combining with the cloud dispute SIEM platform abnormal alarm module to obtain a user risk value;
and the behavior early warning module is used for carrying out early warning based on the user risk value.
CN202111225942.6A 2021-10-21 2021-10-21 Unsupervised model-based user behavior anomaly analysis and evaluation method and system Active CN113971119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111225942.6A CN113971119B (en) 2021-10-21 2021-10-21 Unsupervised model-based user behavior anomaly analysis and evaluation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111225942.6A CN113971119B (en) 2021-10-21 2021-10-21 Unsupervised model-based user behavior anomaly analysis and evaluation method and system

Publications (2)

Publication Number Publication Date
CN113971119A true CN113971119A (en) 2022-01-25
CN113971119B CN113971119B (en) 2023-02-07

Family

ID=79587656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111225942.6A Active CN113971119B (en) 2021-10-21 2021-10-21 Unsupervised model-based user behavior anomaly analysis and evaluation method and system

Country Status (1)

Country Link
CN (1) CN113971119B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996113A (en) * 2022-07-28 2022-09-02 成都乐超人科技有限公司 Real-time monitoring and early warning method and device for abnormal operation of large-data online user

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753408A (en) * 2018-12-11 2019-05-14 江阴逐日信息科技有限公司 A kind of process predicting abnormality method based on machine learning
US20200027105A1 (en) * 2018-07-20 2020-01-23 Jpmorgan Chase Bank, N.A. Systems and methods for value at risk anomaly detection using a hybrid of deep learning and time series models
CN111552609A (en) * 2020-04-12 2020-08-18 西安电子科技大学 Abnormal state detection method, system, storage medium, program and server
CN111898758A (en) * 2020-09-29 2020-11-06 苏宁金融科技(南京)有限公司 User abnormal behavior identification method and device and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
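李海斌 et al.: "An unsupervised method for detecting abnormal database user behavior", Journal of Chinese Computer Systems (《小型微型计算机系统》) *
王宪 et al.: "An abnormal behavior detection method based on unsupervised learning", Opto-Electronic Engineering (《光电工程》) *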

Also Published As

Publication number Publication date
CN113971119B (en) 2023-02-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant