CN113837325A

CN113837325A - Unsupervised algorithm-based user anomaly detection method and unsupervised algorithm-based user anomaly detection device

Info

Publication number: CN113837325A
Application number: CN202111410811.5A
Authority: CN
Inventors: 梁淑云; 殷钱安; 余贤喆; 王启凡; 陶景龙; 徐�明; 刘胜; 马影; 周晓勇; 魏国富; 夏玉明
Original assignee: Information and Data Security Solutions Co Ltd
Current assignee: Information and Data Security Solutions Co Ltd
Priority date: 2021-11-25
Filing date: 2021-11-25
Publication date: 2021-12-24
Anticipated expiration: 2041-11-25
Also published as: CN113837325B

Abstract

The application discloses a user anomaly detection method and device based on an unsupervised algorithm, relates to the technical field of network security detection, and can improve the accuracy of user anomaly detection. The method comprises the following steps: acquiring user behavior log data of a web system; calculating, for each of a plurality of service scene categories corresponding to the user behavior log data, Bayesian average values of a target user in a plurality of different time periods, wherein the Bayesian average values are determined according to single-dimensional behavior feature data of the target user; obtaining, according to the Bayesian average values of the target user in the plurality of service scene categories and the plurality of different time periods, an initial evaluation result of the target user in each service scene category by using different unsupervised models; and adjusting the evaluation score in the initial evaluation result according to the type of the evaluation label in the initial evaluation result, to obtain the anomaly detection result of the target user.

Description

Unsupervised algorithm-based user anomaly detection method and unsupervised algorithm-based user anomaly detection device

Technical Field

The invention relates to the technical field of network security detection, and in particular to a user anomaly detection method and device based on an unsupervised algorithm.

Background

With the wide application of the internet across industries, the number of internet users served by enterprises keeps growing and the types of users are increasingly diverse. For enterprises with massive user bases that rely heavily on the internet, such as e-commerce and finance, the number of daily active users can reach tens of millions, and a high proportion of the traffic is malicious access by gray-market and black-market operators. However, current user behavior anomaly detection methods for finding potential problems or detecting malicious users and malicious behaviors are one-sided: they rely on partial data and therefore cannot produce a comprehensive evaluation of user behavior.

In data mining, anomaly detection on user behavior log data means identifying samples that do not conform to the expected pattern or distribution, that is, identifying "anomaly points". Existing anomaly detection solutions apply different algorithms to different types of data. For multidimensional feature data, a supervised or unsupervised algorithm is usually chosen according to whether data labels exist. For single-dimensional feature data, detection usually relies on statistical methods such as rule thresholds, quantiles, and the 3-sigma rule, but these suffer from technical problems such as large differences in fluctuation ranges at different moments within a detection period, a high false alarm rate, and low accuracy.

Disclosure of Invention

In view of this, the present application provides a user anomaly detection method and device based on an unsupervised algorithm, mainly aiming to solve the technical problems of high false alarm rate and low accuracy of existing anomaly detection on user behavior feature data in actual service scenarios.

According to one aspect of the application, a user anomaly detection method based on an unsupervised algorithm is provided, the method comprising the following steps:

acquiring user behavior log data of a web system;

calculating, for each of a plurality of service scene categories corresponding to the user behavior log data, Bayesian average values of a target user in a plurality of different time periods, wherein the Bayesian average values are determined according to single-dimensional behavior feature data of the target user;

obtaining, according to the Bayesian average values of the target user in the plurality of service scene categories and the plurality of different time periods, an initial evaluation result of the target user in each service scene category by using different unsupervised models;

and adjusting the evaluation score in the initial evaluation result according to the type of the evaluation label in the initial evaluation result, to obtain the anomaly detection result of the target user.

According to another aspect of the present application, there is provided an unsupervised algorithm-based user anomaly detection apparatus, comprising:

a data acquisition module, used for acquiring user behavior log data of the web system;

a feature processing module, used for calculating, for each of a plurality of service scene categories corresponding to the user behavior log data, Bayesian average values of the target user in a plurality of different time periods, wherein the Bayesian average values are determined according to single-dimensional behavior feature data of the target user;

an initial evaluation module, used for obtaining an initial evaluation result of the target user in each service scene category by using different unsupervised models according to the Bayesian average values of the target user in the plurality of service scene categories and the plurality of different time periods;

and an anomaly evaluation module, used for adjusting the evaluation score in the initial evaluation result according to the evaluation label type in the initial evaluation result, to obtain the anomaly detection result of the target user.

According to yet another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above unsupervised algorithm-based user anomaly detection method.

According to still another aspect of the present application, there is provided a computer device, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, the processor implementing the above unsupervised algorithm-based user anomaly detection method when executing the program.

Compared with the prior technical scheme of applying different anomaly detection algorithms to different types of detection data, the unsupervised algorithm-based user anomaly detection method and device of the present application acquire user behavior log data of a web system; calculate, for each of a plurality of service scene categories corresponding to the user behavior log data, Bayesian average values of a target user in a plurality of different time periods, the Bayesian average values being determined according to single-dimensional behavior feature data of the target user; obtain an initial evaluation result of the target user in each service scene category by using different unsupervised models according to those Bayesian average values; and adjust the evaluation score in the initial evaluation result according to the evaluation label type in the initial evaluation result, to obtain the anomaly detection result of the target user. Because the target user is evaluated using single-dimensional behavior feature data across a plurality of service scene categories and a plurality of different time periods, the method can effectively avoid the high false alarm rate and low accuracy that conventional anomaly detection methods suffer when fluctuation ranges differ widely at different moments within a detection period. The false alarm rate of anomaly detection is thereby reduced and the accuracy of anomaly judgment is improved.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the application can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the application are more readily apparent, specific embodiments of the application are described below.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic flowchart of a user anomaly detection method based on an unsupervised algorithm according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of another unsupervised algorithm-based user anomaly detection method according to an embodiment of the present application;

FIG. 3a is a first diagram of relationships among relevant data in an isolation forest model according to an embodiment of the present application;

FIG. 3b is a second diagram of relationships among relevant data in the isolation forest model according to an embodiment of the present application;

FIG. 3c is a third diagram of relationships among relevant data in the isolation forest model according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a user anomaly detection apparatus based on an unsupervised algorithm according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of another unsupervised algorithm-based user anomaly detection apparatus according to an embodiment of the present application.

Detailed Description

The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

Existing user anomaly detection methods suffer from a high false alarm rate and low accuracy because fluctuation ranges differ widely at different moments within a detection period. To address this, the present embodiment provides a user anomaly detection method based on an unsupervised algorithm, which performs anomaly detection on single-dimensional behavior feature data of a plurality of different time periods, for each of a plurality of service scene categories in the user behavior log data. This reduces the false alarm rate of anomaly detection while improving the accuracy of user behavior anomaly judgment. As shown in FIG. 1, the method comprises the following steps:

Step 101, obtaining user behavior log data of a web system.

In this embodiment, user behavior log data under multiple service scene categories in a web system is obtained, where the user behavior log data includes: initial single-dimensional behavior feature data corresponding to each service scene category, user identification information, and user system operation time information. The user identification information is a field that uniquely identifies the user, such as an account ID (ACCT_ID) or an IP address (IP_ADDR); the user system operation time information is the operation time (OPR_TIME). The data is not limited to the above fields.

Step 102, calculating, for each of a plurality of service scene categories corresponding to the user behavior log data, Bayesian average values of the target user in a plurality of different time periods, wherein the Bayesian average values are determined according to the single-dimensional behavior feature data of the target user.

In this embodiment, according to the acquired user behavior log data under multiple service scene categories in the web system, Bayesian average values of the target user in multiple different time periods are calculated for each service scene category, the Bayesian average values being determined according to single-dimensional behavior feature data of the target user. Specifically, second initial single-dimensional behavior feature data, relative to an adjacent time period, is calculated from the first initial single-dimensional behavior feature data of the target user; third initial single-dimensional behavior feature data, relative to the full set of users, is calculated from the second initial single-dimensional behavior feature data; fourth initial single-dimensional behavior feature data, relative to the full set of users, is calculated from the first initial single-dimensional behavior feature data; and the Bayesian average value of the target user in each time period is determined from the first, second, third, and fourth initial single-dimensional behavior feature data. Therefore, for each service scene category, by acquiring the initial single-dimensional behavior feature data of the target user in a plurality of different time periods and the initial single-dimensional behavior feature data of all users in the current time period, the calculated Bayesian average value reflects both the longitudinal fluctuation of the target user's behavior log data over time and its transverse fluctuation relative to the behavior log data of all users, providing a more accurate data basis for the subsequent anomaly evaluation.

Step 103, obtaining an initial evaluation result of the target user in each service scene category by using different unsupervised models, according to the Bayesian average values of the target user in the plurality of service scene categories and the plurality of different time periods.

In this embodiment, for each service scene category, the Bayesian average values in a plurality of different time periods are used as the input of an unsupervised model, so as to obtain an evaluation score and an evaluation label of the target user for that service scene category. Evaluating across multiple service scenes gives a comprehensive view of the target user's operation behavior, and evaluating across multiple time periods captures the target user's behavior in both the current period and earlier periods, so that abnormal behavior is evaluated more accurately. In addition, because an unsupervised algorithm is used for the anomaly evaluation, no labeled training data is needed, the Bayesian average values can be processed simply and efficiently, and user behavior can be detected online in real time so that abnormal behavior is found promptly.

Step 104, adjusting the evaluation score in the initial evaluation result according to the evaluation label type in the initial evaluation result, to obtain the anomaly detection result of the target user.

In this embodiment, the abnormal risk score of the target user is determined by adjusting the evaluation score in the initial evaluation result according to the type of the evaluation label in the initial evaluation result for each service scene category. The abnormal risk score is then weighted based on the number of service scene categories triggered by the target user and the interval between the user's most recent operation time and the current time, to obtain the anomaly detection result of the target user. In this way, besides determining whether the target user's behavior is abnormal, the degree of abnormality can also be evaluated, so that important anomalies are recognized quickly, responses are prompt, and anomalies are handled in a timely manner.

According to the above scheme, the present embodiment obtains user behavior log data of the web system; calculates, for each of a plurality of service scene categories corresponding to the user behavior log data, Bayesian average values of the target user in a plurality of different time periods, the Bayesian average values being determined according to the single-dimensional behavior feature data of the target user; obtains an initial evaluation result of the target user in each service scene category by using different unsupervised models according to those Bayesian average values; and obtains the anomaly detection result of the target user by adjusting the evaluation score in the initial evaluation result according to the evaluation label type in the initial evaluation result. Compared with the existing technical scheme of applying different anomaly detection algorithms to different types of detection data, the embodiment evaluates the target user based on single-dimensional behavior feature data across a plurality of service scene categories and a plurality of different time periods. It can therefore effectively avoid the high false alarm rate and low accuracy caused by large differences in fluctuation ranges at different moments within a detection period, reducing the false alarm rate of user anomaly detection while improving its accuracy.

Further, as a refinement and extension of the foregoing embodiment, and to fully illustrate its specific implementation, another user anomaly detection method based on an unsupervised algorithm is provided. As shown in FIG. 2, the method includes:

Step 201, obtaining user behavior log data of the web system.

In implementation, the user behavior log data of the web system for the current natural day and for a preset time period (e.g., the most recent 15 days) is obtained. The user behavior log data includes, but is not limited to, the following fields: a user identifier that uniquely identifies the user, such as an account ID (ACCT_ID) or an IP address (IP_ADDR), and the user system operation time (OPR_TIME).
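As a minimal sketch of step 201, the log fields named above can be held in a small record type and filtered to the retention window. The class name `LogRecord`, the function `recent_records`, and the choice to start the window at midnight 15 days before the current day are assumptions made for illustration; the source names only the fields ACCT_ID, IP_ADDR, and OPR_TIME.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogRecord:
    acct_id: str        # ACCT_ID: account identifier
    ip_addr: str        # IP_ADDR: client IP address
    opr_time: datetime  # OPR_TIME: time of the operation


def recent_records(records, now, days=15):
    """Keep records from the current natural day and the `days` days before it.

    Assumption: the window starts at midnight `days` days before today.
    """
    start = datetime(now.year, now.month, now.day) - timedelta(days=days)
    return [r for r in records if start <= r.opr_time <= now]
```

Records older than the window are dropped before any per-category statistics are computed.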

Step 202, grouping the user behavior log data by service scene category to obtain the user behavior log data of a plurality of service scene categories corresponding to the target user identifier. The user behavior log data types of the plurality of service scene categories include one or more of: system operation frequency information, user login frequency information, sensitive interface calling frequency information, sensitive data access amount information, and short-interval operation frequency information.
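The grouping in step 202 might be sketched as follows. The source does not say how a log record is mapped to a service scene category, so the `scene` key on each record, the short category keys, and the dict-based record layout are all hypothetical.

```python
from collections import defaultdict

# The five categories named in step 202; the short keys are hypothetical.
SCENE_CATEGORIES = (
    "system_operation",
    "user_login",
    "sensitive_interface_call",
    "sensitive_data_access",
    "short_interval_operation",
)


def group_by_scene(records):
    """Group records as {scene: {acct_id: [record, ...]}}.

    Assumption: each record is a dict carrying 'scene' and 'acct_id' keys.
    Records with an unknown scene are ignored.
    """
    grouped = {scene: defaultdict(list) for scene in SCENE_CATEGORIES}
    for rec in records:
        scene = rec["scene"]
        if scene in grouped:
            grouped[scene][rec["acct_id"]].append(rec)
    return grouped
```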

Step 203, obtaining first initial single-dimensional behavior characteristic data in a plurality of different time periods corresponding to the target user identification for each service scene category.

In implementation, for each service scene category of the web system, the user behavior log data is counted per user identifier and per time period, so as to obtain the user behavior log data in a plurality of different time periods for each user identifier, namely the first initial single-dimensional behavior feature data. The time periods may, for example, be 4 different periods of 1 day, 3 days, 7 days, and 15 days, counted in natural days; the time periods are not specifically limited herein.
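A minimal sketch of the per-user counting in step 203, for one service scene category. It assumes that a time period of d days means the d natural days immediately before the current day, and that the current day is counted separately; the function name `window_counts` is hypothetical.

```python
from datetime import date, timedelta

WINDOWS = (1, 3, 7, 15)  # time periods in natural days, as in the example split


def window_counts(op_dates, today, windows=WINDOWS):
    """Count one user's operations today and in each preceding window.

    op_dates: iterable of datetime.date, one entry per logged operation.
    Assumption: a window of d days covers the d days before `today`.
    """
    op_dates = list(op_dates)
    counts = {"today": sum(1 for day in op_dates if day == today)}
    for d in windows:
        start = today - timedelta(days=d)
        counts[d] = sum(1 for day in op_dates if start <= day < today)
    return counts
```

The value under `"today"` plays the role of the current-period count, and the value under each window length plays the role of the adjacent-period count for that time period.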

Step 204, calculating the Bayesian average value of the target user in each time period by using a Bayesian average algorithm, according to the first initial single-dimensional behavior feature data corresponding to each service scene category.

To illustrate the specific implementation of step 204, as a preferred embodiment, step 204 specifically includes: taking the proportion of the target user's first initial single-dimensional behavior feature data in the current time period, relative to the combined initial single-dimensional behavior feature data of the current and adjacent time periods, as second initial single-dimensional behavior feature data; obtaining third initial single-dimensional behavior feature data, relative to the full set of users, by averaging the second initial single-dimensional behavior feature data of all users in the current period; obtaining fourth initial single-dimensional behavior feature data, relative to the full set of users, by averaging the first initial single-dimensional behavior feature data of all users in the current period; and determining the Bayesian average value of the target user in each time period according to the first, second, third, and fourth initial single-dimensional behavior feature data.

In implementation, to explain the Bayesian average algorithm in detail, the calculation process of step 204 is illustrated by taking the number of system operations (a user behavior log data type) in the 1-day time period under the system operation service scene category as an example:

1) Obtain the number of system operations of the target user on the previous 1 day (the previous time period), opr_cnt1, and the number of system operations on the current day, opr_cnt (first initial single-dimensional behavior feature data), and from them the sum of the target user's system operations over the current day and the previous 1 day.

2) Calculate the proportion percentage (second initial single-dimensional behavior feature data) of the target user's current-day system operations opr_cnt in the two-day sum, satisfying the following formula:

$$\text{percentage} = \frac{\text{opr\_cnt}}{\text{opr\_cnt} + \text{opr\_cnt1}} \tag{1}$$

3) Calculate the mean percent_avg (third initial single-dimensional behavior feature data), over all users on the day, of the proportion of current-day system operations in the two-day sum, satisfying the following formula:

$$\text{percent\_avg} = \frac{1}{n}\sum_{i=1}^{n} \text{percentage}_i \tag{2}$$

where n is the total number of users on the day, and $\sum_{i=1}^{n} \text{percentage}_i$ is the sum over all users of each user's proportion of current-day system operations in the two-day sum; see percentage (second initial single-dimensional behavior feature data).

4) Calculate the mean oprcnt_avg (fourth initial single-dimensional behavior feature data) of the number of system operations of all users on the day, satisfying the following formula:

$$\text{oprcnt\_avg} = \frac{1}{n}\sum_{i=1}^{n} \text{opr\_cnt}_i \tag{3}$$

where n is the total number of users on the day, and $\sum_{i=1}^{n} \text{opr\_cnt}_i$ is the sum of the system operations of all users on the day; see opr_cnt (first initial single-dimensional behavior feature data).

5) Based on the first initial single-dimensional behavior feature data opr_cnt, the second initial single-dimensional behavior feature data percentage, the third initial single-dimensional behavior feature data percent_avg, and the fourth initial single-dimensional behavior feature data oprcnt_avg from 1) to 4), calculate the Bayesian average score of the target user for the current day, satisfying the following formula:

$$\text{score} = \frac{\text{oprcnt\_avg} \times \text{percent\_avg} + \text{opr\_cnt} \times \text{percentage}}{\text{oprcnt\_avg} + \text{opr\_cnt}} \tag{4}$$

Depending on the needs of the actual application scenario, oprcnt_avg (fourth initial single-dimensional behavior feature data) calculated in 4) is not limited to the mean: another value that represents the overall level of the data, such as the median, may be selected according to the data distribution or the actual business conditions, and the calculation of oprcnt_avg is not specifically limited here. However, because the business goal is to detect whether the target user's number of system operations is abnormal, oprcnt_avg must not be calculated with an extreme statistic such as the maximum or minimum.
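The two population-level quantities, percent_avg and oprcnt_avg, could be computed along the following lines, with the central statistic for oprcnt_avg left selectable as the paragraph above allows. The function name `population_priors` and the input layout (a mapping from user identifier to current-period and previous-period counts) are assumptions for illustration.

```python
from statistics import mean, median


def population_priors(users, center=mean):
    """Return (percent_avg, oprcnt_avg) over all users of the day.

    users: dict mapping acct_id -> (opr_cnt, opr_cnt1).
    center: statistic used for oprcnt_avg; pass `median` instead of `mean`
            if the data distribution calls for it. Extreme statistics such
            as max or min are not appropriate here.
    Assumption: users with no operations in either period are skipped when
    averaging the proportions, since their proportion is undefined.
    """
    proportions = [c / (c + p) for c, p in users.values() if c + p > 0]
    counts = [c for c, _ in users.values()]
    return mean(proportions), center(counts)
```

For example, `population_priors(users, center=median)` swaps the mean for the median when a few heavy users would otherwise skew oprcnt_avg.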

In an actual application scenario, as shown in FIG. 3a and FIG. 3b, the Bayesian average algorithm yields the Bayesian average score of every user (ACCT_ID) in a given time period, and the Bayesian average data set of each user for the day (DAY_ID) over a plurality of different time periods is then calculated, expressed as: score1, score2, score3, score4. Specifically, as shown in FIG. 3a, for target user 20673 the number of system operations on the current day, opr_cnt (first initial single-dimensional behavior feature data), is 5; the number of system operations on the previous 1 day is 6; the two-day sum is 11; the proportion percentage (second initial single-dimensional behavior feature data) of current-day operations in the two-day sum is 0.454545; the mean proportion over all users, percent_avg (third initial single-dimensional behavior feature data), is 0.643081272; and the mean number of system operations of all users on the day, oprcnt_avg (fourth initial single-dimensional behavior feature data), is 6. The Bayesian average value of target user 20673 calculated by formula (4) is 0.557383. By analogy, for the 4 different time periods of the previous 1 day, 3 days, 7 days, and 15 days, the Bayesian average values score1, score2, score3, and score4 of the number of system operations on the current day are calculated for each user.
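The worked example for user 20673 can be checked with a short sketch. It assumes the Bayesian average takes the usual weighted form, (oprcnt_avg × percent_avg + opr_cnt × percentage) / (oprcnt_avg + opr_cnt), which reproduces the quoted figures; the function name `bayesian_average` and the zero-denominator guards are illustrative additions, not part of the source.

```python
def bayesian_average(opr_cnt, opr_cnt1, percent_avg, oprcnt_avg):
    """Return (percentage, score) for one user and one time period.

    opr_cnt:     user's count in the current period
    opr_cnt1:    user's count in the adjacent (previous) period
    percent_avg: mean proportion over all users
    oprcnt_avg:  mean (or other central) count over all users
    Assumption: undefined ratios fall back to 0.0.
    """
    total = opr_cnt + opr_cnt1
    percentage = opr_cnt / total if total else 0.0
    denom = oprcnt_avg + opr_cnt
    if denom == 0:
        return percentage, 0.0
    score = (oprcnt_avg * percent_avg + opr_cnt * percentage) / denom
    return percentage, score


percentage, score = bayesian_average(5, 6, 0.643081272, 6)
print(round(percentage, 6), round(score, 6))  # 0.454545 0.557383
```

The population mean acts as a prior: a user with few operations is pulled toward percent_avg, while a user with many operations is scored mostly on their own proportion.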

The number of system operations of a user may be either an accumulated count or a current count. The accumulated count is selected if anomalies of the target user over a period of time need to be detected, and the current count is selected if anomalies of the target user at the current stage need to be detected; how the count is gathered is not specifically limited here.

It should be noted that, because the Bayesian average value of the target user is calculated from the initial single-dimensional behavior feature data of both the target user and all users, over a plurality of different time periods, it reflects both how the target user's behavior deviates from that of all users and how it deviates over time, which effectively improves the accuracy of detecting abnormal users.

Step 205, determining a plurality of isolation forest models corresponding to the plurality of service scene categories.

Step 206, obtaining an initial evaluation result of the target user in each service scene category from the Bayesian average values of the target user in a plurality of different time periods, by using the isolation forest model corresponding to each service scene category.

In implementation, for the Bayesian average data sets score1, score2, score3, and score4 corresponding to the plurality of service scene categories, the isolation forest model corresponding to each Bayesian average data set is determined. An isolation forest (iForest) model is composed of a plurality of binary trees (iTrees). Each binary tree is a random tree in which every node either has two child nodes (left and right) or is a leaf node. As shown in FIG. 3b, taking the isolation forest model of a certain service scene category as an example: an attribute, here score3, is randomly selected from the Bayesian average data set; the score3 value of a randomly selected user, here user 17660, is taken as the threshold, for example value = 0.2; and the score3 values in the data set are then classified one by one against the threshold. A user node whose value is greater than or equal to the threshold goes to the right node, and a user node whose value is less than the threshold goes to the left node. For example, the score3 value of the first recorded user 18502 is 0.147368, which is less than the threshold (0.2), so that user node is placed in the left node. Left and right nodes are constructed recursively until the nodes can no longer be split or the height of the tree reaches a preset value, which completes the construction of the binary tree.
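The tree construction described above can be sketched as a small recursive function. This is an illustrative simplification, not the embodiment's implementation: the names `build_itree` and `path_length`, the dict-based node layout, and the default height limit of 8 are assumptions, and the threshold is drawn only from values above the minimum so that neither side of a split is empty.

```python
import random


def build_itree(rows, height=0, max_height=8, rng=random):
    """Build one isolation tree.

    rows: list of (acct_id, [score1, score2, score3, score4]).
    Values below the threshold go left; values >= threshold go right.
    """
    if len(rows) <= 1 or height >= max_height:
        return {"size": len(rows)}  # leaf: cannot or must not split further
    attr = rng.randrange(len(rows[0][1]))  # random attribute, e.g. score3
    values = [scores[attr] for _, scores in rows]
    lowest = min(values)
    candidates = [v for v in values if v > lowest]
    if not candidates:  # all values equal on this attribute
        return {"size": len(rows)}
    threshold = rng.choice(candidates)  # value of a randomly chosen user
    left = [r for r in rows if r[1][attr] < threshold]
    right = [r for r in rows if r[1][attr] >= threshold]
    return {
        "attr": attr,
        "threshold": threshold,
        "left": build_itree(left, height + 1, max_height, rng),
        "right": build_itree(right, height + 1, max_height, rng),
    }


def path_length(tree, scores, depth=0):
    """Number of splits needed to reach the leaf that holds `scores`."""
    if "size" in tree:
        return depth
    branch = "left" if scores[tree["attr"]] < tree["threshold"] else "right"
    return path_length(tree[branch], scores, depth + 1)
```

Users that end up in shallow leaves were easy to isolate, which is the signal an isolation forest uses to flag anomalies.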

Specifically, the binary trees in the isolation forest model corresponding to each service scene category are traversed. For example, the Bayesian average data set score1, score2, score3, score4 of a certain service scene category is used as the input of the corresponding isolation forest model, and each node is classified by the unsupervised anomaly detection algorithm to obtain an evaluation score pred_score and an evaluation label for each node, that is, the initial evaluation result of the target user in each service scene category. An evaluation label of -1 is an abnormal label; an evaluation label of 1 is a normal label.
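The source does not state how pred_score is derived from the trees. One common choice, shown here purely as an assumption, is the standard isolation forest anomaly score 2^(-E[h(x)] / c(n)), where E[h(x)] is the average path length of a sample across the trees and c(n) normalizes by the expected path length for n samples. The 0.5 cut-off for assigning the -1 label and the names `c_factor` and `evaluate` are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of an unsuccessful search among n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # H(1) = 1 exactly, so 2*H(1) - 2*(1/2) = 1
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def evaluate(avg_path_length, n_samples, threshold=0.5):
    """Return (pred_score, label); label is -1 (abnormal) or 1 (normal).

    Requires n_samples >= 2. The threshold is an assumed cut-off.
    """
    pred_score = 2.0 ** (-avg_path_length / c_factor(n_samples))
    label = -1 if pred_score > threshold else 1
    return pred_score, label
```

Under this convention, short average paths give scores near 1 (likely abnormal) and long paths give scores well below 0.5 (likely normal).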

Step 207, if the evaluation label type of the target user in the service scene category is an abnormal label, adjusting the evaluation score of the target user in the service scene category to a first abnormal risk score corresponding to the service scene category, where the first abnormal risk score is an abnormal reference score of the service scene category.

Step 208, if the evaluation label type of the target user in the service scene category is a normal label, calculating a second abnormal risk score of the target user in the service scene category by using the abnormal reference score of the service scene category as a weight value.

In implementation, the initial evaluation results for one or more of the system operation times, the user login times, the sensitive interface calling times, the sensitive data access amount and the short-interval operation times are respectively calculated by using the isolated forest model corresponding to each service scene category. An evaluation label of the target user in each service scene category is determined according to the initial evaluation result; if the evaluation label corresponding to a service scene category is -1, the evaluation score is adjusted to the abnormal reference score of that service scene category, namely the first abnormal risk score. The abnormal reference scores are set according to the importance of each service scene category, specifically: 15 for the system operation frequency information, 15 for the user login frequency information, 25 for the sensitive interface calling frequency information, 25 for the sensitive data access amount information, and 20 for the short-interval operation frequency information.

Correspondingly, if the evaluation label of the target user in a certain service scene category is 1, the evaluation score pred_score of that service scene category is used as a weight value and multiplied by the corresponding abnormal reference score to obtain the second abnormal risk score base_score of the target user in that service scene category. For example, as shown in fig. 3c, under the business scenario type of the system operation times, the evaluation label of user 22717 is -1, an abnormal label, so the abnormal risk score base_score is adjusted to the abnormal reference score 15 of that business scenario category; the evaluation label of user 18142 is 1, a normal label, so the abnormal risk score base_score is determined as 0.457169 × 15 = 6.857535. Finally, for the target user, the abnormal risk scores of the related service scenario types are accumulated, namely sum_base_score(x) = Σ base_score, where x is the evaluation label and, consistent with the two cases described above, base_score satisfies the following formula (ref_score is used here to denote the abnormal reference score of the service scenario category):

$$
\text{base\_score}(x) =
\begin{cases}
\text{ref\_score}, & x = -1 \\
\text{pred\_score} \times \text{ref\_score}, & x = 1
\end{cases}
$$
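The label-dependent scoring and the accumulation across service scenario types can be sketched as follows. The reference scores are the values given in the text; the dictionary keys and the function names `base_score` and `sum_base_score` are illustrative assumptions.

```python
# Abnormal reference scores per service scenario category (from the text).
# The key names are hypothetical identifiers chosen for this sketch.
REFERENCE_SCORES = {
    "system_operations": 15,
    "user_logins": 15,
    "sensitive_interface_calls": 25,
    "sensitive_data_access": 25,
    "short_interval_operations": 20,
}

def base_score(category, label, pred_score):
    """Abnormal risk score of one user in one service scenario category."""
    ref = REFERENCE_SCORES[category]
    if label == -1:
        # Abnormal label: the score is set to the reference score itself.
        return float(ref)
    if label == 1:
        # Normal label: pred_score weights the reference score.
        return pred_score * ref
    raise ValueError(f"unexpected evaluation label: {label!r}")

def sum_base_score(results):
    """Accumulate scores over (category, label, pred_score) tuples."""
    return sum(base_score(cat, label, pred) for cat, label, pred in results)
```

For the two users in the example, `base_score("system_operations", -1, pred_score)` yields 15 regardless of pred_score, and `base_score("system_operations", 1, 0.457169)` yields approximately 6.857535.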

Step 209, determining an abnormal detection result of the target user according to the first abnormal risk score and the second abnormal risk score.

To illustrate the specific implementation of step 209, as a preferred embodiment, step 209 may specifically include: determining, according to the number of business scene categories corresponding to the target user, a business scene cascade coefficient for correcting the abnormal detection result of the target user; determining, according to the interval duration between the previous operation time of the target user and the current time, an attenuation coefficient for correcting the abnormal detection result of the target user; and determining the abnormal detection result of the target user according to the first abnormal risk score and the second abnormal risk score, with the business scene cascade coefficient and/or the attenuation coefficient as a weight coefficient.

In implementation, the number of service scene categories triggered by a user is set as n, and a service scene cascade coefficient j is defined to satisfy the following formula:

Further, the attenuation coefficient is determined according to the interval duration Δt between the previous operation time of the target user and the current time and the cooling coefficient c. The size of the cooling coefficient c is inversely related to the required cooling time, that is, the larger c is set, the shorter the cooling time. The cooling coefficient c may be set to 0.3 according to the requirements of the actual application scenario; its size is not specifically limited here. For example, the last operation time across all service scenario types operated by the target user is taken as the time at which the anomaly occurred, and a cooling score cool_score (the anomaly detection result of the target user) is defined. The longer ago the anomaly occurred, the smaller its possible influence, that is, the importance of the anomaly decays over time. Correcting the anomaly detection result of the target user with the attenuation coefficient therefore helps to distinguish degrees of anomaly and further improves detection accuracy.
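The expression for the attenuation coefficient is not reproduced in this text. One form that matches the stated properties (the coefficient shrinks as Δt grows, and a larger c shortens the cooling time) is exponential decay, exp(-c·Δt). The sketch below uses that form purely as an assumption; the function name, the exponential form and the unit of Δt (days) are not taken from the source.

```python
import math

def attenuation(delta_t, c=0.3):
    """Hypothetical decay weight in (0, 1] for an anomaly delta_t days old.

    ASSUMPTION: exponential decay exp(-c * delta_t). The source only states
    that a larger cooling coefficient c means a shorter cooling time.
    """
    if delta_t < 0:
        raise ValueError("delta_t must be non-negative")
    return math.exp(-c * delta_t)
```

Under this assumed form, an anomaly that has just occurred keeps its full weight, and doubling c halves the time needed to decay to any given weight.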

The earlier the abnormality occurred, the smaller its influence on the target user, and the attention paid to the target user is reduced accordingly, so that the following formula is satisfied:

In addition, according to the requirements of the actual application scenario, the final anomaly detection result of the target user is further determined by applying the Bayesian average algorithm to the cooling scores cool_score (anomaly detection results) of the target user over two days. For example, a floating score risk_score (the final anomaly detection result of the user) is defined to satisfy the following formula:

Here, cool_score_percentage represents the ratio of the target user's cooling score on the current day to the sum of the cooling scores of the two days, cool_score_percentage_avg represents the average of that ratio over all users, and cool_score_avg represents the average cooling score of all users on the current day; for the calculation, refer to formulas (1) to (3), which are not repeated here. In summary, the threshold of the risk score is defined as 0.7: when cool_score or risk_score is greater than 0.7, an anomaly alarm is raised for the target user. The size of the threshold is not specifically limited here.

It can be seen that, in this embodiment, the Bayesian average algorithm is used to convert the initial single-dimensional behavior feature data of different service scene categories and different time periods into Bayesian average values, and the abnormal condition of the target user is then determined by the isolated forest model, an unsupervised algorithm, combining the horizontal and vertical variation characteristics of the user behavior log data. In addition, because the anomaly detection is based on an unsupervised algorithm, the isolated forest model requires no labeled training data, and the Bayesian average processing is simple and efficient, so the method can be applied to real-time online detection of a system and can find abnormal conditions of the target user in time. A user scoring system is constructed by combining the abnormal reference scores defined for each business scene with the cascade coefficient, the cooling coefficient and the floating score, so that the degree of anomaly of the target user can be represented accurately, objectively and comprehensively, important anomalies can be handled in time, the response speed is improved, and losses are avoided.

By applying the technical scheme of this embodiment, user behavior log data of a web system is obtained; Bayesian average values of a target user in a plurality of different time periods are respectively calculated according to a plurality of service scene categories corresponding to the user behavior log data, where the Bayesian average values are determined according to the single-dimensional behavior feature data of the target user; initial evaluation results of the target user in each service scene category are respectively obtained by using different unsupervised models according to the Bayesian average values of the target user in the plurality of service scene categories and the plurality of different time periods; and the evaluation scores in the initial evaluation results are adjusted according to the evaluation scores and the evaluation label types in the initial evaluation results to obtain the anomaly detection result of the target user. Compared with the existing technical scheme of applying different anomaly detection algorithms to different detection data types, this embodiment detects the target user based on the single-dimensional behavior feature data of the target user in a plurality of service scene categories and a plurality of different time periods. This can effectively avoid the high false alarm rate and low accuracy that existing anomaly detection methods suffer when the fluctuation ranges at different moments within a detection period differ greatly, so the accuracy of user anomaly detection is improved while the false alarm rate is reduced.

Further, as a specific implementation of the method in fig. 1, an embodiment of the present application provides a user anomaly detection apparatus, as shown in fig. 4, the apparatus includes: a data acquisition module 41, a feature processing module 42, an initial evaluation module 43, and an anomaly evaluation module 44.

The data obtaining module 41 may be configured to obtain user behavior log data of the web system.

The feature processing module 42 may be configured to calculate, according to a plurality of service scene categories corresponding to the user behavior log data, Bayesian average values of the target user in a plurality of different time periods, where the Bayesian average values are determined according to single-dimensional behavior feature data of the target user.

The initial evaluation module 43 may be configured to obtain an initial evaluation result of the target user in each service scene category by using different unsupervised models according to the Bayesian average values of the target user in the plurality of service scene categories and in the plurality of different time periods.

The anomaly evaluation module 44 may be configured to adjust the evaluation score in the initial evaluation result according to the evaluation score and the evaluation label type in the initial evaluation result to obtain an anomaly detection result of the target user.

In a specific application scenario, as shown in fig. 5, the feature processing module 42 includes a scenario data grouping unit 421, a period data obtaining unit 422, and a bayesian average calculating unit 423.

The scenario data grouping unit 421 may be configured to group the user behavior log data by service scenario category to obtain user behavior log data of multiple service scenario categories corresponding to the target user identifier.

In an actual application scenario, the user behavior log data of the plurality of service scenario categories includes one or more of: system operation frequency information, user login frequency information, sensitive interface calling frequency information, sensitive data access amount information and short-interval operation frequency information.

The period data obtaining unit 422 may be configured to obtain, for each service scenario category, first initial single-dimensional behavior feature data in a plurality of different time periods corresponding to the target user identifier.
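The grouping by service scenario category and the per-period extraction of single-dimensional feature data can be sketched as a simple aggregation. The record layout (`user_id`, `category`, `period`, optional `amount`) and the function name are assumptions made for illustration; real log schemas will differ.

```python
from collections import defaultdict

def aggregate_features(log_records, user_id):
    """Sum one user's activity per (service scenario category, period).

    Each record counts as 1 unless it carries an explicit "amount",
    e.g. the volume of sensitive data accessed.
    """
    totals = defaultdict(float)
    for rec in log_records:
        if rec["user_id"] != user_id:
            continue
        totals[(rec["category"], rec["period"])] += rec.get("amount", 1)
    return dict(totals)
```

Each value in the returned mapping corresponds to one piece of first initial single-dimensional behavior feature data for the given user, category and time period.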

The Bayesian average calculating unit 423 may be configured to calculate, according to the first initial single-dimensional behavior feature data corresponding to each service scenario category, a Bayesian average value of the target user in each time period by using a Bayesian average algorithm.

In an actual application scenario, the Bayesian average calculating unit 423 may be specifically configured to: take the ratio of the first initial single-dimensional behavior feature data of the target user in the current time period to the initial single-dimensional behavior feature data in the adjacent time period as second initial single-dimensional behavior feature data; obtain, through mean processing, third initial single-dimensional behavior feature data relative to the full set of users according to the second initial single-dimensional behavior feature data of the target user in the current period; obtain, through mean processing, fourth initial single-dimensional behavior feature data relative to the full set of users according to the first initial single-dimensional behavior feature data of the full set of users in the current period; and determine the Bayesian average value of the target user in each time period according to the first initial single-dimensional behavior feature data, the second initial single-dimensional behavior feature data, the third initial single-dimensional behavior feature data and the fourth initial single-dimensional behavior feature data.
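How the four quantities are combined is not spelled out in this passage. A common Bayesian average has the form (C·m + n·r) / (C + n), where m is a population-wide prior, C is its weight, r is the individual value and n is the individual's amount of evidence. The sketch below maps the fourth data to C, the third to m, the first to n and the second to r; this mapping, the function name and the choice to skip users without activity in the adjacent period are all assumptions, not the formula of the embodiment.

```python
from statistics import mean

def bayesian_averages(current, previous):
    """Shrink each user's period-over-period ratio toward the population mean.

    current / previous: dict of user id -> positive feature value in the
    current / adjacent period. Users with no positive value in the adjacent
    period are skipped (an assumption of this sketch).
    """
    # Second data: ratio of the current value to the adjacent-period value.
    ratio = {u: current[u] / previous[u]
             for u in current if previous.get(u, 0) > 0}
    if not ratio:
        raise ValueError("no user has data in both periods")
    prior_ratio = mean(ratio.values())              # third data (m)
    prior_weight = mean(current[u] for u in ratio)  # fourth data (C)
    return {
        u: (prior_weight * prior_ratio + current[u] * ratio[u])
           / (prior_weight + current[u])
        for u in ratio
    }
```

The effect of this form is that users with little activity are pulled toward the population-wide ratio, while users with heavy activity keep a value close to their own ratio.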

In a specific application scenario, the initial evaluation module 43 includes a determination unit 431 and an initial evaluation unit 432.

The determining unit 431 may be configured to determine a plurality of isolated forest models corresponding to the plurality of business scene categories.

The initial evaluation unit 432 may be configured to obtain an initial evaluation result of the target user in each service scene category according to the Bayesian average values of the target user in a plurality of different time periods by using the isolated forest model corresponding to each service scene category.

In a specific application scenario, the anomaly evaluation module 44 includes a first anomaly score determining unit 441, a second anomaly score determining unit 442, and an anomaly result determining unit 443.

The first abnormal score determining unit 441 may be configured to, if the type of the evaluation label of the target user in the service scenario category is an abnormal label, adjust the evaluation score of the target user in the service scenario category to a first abnormal risk score corresponding to the service scenario category, where the first abnormal risk score is an abnormal reference score of the service scenario category.

The second abnormal score determining unit 442 may be configured to, if the evaluation label type of the target user in the service scene category is a normal label, use the abnormal reference score of the service scene category as a weight value, and calculate a second abnormal risk score of the target user in the service scene category.

The abnormal result determination unit 443 may be configured to determine an abnormal detection result of the target user according to the first abnormal risk score and the second abnormal risk score.

In a specific application scenario, the abnormal result determining unit 443 may specifically be configured to: determining a business scene cascade coefficient for correcting the abnormal detection result of the target user according to the business scene category number corresponding to the target user; determining an attenuation coefficient for correcting the abnormal detection result of the target user according to the interval duration of the previous operation time and the current time of the target user; and determining an abnormal detection result of the target user according to the first abnormal risk score and the second abnormal risk score by taking the business scene cascade coefficient and/or the attenuation coefficient as a weight coefficient.

It should be noted that other corresponding descriptions of the functional units related to the user anomaly detection device based on the unsupervised algorithm provided in the embodiment of the present application may refer to the corresponding descriptions in fig. 1 and fig. 2, and are not described herein again.

Based on the above-mentioned methods shown in fig. 1 and fig. 2, correspondingly, the present application further provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for detecting user anomaly based on an unsupervised algorithm as shown in fig. 1 and fig. 2 is implemented.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like), and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the method described in the implementation scenarios of the present application.

Based on the above methods shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 4, in order to achieve the above object, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, and the like, where the computer device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the above-described unsupervised algorithm based user anomaly detection method as shown in fig. 1 and 2.

Optionally, the computer device may further include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, a sensor, audio circuitry, a Wi-Fi module, and so forth. The user interface may include a display screen and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Bluetooth interface or a Wi-Fi interface), etc.

It will be understood by those skilled in the art that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine some components, or arrange the components differently.

The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the operation of information handling programs as well as other software and/or programs. The network communication module is used for realizing communication among components in the storage medium and with other hardware and software in the physical device.

Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, and can also be implemented by hardware. By applying the technical scheme of the application, compared with the existing technical scheme of applying different anomaly detection algorithms to different detection data types, the embodiment detects the target user based on single-dimensional behavior feature data of a plurality of different time periods in a plurality of service scene categories. This can effectively avoid the high false alarm rate and low accuracy that existing anomaly detection methods suffer when the fluctuation ranges at different moments within a detection period differ greatly, so the accuracy of user anomaly detection is improved while the false alarm rate is reduced.

Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.

The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.

Claims

1. A user anomaly detection method based on an unsupervised algorithm, characterized by comprising the following steps:

acquiring user behavior log data of a web system;

according to the multiple service scene categories of the target user and the Bayesian average values in multiple different time periods, respectively obtaining initial evaluation results of the target user in the multiple service scene categories by using different unsupervised models;

2. The method according to claim 1, wherein the step of calculating the Bayesian average values of the target user in a plurality of different time periods according to a plurality of service scenario categories corresponding to the user behavior log data comprises:

grouping the service scene categories of the user behavior log data to obtain user behavior log data of a plurality of service scene categories corresponding to the target user identification;

respectively obtaining first initial single-dimensional behavior characteristic data in a plurality of different time periods corresponding to the target user identification aiming at each service scene category;

and calculating a Bayesian average value of the target user in each time period by using a Bayesian average algorithm according to the first initial single-dimensional behavior feature data corresponding to each service scene category.

3. The method of claim 2, wherein the user behavior log data of the plurality of service scenario categories comprises one or more of: system operation frequency information, user login frequency information, sensitive interface calling frequency information, sensitive data access amount information and short-interval operation frequency information.

4. The method according to claim 2, wherein the step of calculating a Bayesian average value of the target user in each time period by using a Bayesian average algorithm according to the first initial single-dimensional behavior feature data corresponding to each service scenario category comprises:

taking the ratio of the first initial single-dimensional behavior characteristic data of the target user in the current time period to the initial single-dimensional behavior characteristic data in the adjacent time period as second initial single-dimensional behavior characteristic data;

obtaining, through mean processing, third initial single-dimensional behavior characteristic data relative to the full set of users according to the second initial single-dimensional behavior characteristic data of the target user in the current period;

obtaining, through mean processing, fourth initial single-dimensional behavior characteristic data relative to the full set of users according to the first initial single-dimensional behavior characteristic data of the full set of users in the current period;

and determining the Bayesian average value of the target user in each time period according to the first initial single-dimensional behavior feature data, the second initial single-dimensional behavior feature data, the third initial single-dimensional behavior feature data and the fourth initial single-dimensional behavior feature data.

5. The method according to any one of claims 1 to 4, wherein the step of obtaining the initial evaluation result of the target user in each service scenario category by using different unsupervised models according to the Bayesian average values of the target user in a plurality of service scenario categories and a plurality of different time periods comprises:

determining a plurality of isolated forest models corresponding to the plurality of business scene categories;

and obtaining an initial evaluation result of the target user in each service scene category according to the Bayesian average values of the target user in a plurality of different time periods by using the isolated forest model corresponding to each service scene category.

6. The method according to claim 1, wherein the step of obtaining the anomaly detection result of the target user by adjusting the evaluation score in the initial evaluation result according to the evaluation label type in the initial evaluation result comprises:

if the evaluation label type of the target user in the service scene category is an abnormal label, adjusting the evaluation score of the target user in the service scene category to a first abnormal risk score corresponding to the service scene category, wherein the first abnormal risk score is an abnormal reference score of the service scene category;

if the evaluation label type of the target user in the service scene category is a normal label, calculating a second abnormal risk score of the target user in the service scene category by taking the abnormal reference score of the service scene category as a weight value;

and determining an abnormal detection result of the target user according to the first abnormal risk score and the second abnormal risk score.

7. The method according to claim 6, wherein the step of determining the anomaly detection result of the target user according to the first anomaly risk score and the second anomaly risk score comprises:

determining a business scene cascade coefficient for correcting the abnormal detection result of the target user according to the business scene category number corresponding to the target user;

determining an attenuation coefficient for correcting the abnormal detection result of the target user according to the interval duration of the previous operation time and the current time of the target user;

and determining an abnormal detection result of the target user according to the first abnormal risk score and the second abnormal risk score by taking the business scene cascade coefficient and/or the attenuation coefficient as a weight coefficient.

8. An unsupervised algorithm-based user anomaly detection device, comprising:

9. A storage medium having stored thereon a computer program, characterized in that said program, when being executed by a processor, implements the unsupervised algorithm based user anomaly detection method of any one of claims 1 to 7.

10. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the unsupervised algorithm based user anomaly detection method of any one of claims 1 to 7 when executing the program.