CN112306982B - Abnormal user detection method and device, computing equipment and storage medium - Google Patents


Info

Publication number
CN112306982B
Authority
CN
China
Prior art keywords
user
log
abnormal
user identifier
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011276015.2A
Other languages
Chinese (zh)
Other versions
CN112306982A (en)
Inventor
王滨
张峰
万里
王星
李志强
徐文渊
冀晓宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202011276015.2A priority Critical patent/CN112306982B/en
Publication of CN112306982A publication Critical patent/CN112306982A/en
Application granted granted Critical
Publication of CN112306982B publication Critical patent/CN112306982B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/1805Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815Journaling file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452Performance evaluation by statistical analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides an abnormal user detection method, an abnormal user detection apparatus, a computing device, and a storage medium. The abnormal user detection method comprises the following steps: acquiring log data of a historical time period, wherein each log in the log data comprises a user identifier, an operation behavior of the user on a target device, and a log time; determining, based on the log data, a first log sequence of each user identifier and a second log sequence of each user identifier within each time interval; determining, based on the first log sequence and the second log sequence, first feature data of each user identifier and second feature data of each user identifier in each time interval; clustering the first feature data to obtain a first clustering result; clustering the second feature data to obtain a second clustering result; determining abnormal users according to the first clustering result; and determining abnormal users according to the second clustering result.

Description

Abnormal user detection method and device, computing equipment and storage medium
Technical Field
The present application relates to the field of information security technologies, and in particular, to a method and an apparatus for detecting an abnormal user, a computing device, and a storage medium.
Background
In some application scenarios, a user may operate a target device (e.g., a security management platform, etc.). The security management platform can be a video monitoring platform and the like. The operation behavior of the abnormal user has information safety hidden danger. Therefore, it is necessary to detect an abnormal user.
In current detection schemes for abnormal users, the operation behaviors on the target device are usually analyzed manually based on its logs to determine which operation behaviors are abnormal, and the users matching those abnormal operation behaviors are then queried and analyzed manually to identify the abnormal users. Such schemes are inefficient.
In view of this, how to improve the detection efficiency of the abnormal user is a technical problem to be solved.
Disclosure of Invention
The application provides an abnormal user detection method, an abnormal user detection apparatus, a computing device, and a storage medium, which can improve the efficiency of abnormal user detection.
According to an aspect of the present application, there is provided an abnormal user detection method, including:
acquiring log data of a historical time period, wherein each log in the log data comprises a user identifier, an operation behavior of a user on target equipment and log time;
determining a first log sequence of each user identifier and a second log sequence of each user identifier in each time interval based on the log data, wherein the time intervals are the result of dividing the historical time period according to unit time length, the first log sequence of each user identifier is the result of sequencing the logs of the user identifier according to log time, and the second log sequence of each user identifier in each time interval is the result of sequencing the logs of the user identifier in the time interval according to log time;
determining first characteristic data of each user identifier and second characteristic data of each user identifier in each time interval on the basis of the first log sequence and the second log sequence, wherein the first characteristic data is used for characterizing the sequence of the operation behaviors in the first log sequence of the user identifier, and the second characteristic data is used for characterizing the sequence of the operation behaviors in the second log sequence of each user identifier in each time interval;
clustering the first characteristic data to obtain a first clustering result, wherein the first clustering result is used for representing behavior differences among different users;
clustering the second characteristic data to obtain a second clustering result, wherein the second clustering result is used for representing behavior differences of the user in different time intervals;
determining corresponding abnormal users according to the first clustering result;
and determining corresponding abnormal users according to the second clustering result.
In some embodiments, said determining, based on said log data, a first log sequence for each user identification and a second log sequence for each user identification within each time interval comprises:
dividing the log data according to the user identification, and determining the log of each user identification;
sequencing the logs of each user identifier according to the time sequence to obtain a first log sequence of each user identifier;
and dividing the first log sequence of each user identifier according to the time intervals, and determining a second log sequence of each user identifier in each time interval.
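The three steps above (group by user identifier, sort by log time, split by time interval) can be sketched in Python as follows. The tuple layout of a log record and the one-hour interval length are illustrative assumptions, not specified by the patent.

```python
from collections import defaultdict

def build_log_sequences(logs, interval_seconds=3600):
    """Group logs by user identifier, sort each group by log time (first
    log sequences), then split each group into per-interval sub-sequences
    (second log sequences). Each log is a (user_id, behavior, timestamp)
    tuple; field layout and interval length are illustrative."""
    first_sequences = defaultdict(list)   # user_id -> ordered log list
    for user_id, behavior, ts in logs:
        first_sequences[user_id].append((behavior, ts))

    second_sequences = {}                 # (user_id, interval_idx) -> logs
    for user_id, seq in first_sequences.items():
        seq.sort(key=lambda entry: entry[1])   # sort by log time
        for behavior, ts in seq:
            key = (user_id, int(ts // interval_seconds))
            second_sequences.setdefault(key, []).append((behavior, ts))
    return dict(first_sequences), second_sequences
```

The second log sequences inherit the time ordering of the first, so a single sort per user suffices.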
In some embodiments, the determining, based on the first log sequence and the second log sequence, first feature data for each user identification and second feature data for each user identification within each time interval comprises:
representing the sequence of operation behaviors in each second log sequence of each user identifier as a matrix, using the vectors of the operation behaviors of the target device;
determining a matrix of a first log sequence of each user identifier according to the matrix of each second log sequence of each user identifier, wherein the matrix of the first log sequence is a vector representation of a sequence of operation behaviors in the first log sequence;
performing feature extraction on the matrix of the first log sequence of each user identifier to obtain first feature data of each user identifier;
and performing feature extraction on the matrix of each second log sequence of each user identifier to obtain second feature data of each user identifier in each time interval.
In some embodiments, prior to said determining first characteristic data for each user identification and second characteristic data for each user identification within each time interval based on said first log sequence and said second log sequence, said method further comprises:
carrying out one-hot coding on various operation behaviors of the target equipment to obtain a code of each operation behavior;
and processing the code of each operation behavior by using a word vector model to obtain a vector of each operation behavior, wherein the word vector model is obtained by training according to the operation behavior in the log data.
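As a sketch of the first of these two steps, the following assigns each distinct operation behavior a one-hot code. In the full scheme these codes would then be processed by a word-vector model (for example a skip-gram model in the style of word2vec) trained on the log data; only the one-hot step is shown, and the behavior names are illustrative.

```python
def one_hot_encode(behaviors):
    """One-hot coding of the distinct operation behaviors seen in the
    log data: each behavior gets a vector with a single 1 at its own
    index. Behavior names here are illustrative examples."""
    vocab = sorted(set(behaviors))        # stable ordering of behaviors
    codes = {}
    for i, op in enumerate(vocab):
        vec = [0] * len(vocab)
        vec[i] = 1
        codes[op] = vec
    return codes
```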
In some embodiments, the performing feature extraction on the matrix of the first log sequence of each user identifier to obtain first feature data of each user identifier includes: performing feature extraction on the matrix of each first log sequence based on a first long short-term memory (LSTM) network self-encoding model to obtain corresponding first feature data, wherein the first LSTM network self-encoding model is used for extracting features of the sequence formed by the operation behaviors in each first log sequence;
the performing feature extraction on the matrix of each second log sequence of each user identifier to obtain second feature data of each user identifier in each time interval includes: performing feature extraction on the matrix of each second log sequence based on a second long short-term memory (LSTM) network self-encoding model to obtain corresponding second feature data, wherein the second LSTM network self-encoding model is used for extracting features of the sequence formed by the operation behaviors of a user identifier within a time interval.
In some embodiments, the first LSTM network self-encoding model comprises a first encoding model and a first decoding model; the training process of the first LSTM network self-encoding model comprises the following steps:
sequentially inputting the matrix of each first log sequence into the first encoding model to obtain corresponding first log features;
inputting the first log features into the first decoding model to obtain a first decoding result;
and training the first encoding model and the first decoding model according to the difference between the first decoding result and the first log sequence, to obtain a trained first LSTM network self-encoding model.
The second LSTM network self-encoding model comprises a second encoding model and a second decoding model; the training process of the second LSTM network self-encoding model comprises the following steps:
sequentially inputting the matrix of each second log sequence into the second encoding model to obtain corresponding second log features;
inputting the second log features into the second decoding model to obtain a second decoding result;
and training the second encoding model and the second decoding model according to the difference between the second decoding result and the second log sequence, to obtain a trained second LSTM network self-encoding model.
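As an illustration of the encoder-decoder training loop described above, the following is a minimal PyTorch sketch of an LSTM autoencoder over log-sequence matrices. The layer sizes and the use of mean-squared reconstruction error as the "difference between the decoding result and the log sequence" are assumptions made for the sketch; the patent does not specify them.

```python
import torch
import torch.nn as nn

class LSTMSeqAutoencoder(nn.Module):
    """Minimal LSTM autoencoder: the encoder compresses a log sequence
    (a matrix of behavior vectors) into a fixed-size feature, and the
    decoder reconstructs the sequence from that feature. Sizes are
    illustrative."""
    def __init__(self, op_dim, feat_dim=16):
        super().__init__()
        self.encoder = nn.LSTM(op_dim, feat_dim, batch_first=True)
        self.decoder = nn.LSTM(feat_dim, op_dim, batch_first=True)

    def forward(self, x):                  # x: (batch, seq_len, op_dim)
        _, (h, _) = self.encoder(x)
        feat = h[-1]                       # (batch, feat_dim) log feature
        repeated = feat.unsqueeze(1).repeat(1, x.size(1), 1)
        recon, _ = self.decoder(repeated)  # decoding result
        return recon, feat

def train_step(model, batch, optimizer):
    """One training step: minimize the reconstruction error between the
    decoding result and the original log-sequence matrix."""
    recon, _ = model(batch)
    loss = nn.functional.mse_loss(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, `feat` is the first (or second) feature data extracted from a log-sequence matrix, and the decoder is discarded.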
In some embodiments, the determining, according to the first clustering result, a corresponding abnormal user includes:
determining the role type of each class in the first clustering result according to the relationship between user identifiers and role types, wherein the role type of a class is the role type corresponding to the largest number of user identifiers in that class;
for any first category in the first clustering results, based on the role type of the first category, determining abnormal users by using at least one of the following modes:
when the role type of the user identifier corresponding to part of the first characteristic data of the first class is different from the role type of the first class, and the proportion of the part of the first characteristic data in the first class is smaller than a first proportion threshold value, and the quantity of the part of the first characteristic data is smaller than a first quantity threshold value, determining that the user corresponding to the part of the first characteristic data is an abnormal user;
when other classes which are the same as the role type of the first class do not exist in the first clustering result, the ratio of the quantity of the first characteristic data of the first class in the total number of the registered user identifications corresponding to the role type of the first class is smaller than a second ratio threshold, and the quantity of the first characteristic data of the first class is smaller than a second quantity threshold, determining that the user corresponding to the first characteristic data of the first class is an abnormal user;
and when the first clustering result has a second class which has the same role type as the first class, and the ratio of the quantity of the first characteristic data of the first class in the total quantity of the first characteristic data of the first class and the second class is less than a third ratio threshold, and the quantity of the first characteristic data of the first class is less than a third quantity threshold, determining that the user corresponding to the first characteristic data of the first class is an abnormal user.
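The first two rules above reduce to threshold checks on class composition and class size. The following sketch shows them in Python; all threshold values and function names are illustrative assumptions, not values from the patent.

```python
def flag_minority_members(class_roles, class_role,
                          ratio_threshold=0.1, count_threshold=3):
    """First rule: members whose role type differs from the class's
    dominant role type are flagged as abnormal when they form both a
    small fraction and a small absolute number of the class."""
    minority = [r for r in class_roles if r != class_role]
    if not minority:
        return False
    ratio = len(minority) / len(class_roles)
    return ratio < ratio_threshold and len(minority) < count_threshold

def flag_small_cluster(cluster_size, registered_users_of_role,
                       ratio_threshold=0.1, count_threshold=3):
    """Second rule: a class whose role type appears in no other class is
    flagged when its share of all registered users of that role type is
    below a ratio threshold and its size is below a count threshold."""
    ratio = cluster_size / registered_users_of_role
    return ratio < ratio_threshold and cluster_size < count_threshold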
In some embodiments, the determining the corresponding abnormal user according to the second clustering result includes:
acquiring a plurality of set time period category labels, wherein the time period category labels correspond to a plurality of time periods obtained by dividing the time of a day;
determining a time period category label of each class in the second clustering result based on the plurality of time period category labels, wherein the time period category label of a class is the label corresponding to the largest number of second feature data in that class;
according to the time period category label of each category in the second clustering result, determining the abnormal user by using at least one of the following modes:
in the second clustering result, when the operation behavior frequency corresponding to a class with a given time period category label is abnormal, determining that the users corresponding to that class are abnormal users;
and when two classes with the same time period category label exist in the second clustering result, the share of the operation behavior count of the less active of the two classes in the two classes' total operation behavior count is smaller than a fourth ratio threshold, and the quantity of second feature data in the two classes reaches a fourth quantity threshold, determining that the users corresponding to the less active class are abnormal users.
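The label-assignment step above is a majority vote: a class receives the time period category label carried by the largest number of its second feature data. A minimal sketch, with illustrative label names:

```python
from collections import Counter

def period_label_of_class(member_labels):
    """Assign a class the time period category label carried by the
    largest number of its members (majority vote)."""
    return Counter(member_labels).most_common(1)[0][0]
```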
In some embodiments, the above method further comprises:
generating alarm information for abnormal users;
and locking an account of the abnormal user.
In some embodiments, the generating of the alarm information for the abnormal user includes:
generating first alarm information for the abnormal users determined according to the first clustering result, wherein the first alarm information comprises: user identification, role type and statistical result of the operation behavior of the user in the historical time period;
and generating second alarm information for the abnormal users determined according to the second clustering result, wherein the second alarm information comprises: the user identification, the role type, the time period category label and the statistical result of the operation behavior of the user in the time period corresponding to the time period category label.
According to an aspect of the present application, there is provided an abnormal user detecting apparatus including:
the data processing unit is used for acquiring log data of a historical time period, wherein each log in the log data comprises a user identifier, an operation behavior of a user on target equipment and log time; determining a first log sequence of each user identifier and a second log sequence of each user identifier in each time interval based on the log data, wherein the time intervals are the result of dividing the historical time period according to unit time length, the first log sequence of each user identifier is the result of sequencing the logs of the user identifier according to log time, and the second log sequence of each user identifier in each time interval is the result of sequencing the logs of the user identifier in the time interval according to log time;
a feature extraction unit, configured to determine, based on the first log sequence and the second log sequence, first feature data of each user identifier and second feature data of each user identifier in each time interval, where the first feature data is used to characterize a feature of a sequence of operation behaviors in the first log sequence of the user identifier, and the second feature data is used to characterize a feature of a sequence of operation behaviors in the second log sequence of each user identifier in each time interval;
the clustering unit is used for clustering the first characteristic data to obtain a first clustering result, and the first clustering result is used for representing behavior differences among different users; clustering the second characteristic data to obtain a second clustering result, wherein the second clustering result is used for representing behavior differences of the user in different time intervals;
the abnormal analysis unit is used for determining corresponding abnormal users according to the first clustering result; and determining corresponding abnormal users according to the second clustering result.
According to an aspect of the present application, there is provided a computing device comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing an abnormal user detection method according to the present application.
According to an aspect of the present application, there is provided a storage medium storing a program, the program comprising instructions that, when executed by a computing device, cause the computing device to perform an abnormal user detection method according to the present application.
In summary, the abnormal user detection scheme of the embodiments of the application avoids the need to first determine abnormal operation behaviors and then pick abnormal users out of the many users matching those behaviors. Instead, the logs are automatically divided by user identifier, feature data are extracted from each user's log sequences, and the feature data are clustered, so that abnormal users can be determined accurately; abnormal users can thus be detected automatically and discovered more efficiently. In particular, when extracting the feature data, the scheme fully considers both the behavior differences between different users and the differences in the same user's operation behaviors across time intervals, so abnormal users can be accurately determined from a horizontal angle (behavior differences between different users) and a vertical angle (differences in the same user's operation behaviors across time intervals), which further improves the accuracy of anomaly detection on users and the safety of the security devices.
Drawings
FIG. 1 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 3 illustrates a flow diagram of an abnormal user detection method 300 according to some embodiments of the present application;
FIG. 4 illustrates a flow diagram of a method 400 of determining a log sequence according to some embodiments of the present application;
FIG. 5 illustrates a flow diagram of a method 500 of extracting feature data according to some embodiments of the present application;
FIG. 6 illustrates a flow diagram of a method 600 of determining a vector representation of each operational behavior according to some embodiments of the present application;
FIG. 7 illustrates a flow diagram of a method 700 of training a first long short-term memory network self-encoding model in accordance with some embodiments of the present application;
FIG. 8 illustrates a schematic diagram of training a first long short-term memory network self-encoding model according to some embodiments of the present application;
FIG. 9 illustrates a flow diagram of a method 900 of training a second long short-term memory network self-encoding model according to some embodiments of the present application;
FIG. 10 illustrates a flow diagram of a method 1000 of determining anomalous users from a first clustering result in accordance with some embodiments of the present application;
FIG. 11 illustrates a flow diagram of a method 1100 of determining anomalous users from results of a second clustering in accordance with some embodiments of the present application;
FIG. 12 illustrates a flow diagram of an abnormal user detection method 1200 according to some embodiments of the present application;
FIG. 13 illustrates a flow diagram of a method 1300 of generating alert information for an anomalous user in accordance with some embodiments of the present application;
FIG. 14 illustrates a schematic diagram of an abnormal user detection apparatus 1400 according to some embodiments of the present application;
FIG. 15 shows a schematic diagram of an anomalous user detection device 1500 in accordance with some embodiments of the present application;
FIG. 16 illustrates a schematic diagram of a computing device according to some embodiments of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below by referring to the accompanying drawings and examples.
FIG. 1 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application.
As shown in FIG. 1, an application scenario may include a plurality of security devices 110, a target device 120, and a database 130. Here, the security device 110 may be, for example, a network camera (IPC), a digital video recorder (DVR), a Network Video Recorder (NVR), or the like. The target device 120 is a management platform for the security devices 110 and may record log data related to user operation behaviors. The log data may be stored, for example, in the database 130. The user operation behaviors are, for example, logging in to and out of the target device 120, video search, image search based on pictures, and the like. In some embodiments, the target device 120 may perform an abnormal user detection method according to the present application.
FIG. 2 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application.
As shown in FIG. 2, an application scenario may include a plurality of security devices 110, a target device 120, a database 130, and a computing device 140. The security device 110 may be, for example, a network camera (IPC), a digital video recorder (DVR), a Network Video Recorder (NVR), or the like. The target device 120 is a management platform for the security devices 110 and may record log data related to user operation behaviors. The log data may be stored, for example, in the database 130. In some embodiments, the computing device 140 may perform an abnormal user detection method according to the present application.
FIG. 3 illustrates a flow diagram of an abnormal user detection method 300 according to some embodiments of the present application. The method 300 may be performed, for example, in the target device 120 of fig. 1 or the computing device 140 of fig. 2.
As shown in FIG. 3, in step S301, log data of a historical time period is acquired. Each log in the log data comprises a user identifier, an operation behavior of the user on the target device, and a log time. The historical time period is, for example, the past week or month. The log data may be generated by the target device and stored, for example, in the database 130. The operation behavior of the user on the target device may include, for example, logging in to and out of the target device, image search based on a picture, and the like.
In step S302, a first log sequence of each user identifier and a second log sequence of each user identifier within each time interval are determined based on the log data. The time interval is a result of dividing the historical time period according to the unit time length. The unit time period is, for example, one hour, two hours, or one day or the like. The first log sequence of each user identifier is a result of sorting the logs of the user identifier according to the log time. The second log sequence of each user identifier in each time interval is a result of sorting the logs of the user identifier in the time interval according to the log time.
In step S303, based on the first log sequence and the second log sequence, first feature data of each user identifier and second feature data of each user identifier in each time interval are determined.
Wherein the first characteristic data of each user identification is used for characterizing the sequence of operation behaviors in the first log sequence of the user identification. In other words, the first characteristic data of one user identifier may characterize both the operation behavior of the user in the historical period of time and the time-series characteristic of the operation behavior of the user in the historical period of time.
The second characteristic data of each user identification in each time interval is used for characterizing the sequence of the operation behaviors in the second log sequence of each user identification in each time interval. In other words, for the same user, step S303 may represent the second log sequence of different time intervals as different second feature data. The second characteristic data of a time interval of a user identifier can represent the operation behavior of the user in the time interval and the time sequence characteristic of the operation behavior of the user in the time interval.
In step S304, the first feature data is clustered to obtain a first clustering result. A clustering method such as K-means may be used for the clustering operation. Since the first feature data can represent both the operation behaviors of a user in the historical time period and the time-series features of those behaviors, the clustering in step S304 fully considers the characteristics of the user's operation behaviors and their time-series features. On this basis, the first clustering result generated in step S304 can fully embody the characteristics of the operation behaviors and their time-series features; in other words, it can fully reflect the behavior differences between different users.
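The clustering operation itself is standard K-means over the feature data. A production system would more likely use a library implementation (e.g. scikit-learn's `KMeans`); the minimal NumPy version below just illustrates the loop, with all parameter choices being assumptions.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain K-means: alternate between assigning each feature vector to
    its nearest center and recomputing centers, until convergence."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```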
It is further noted that users can be divided into different role types. For example, the role types can be divided into administrators, high-level users, middle-level users and normal users according to the authority of operation behaviors. The operation behaviors of users with the same role type under normal conditions have similarity, and the operation behaviors of users with different role types have difference. The first clustering result generated in step S304 may embody the difference between the operation behaviors of different users.
In step S305, the second feature data is clustered to obtain a second clustering result, again using a clustering method such as K-means. The second feature data of one time interval of a user identifier can represent the operation behaviors of the user in that time interval and their time-series features. Therefore, the clustering in step S305 fully considers the characteristics of the users' operation behaviors in different time intervals and the corresponding time-series features. On this basis, the second clustering result generated in step S305 can fully reflect the characteristics of the operation behaviors in each time interval and the behavior differences of the same user across time intervals.
In step S306, according to the first clustering result, a corresponding abnormal user is determined. Here, since the first clustering result can sufficiently reflect the behavior difference between different users, step S306 can accurately locate the abnormal user by using the behavior difference between different users. For example, administrators and general users behave differently under normal circumstances. If the behavior of the general user is the same as that of the administrator, the general user has suspicion of abnormal operation, and step S306 may determine whether the general user is an abnormal user.
In step S307, according to the second clustering result, a corresponding abnormal user is determined. Since the second clustering result can fully reflect the difference of the user operation behaviors in different time intervals, step S307 can accurately locate the abnormal user with abnormal behavior in the specific time period by using the difference of the user operation behaviors in different time intervals.
In summary, the abnormal user detection scheme of the embodiment of the present application avoids the cumbersome process of first determining abnormal operation behaviors and then selecting abnormal users from the multiple users associated with those behaviors. Instead, the logs are automatically divided according to the user identifier, feature data is extracted from each user's log sequence, and clustering is performed on the feature data, so that abnormal users can be determined accurately and detected automatically, improving the efficiency of discovering abnormal users. In particular, when extracting the feature data, the scheme fully considers both the behavior differences between different users and the differences in one user's operation behavior across different time intervals, so that abnormal users can be accurately determined from a transverse angle (behavior differences between different users) and a longitudinal angle (differences in the same user's operation behavior across time intervals), thereby improving the accuracy of abnormality detection on users and the security of the security device.
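As an illustration of the clustering described above, the following minimal pure-Python K-means sketch (the function and the toy feature vectors are hypothetical, not taken from the embodiment) shows how a user whose feature data differs sharply from the others ends up isolated in a class of its own:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means, standing in for the clustering of steps S304/S305."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each feature vector to the nearest center (squared distance)
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # recompute each center as the mean of its cluster (keep old center if empty)
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return clusters

# Toy first feature data: three users with similar behavior, one outlier.
feats = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.0)]
clusters = kmeans(feats, 2)
```

The outlier `(5.0, 5.0)` is separated into its own class, which is the kind of behavior difference steps S306/S307 then inspect.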
In some embodiments, step S302 may be implemented as method 400.
As shown in fig. 4, in step S401, the log data is divided according to the user identifier, and the log of each user identifier is determined, that is, the log set of each user in the history time period is determined.
In step S402, the logs of each user identifier are sorted in time order, and a first log sequence of each user identifier is obtained.
In step S403, the first log sequence of each user identifier is divided into time intervals, and a second log sequence of each user identifier in each time interval is determined. For example, the historical time period is one month, and the time interval is a time period in hours within one month. Step S403 may determine a log sequence for each user within each hour.
In summary, the method 400 can obtain both the overall ordered log sequence of each user identifier over the historical time period and the ordered log sequence within each single time interval, so that the embodiment of the present application can extract feature data from the overall sequence as well as from the sequence of each single time interval.
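The splitting performed by steps S401-S403 can be sketched as follows; the log records, field layout, and helper name are illustrative assumptions, since the embodiment only requires each log to carry a user identifier, an operation behavior, and a log time:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (user identifier, operation behavior, log time).
logs = [
    ("u1", "login",  "2020-11-01 08:05"),
    ("u1", "export", "2020-11-01 08:30"),
    ("u2", "login",  "2020-11-01 09:10"),
    ("u1", "logout", "2020-11-01 09:40"),
]

def split_logs(logs):
    """Steps S401-S403: group logs per user identifier, sort them by log time
    (first log sequence), then bucket them by hour (second log sequences)."""
    per_user = defaultdict(list)
    for uid, action, ts in logs:                        # step S401: divide by user
        per_user[uid].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), action))
    first_seq, second_seq = {}, {}
    for uid, entries in per_user.items():
        entries.sort()                                  # step S402: sort by time
        first_seq[uid] = [action for _, action in entries]
        buckets = defaultdict(list)                     # step S403: hour intervals
        for t, action in entries:
            buckets[t.replace(minute=0)].append(action)
        second_seq[uid] = dict(buckets)
    return first_seq, second_seq

first_seq, second_seq = split_logs(logs)
```

Here `first_seq["u1"]` is the whole ordered behavior sequence of user u1, while `second_seq["u1"]` holds one shorter sequence per hour interval.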
In some embodiments, step S303 may be implemented as method 500.
In step S501, the sequence of operation behaviors in each second log sequence of each user identifier is represented as a matrix, using the vectors of the operation behaviors performed on the target device.
In step S502, a matrix of the first log sequence of each user identity is determined from the matrix of each second log sequence of each user identity. The matrix of the first log sequence is a vector representation of a sequence of operational behaviors in the first log sequence.
In step S503, feature extraction is performed on the matrix of the first log sequence of each user identifier, so as to obtain first feature data of each user identifier.
In step S504, feature extraction is performed on the matrix of each second log sequence of each user identifier, so as to obtain second feature data of each user identifier in each time interval.
In summary, the method 500 according to the embodiment of the present application can represent the sequence of the operation behaviors in the log sequence as a matrix, and extract the feature data of the sequence of the operation behaviors based on the matrix.
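A minimal sketch of this matrix representation, with hypothetical 2-dimensional behavior vectors standing in for the trained word vectors:

```python
def sequence_matrix(actions, action_vecs):
    """Steps S501/S502: represent a log sequence as a matrix whose i-th row
    is the vector of the i-th operation behavior in the sequence."""
    return [action_vecs[a] for a in actions]

# Hypothetical vectors for two operation behaviors (length1 = 2 for brevity).
action_vecs = {"login": [0.1, 0.9], "export": [0.7, 0.3]}
M = sequence_matrix(["login", "export", "login"], action_vecs)
```

The resulting matrix has one row per logged operation behavior, so its row count varies with the length of the log sequence.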
In some embodiments, embodiments of the present application may determine a vector representation of each operation behavior by method 600. Here, the method 600 may be performed before the method 300 is performed, or may be performed before the step S303 is performed, which is not limited in this application.
As shown in fig. 6, in step S601, the various operation behaviors of the target device are subjected to one-hot encoding, and an encoding of each operation behavior is obtained. One-hot encoding, which may also be called one-bit-effective encoding, replaces the smallest atomic item in the data with a vector of length N consisting of 0s and 1s. In the encoding of one operation behavior, exactly one bit is 1 and the remaining bits are 0.
For example, the total number of types of operational behaviors of the target device is N. N is a positive integer. The encoding of all operational behaviors of the target device may be represented as a matrix. An example matrix is as follows:
Action_ALL_OneHot =
[ 1 0 ... 0 ]
[ 0 1 ... 0 ]
[ ... ]
[ 0 0 ... 1 ]
Action_ALL_OneHot represents the matrix formed by the encodings of all operation behaviors of the target device. The matrix is of size N x N; each row is a vector of length N containing exactly one value of 1, and each row corresponds to the encoding of one operation behavior. The matrix has N rows in total, corresponding to the N operation behaviors.
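A possible construction of such a matrix (shown here for N = 4; the helper name is illustrative):

```python
def one_hot_matrix(n):
    """Build Action_ALL_OneHot: an n x n matrix whose i-th row is the
    one-hot encoding of the i-th operation behavior."""
    return [[1 if j == i else 0 for j in range(n)] for i in range(n)]

m = one_hot_matrix(4)
```

Each row sums to 1, matching the one-bit-effective property described above.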
In step S602, the encoding of each operation behavior is processed by using a word vector model, and a vector of each operation behavior is obtained. Here, the word vector (word2vec) model is trained on the operation behaviors in the log data. Specifically, the embodiment of the present application may take the operation behaviors in the log data as samples and train the word vector model using the Continuous Bag-of-Words (CBOW) model, so as to obtain a trained word vector model.
In summary, by processing the one-hot encodings of the operation behaviors with a word vector model, the method 600 can effectively overcome the disadvantages of one-hot encoding, namely losing the internal order information of the data and easily causing a dimension disaster. The method 600 can represent each operation behavior as a vector of fixed length length1, for example, length1 = 64. In this way, the method 600 reduces the vector length of each operation behavior, thereby improving the computational efficiency of subsequent data processing (e.g., the operation of extracting feature data) and, in turn, the efficiency of discovering abnormal users.
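The length reduction can be illustrated as follows. The embedding matrix W below is filled with random values merely to stand in for the weights a trained CBOW word2vec model would provide, and the small sizes (N = 4, length1 = 8) are chosen for readability:

```python
import random

def embed(onehot_row, W):
    """Map a one-hot encoded operation behavior to its dense vector.
    With one-hot input, multiplying by the embedding matrix W reduces to
    selecting row i of W, shrinking the representation from length N to length1."""
    i = onehot_row.index(1)
    return W[i]

N, length1 = 4, 8          # the embodiment uses, e.g., length1 = 64
random.seed(0)
# W stands in for the weight matrix of a trained word vector model.
W = [[random.random() for _ in range(length1)] for _ in range(N)]
vec = embed([0, 0, 1, 0], W)
```

Every operation behavior thus gets a fixed-length dense vector regardless of N, which is what makes the later sequence matrices uniform in width.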
In some embodiments, step S503 may perform feature extraction on the matrix of each first log sequence based on a first long-short time memory network self-coding (LSTM autoencoder, LSTM-AE) model, to obtain corresponding first feature data. The first long-short time memory network self-coding model is used for extracting the features of the sequence formed by the operation behaviors in the first log sequence. The embodiment of the present application can train the first long-short time memory network self-coding model through the method 700. The method 700 may be performed, for example, before step S303. Fig. 8 is a schematic diagram of the first long-short time memory network self-coding model, which comprises: a first coding model and a first decoding model.
As shown in fig. 7, in step S701, the matrix of each first log sequence is sequentially input into the first coding model, and the corresponding first log feature is obtained. Here, the matrices of the first log sequences of all users can together be represented as a matrix, for example
Data = [A1, A2, ..., An]
wherein each row Ai corresponds to one sample: the matrix of the first log sequence of one user, which comprises a plurality of operation behaviors. For example,
Ai = [a1; a2; ...; a_frequency]
whose scale is frequency x length1, where frequency, the number of rows, represents the number of operation behaviors in the sequence, and length1 represents the length of the vector corresponding to one operation behavior.
Step S701 may take the output of the last neuron 801 of the first coding model in fig. 8 as the first log feature.
In step S702, the first log feature is input into the first decoding model to obtain a first decoding result. The first decoding result is the result of restoring the matrix Data from the first log feature. As shown in fig. 8, the data restoration results of A1 to An are, in turn, A1' to An'.
In step S703, the first coding model and the first decoding model are trained according to the difference between the first decoding result and the first log sequence, so as to obtain a trained first long-short time memory network self-coding model. In some embodiments, step S703 may define a loss function and a gradient descent optimization function (e.g., selecting the mean square error and the stochastic gradient descent method). On this basis, the mean square error between the first decoding result and the first log sequence is continuously reduced by stochastic gradient descent until the loss function converges, yielding the trained first long-short time memory network self-coding model.
In summary, the method 700 can train the first long-short term memory network self-coding model by using the operation behavior sequences of the multiple users, so that the embodiment of the present application extracts the first feature data by using the first long-short term memory network self-coding model.
In addition, in step S503, by performing feature extraction on the matrices of the first log sequences of different users with the LSTM-AE, matrices of indefinite length (that is, the number of operation behaviors may differ between matrices) can be processed into feature data of a fixed length, which facilitates accurate clustering of the feature data. The LSTM-AE used in step S503 also employs a masking mechanism: data of insufficient length is automatically padded with 0s, but the padded 0 data is automatically skipped during operation, so that no information bias is introduced by the padding, and the integrity and consistency of the data are preserved to the greatest extent. Moreover, based on the LSTM-AE model, step S503 can compress the data dimensions and extract data features in the encoding process, expand the features back into restored data in the decoding process, and train the model parameters by comparing the difference between the original data and the restored data, so that the trained model can extract the first log features from unlabeled data. Because step S503 extracts features with the LSTM-AE, the embodiment of the present application does not require sample labels. Therefore, the embodiment of the present application is widely applicable to unsupervised abnormal user detection and is highly tolerant of the sample data.
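The padding-plus-masking idea can be sketched as follows; the helper name and toy matrices are illustrative, and in a framework such as Keras this role is played by a Masking layer with mask value 0:

```python
def pad_with_mask(mats, length1):
    """Zero-pad variable-length sequence matrices to a common row count and
    record a mask, mirroring the masking mechanism described for LSTM-AE:
    mask value 0 marks padded rows that the model should skip."""
    max_len = max(len(m) for m in mats)
    padded, masks = [], []
    for m in mats:
        pad_rows = max_len - len(m)
        padded.append(m + [[0.0] * length1] * pad_rows)   # fill with 0 data
        masks.append([1] * len(m) + [0] * pad_rows)       # 0 = skip this row
    return padded, masks

# Two log-sequence matrices of different lengths (length1 = 2 for brevity).
mats = [[[0.1, 0.9]],
        [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]]
padded, masks = pad_with_mask(mats, 2)
```

After padding, every sample has the same shape, while the mask tells the model which rows are real operation behaviors and which are filler.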
In some embodiments, step S504 may perform feature extraction on the matrix of each second log sequence based on the trained second long-short time memory network self-coding model, so as to obtain corresponding second feature data. The second long-short time memory network self-coding model is used for extracting the features of the sequence formed by the operation behaviors of a user identifier in the second log sequence of each time interval. The embodiment of the present application may train the second long-short time memory network self-coding model through the method 900. The second long-short time memory network self-coding model comprises: a second coding model and a second decoding model.
As shown in fig. 9, in step S901, the matrix of each second log sequence is sequentially input into the second coding model, so as to obtain corresponding second log features.
In step S902, the second log feature is input into the second decoding model to obtain a second decoding result.
In step S903, a second coding model and a second decoding model are trained according to a difference between a second decoding result and a second log sequence, so as to obtain a trained second long-short time memory network self-coding model. More specific implementations of method 900 are similar to method 700 and will not be described in detail herein.
It is further noted that, in some embodiments, when performing K-means clustering on the first feature data, step S304 may use the number of role types plus 1 as the cluster count of the K-means algorithm. As a result, two classes with the same role type may appear in the first clustering result of step S304. For example, suppose the role types include A, B and C, and the first clustering result includes 4 classes whose role types are A, B, C and A, respectively. The role types of the first and fourth classes are the same, and one of these two classes is suspected to be abnormal. Two classes of the same role type can therefore be used to determine abnormal users.
Step S306 may determine an abnormal user according to the first clustering result. In some embodiments, step S306 may be implemented as method 1000.
As shown in fig. 10, in step S1001, the role type of each class in the first clustering result is determined according to the relationship between user identifiers and role types. The role type of a class is the role type whose corresponding user identifiers account for the highest proportion in that class. For example, if one class contains role types A and B, and the number of user identifiers corresponding to role type A is the highest, the role type of that class is A.
In step S1002, for any first category of the first clustering results, an abnormal user is determined based on the role type of the first category.
Step S1002 may determine the abnormal user using at least one of the following ways.
In some embodiments, when the role type of the user identifiers corresponding to a part of the first feature data of the first class differs from the role type of the first class, the proportion of that part of the first feature data in the first class is smaller than a first proportion threshold, and the number of that part of the first feature data is smaller than a first number threshold, step S1002 determines that the users corresponding to that part of the first feature data are abnormal users. The first proportion threshold is, for example, 0.1. The first number threshold is, for example, 3.
In some embodiments, when no other class of the first clustering result has the same role type as the first class, the ratio of the number of first feature data of the first class (i.e., the number of user identifiers corresponding to the first class) to the total number of registered user identifiers corresponding to the role type of the first class is smaller than a second proportion threshold, and the number of first feature data of the first class is smaller than a second number threshold, step S1002 determines that the users corresponding to the first feature data of the first class are abnormal users. The second proportion threshold is, for example, 0.1. The second number threshold is, for example, 3.
In some embodiments, when the first clustering result contains a second class with the same role type as the first class, the proportion of the first feature data of the first class in the total first feature data of the first and second classes is smaller than a third proportion threshold, and the number of first feature data of the first class is smaller than a third number threshold, step S1002 determines that the users corresponding to the first feature data of the first class are abnormal users. The third proportion threshold is, for example, 0.1. The third number threshold is, for example, 3. In short, of two classes with the same role type (i.e., the first class and the second class), step S1002 may regard the class with the smaller amount of first feature data as an abnormal class, and determine that the users corresponding to the first feature data in that class are abnormal users.
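The third rule can be sketched as follows, with hypothetical cluster contents; the example thresholds from the embodiment (0.1 and 3) are kept as defaults:

```python
def abnormal_same_role_class(class_a, class_b, ratio_thr=0.1, count_thr=3):
    """Third rule of step S1002: given two classes with the same role type,
    flag the smaller one as abnormal when its share of the combined first
    feature data is below ratio_thr and its size is below count_thr."""
    small, large = sorted((class_a, class_b), key=len)
    if len(small) / (len(small) + len(large)) < ratio_thr and len(small) < count_thr:
        return small
    return None

# Hypothetical clusters: 30 users behaving like administrators, plus one
# lone user grouped into a second administrator-typed class.
admins = [f"user{i}" for i in range(30)]
odd_one = ["user99"]
flagged = abnormal_same_role_class(admins, odd_one)
```

Here `flagged` is `["user99"]`: the lone user's share (1/31) is below the proportion threshold and its size is below the number threshold, so it is treated as the abnormal class.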
In some embodiments, step S305 may set a plurality of time period category labels when K-means clustering is performed on the second feature data. For example, the plurality of time period category labels correspond to a plurality of time periods divided by the time of day, such as LabelA 1:00-9:00, LabelB 9:00-17:00, and LabelC 17:00-1:00.
For another example, the plurality of time period category labels may correspond to a plurality of time periods divided by the days of the week, such as Label1 (working days) and Label2 (rest days).
The total number of categories for K-means may be set to the number of time period category labels plus 1. Thus, two classes with the same time period category label may appear in the second clustering result. The time period category labels of the two classes are the same, but the operation behaviors they represent differ, so they can be used to determine abnormal users. In some embodiments, step S307 may be implemented as method 1100.
As shown in fig. 11, in step S1101, the plurality of set time period category labels are acquired. The plurality of time period category labels correspond to a plurality of time periods divided by the time of day, for example LabelA 1:00-9:00, LabelB 9:00-17:00, and LabelC 17:00-1:00.
In step S1102, the time period category label of each class in the second clustering result is determined based on the plurality of time period category labels. The time period category label of a class is the label whose corresponding second feature data accounts for the highest proportion in that class.
In step S1103, an abnormal user is determined according to the time period category label of each category in the second clustering result.
Step S1103 may determine the abnormal user by using at least one of the following manners.
In some embodiments, when the operation behavior corresponding to the class of one time period category label in the second clustering result is abnormally frequent, step S1103 may determine that the users corresponding to that class are abnormal users. For example, when the number of user operations in the time period corresponding to LabelA >= (mean number of user operations in the LabelB time period + mean number of user operations in the LabelC time period) / 2, step S1103 determines that the user operations in the LabelA time period are abnormally frequent. Frequent user operations in the LabelA time period can be regarded as a large number of operations performed in the middle of the night, which may indicate a "ghost" user or a stolen account.
In some embodiments, when the second clustering result contains two classes with the same time period category label, the ratio of the operation count of the class with fewer operation behaviors to the total operation count of the two classes is smaller than a fourth proportion threshold, and the number of second feature data in the two classes reaches a fourth number threshold, step S1103 may determine that the users corresponding to the class with fewer operation behaviors are abnormal users. The fourth proportion threshold is, for example, 0.01, and the fourth number threshold is, for example, 100. In short, of two classes with the same time period category label, step S1103 may regard the class with fewer operation behaviors as an abnormal class, and determine the users corresponding to that class as abnormal users.
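The two rules of step S1103 can be sketched as follows; reading the fourth-number-threshold condition as applying to the combined size of the two classes is an interpretive assumption, as are the function names:

```python
def night_ops_abnormal(ops_label_a, ops_label_b, ops_label_c):
    """First rule of step S1103: operations in the LabelA (night) period are
    abnormally frequent when their count reaches the average of the LabelB
    and LabelC period counts."""
    return ops_label_a >= (ops_label_b + ops_label_c) / 2

def small_twin_class_abnormal(ops_small, ops_large, n_small, n_large,
                              ratio_thr=0.01, count_thr=100):
    """Second rule of step S1103: of two classes sharing a time period label,
    the class with fewer operations is abnormal when its share of the total
    operation count is below ratio_thr and the combined amount of second
    feature data reaches count_thr."""
    return (ops_small / (ops_small + ops_large) < ratio_thr
            and n_small + n_large >= count_thr)

# A user with 120 night operations against daytime means of 150 and 50
# satisfies the first rule (120 >= (150 + 50) / 2).
is_night_abnormal = night_ops_abnormal(120, 150, 50)
```

Both helpers only encode the threshold comparisons; in the embodiment the inputs come from the statistics of the second clustering result.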
In summary, the method 1100 may accurately determine the abnormal user according to the features of the operation behaviors characterized by different classes in the second clustering result.
FIG. 12 illustrates a flow diagram of an abnormal user detection method 1200 according to some embodiments of the present application. The method 1200 may be performed, for example, in the target device 120 of fig. 1 or the computing device 140 of fig. 2.
As shown in fig. 12, in step S1201, log data of a history time period is acquired. Each log in the log data comprises a user identification, an operation behavior of the user on the target device and log time.
In step S1202, a first log sequence of each user identifier and a second log sequence of each user identifier within each time interval are determined based on the log data. The time interval is a result of dividing the historical time period according to the unit time length. The unit time period is, for example, one hour, two hours, or one day or the like. The first log sequence of each user identifier is a result of sorting the logs of the user identifier according to the log time. The second log sequence of each user identifier in each time interval is a result of sorting the logs of the user identifier in the time interval according to the log time.
In step S1203, first feature data of each user identifier and second feature data of each user identifier in each time interval are determined based on the first log sequence of each user identifier and the second log sequence of each user identifier in each time interval.
Wherein the first characteristic data of each user identification is used for characterizing the sequence of operation behaviors in the first log sequence of the user identification. In other words, the first characteristic data of one user identifier may characterize both the operation behavior of the user in the historical period of time and the time-series characteristic of the operation behavior of the user in the historical period of time.
The second characteristic data of each user identification in each time interval is used for characterizing the sequence of the operation behaviors in the second log sequence of each user identification in each time interval. In other words, step S1203 may represent the second log sequence of different time intervals as different second feature data for the same user. The second characteristic data of a time interval of a user identifier can represent the operation behavior of the user in the time interval and the time sequence characteristic of the operation behavior of the user in the time interval.
In step S1204, the first feature data is clustered to obtain a first clustering result. Step S1204 may perform clustering operation using a clustering method such as K-means.
Here, since the first feature data may represent both the operation behavior of the user in the historical time period and the time-series feature of the operation behavior of the user in the historical time period, the feature of the operation behavior of the user and the time-series feature of the operation behavior may be sufficiently considered when clustering the first feature data in step S1204. On the basis, the first clustering result generated in step S1204 can sufficiently embody the characteristics of the operation behavior and the time-series characteristics of the operation behavior. In other words, the first clustering result generated in step S1204 may fully embody the behavior differences between different users.
Stated otherwise, users are typically divided into different role types. For example, the role types can be divided into administrators, high-level users, middle-level users and normal users according to the authority of operation behaviors. The operation behaviors of users with the same role type under normal conditions have similarity, and the operation behaviors of users with different role types have difference. The first clustering result generated in step S1204 may embody a difference between operation behaviors of different users. Here, the operation behaviors of the users in the same class in the first clustering result have higher similarity, and the operation behaviors of the users in different classes have more difference.
In step S1205, the second feature data is clustered to obtain a second clustering result. A clustering method such as K-means can be used. The second feature data of one user identifier in one time interval can represent both the operation behavior of the user in that time interval and its time-series feature. Therefore, when clustering the second feature data in step S1205, the features of the operation behaviors of the users in different time intervals and their time-series features can be fully considered. On this basis, the second clustering result generated in step S1205 can fully embody the characteristics of the operation behavior in each time interval and the behavior differences of the same user across different time intervals.
In step S1206, according to the first clustering result, a corresponding abnormal user is determined. Here, since the first clustering result can fully reflect the behavior differences among different users, step S1206 can accurately locate the abnormal user by using the behavior differences among different users.
In step S1207, according to the second clustering result, the corresponding abnormal user is determined. Since the second clustering result can fully reflect the difference of the user operation behaviors in different time intervals, step S1207 can accurately locate the abnormal user with abnormal behavior in the specific time period by using the difference of the user operation behaviors in different time intervals.
In step S1208, warning information for an abnormal user is generated.
In step S1209, account locking is performed for the abnormal user.
In summary, the method 1200 can generate warning information and perform account locking after determining an abnormal user, so as to improve the information security of the target device.
In some embodiments, step S1208 may be implemented as method 1300.
As shown in fig. 13, in step S1301, first warning information is generated for an abnormal user determined according to the first clustering result. The first warning information includes: user identification, role type, and statistical result of the user's behavior in historical time period. Here, the statistical result may include, for example, the category of the operation behavior, the operation frequency corresponding to each category, and the maximum operation frequency of the user in each time interval and the corresponding time.
In step S1302, second warning information is generated for the abnormal user determined according to the second clustering result. The second warning information includes: the user identification, the role type, the time period category label and the statistical result of the operation behavior of the user in the time period corresponding to the time period category label.
In some embodiments, the lock-out time of step S1209 may be set, for example, to 3 days. If the user claims that the alarm is false, the super administrator can unlock the account. If the number of false alarms is large, the thresholds in the method 1000 and the method 1100 can be adjusted according to user input in the embodiment of the present application, so as to ensure normal operation of the system.
Fig. 14 illustrates a schematic diagram of an abnormal user detection apparatus 1400 according to some embodiments of the present application. The apparatus 1400 may be deployed, for example, in the target device 120 of fig. 1 or the computing device 140 of fig. 2.
The abnormal user detection apparatus 1400 includes: a data processing unit 1401, a feature extraction unit 1402, a clustering unit 1403, and an abnormality analysis unit 1404.
The data processing unit 1401 acquires log data of a history period. Each log in the log data comprises a user identification, an operation behavior of the user on the target device and log time. Based on the log data, the data processing unit 1401 determines a first log sequence of each user identification and a second log sequence of each user identification within each time interval. The time interval is a result of dividing the historical time period according to the unit time length. The first log sequence of each user identifier is a result of sorting the logs of the user identifier by log time. The second log sequence of each user identifier in each time interval is a result of sorting the logs of the user identifier in the time interval according to the log time.
The feature extraction unit 1402 determines first feature data of each user identifier and second feature data of each user identifier in each time interval based on the first log sequence and the second log sequence. The first characteristic data of each subscriber identity is used to characterize a sequence of operating actions in a first log sequence of the subscriber identity. The second characteristic data of each user identification in each time interval is used for characterizing the sequence of the operation behaviors in the second log sequence of each user identification in each time interval.
The clustering unit 1403 clusters the first feature data to obtain a first clustering result. The first clustering result is used for representing behavior differences among different users. And a clustering unit 1403, which clusters the second feature data to obtain a second clustering result. The second clustering result is used for representing the behavior difference of the user in different time intervals.
The anomaly analysis unit 1404 determines a corresponding anomalous user according to the first clustering result. The anomaly analysis unit 1404 determines a corresponding anomalous user from the second clustering result. More specific embodiments of the apparatus 1400 are similar to the method 300, and are not described herein again.
In summary, the abnormal user detection apparatus 1400 of the embodiment of the present application avoids the cumbersome process of first determining abnormal operation behaviors and then selecting abnormal users from the multiple users associated with those behaviors. Instead, it automatically divides the logs according to the user identifier, extracts feature data from each user's log sequence, and performs clustering on the feature data, so that abnormal users can be determined accurately and detected automatically, improving the efficiency of discovering abnormal users. Specifically, when extracting the feature data, the abnormal user detection apparatus 1400 fully considers both the behavior differences between different users and the differences in the same user's operation behavior across different time intervals, so that abnormal users can be accurately determined from a transverse angle (behavior differences between different users) and a longitudinal angle (differences in the same user's operation behavior across time intervals), thereby improving the accuracy of abnormality detection on users.
FIG. 15 shows a schematic diagram of an abnormal user detection apparatus 1500 according to some embodiments of the present application. The apparatus 1500 may be deployed, for example, in the target device 120 of fig. 1 or the computing device 140 of fig. 2.
The abnormal user detecting apparatus 1500 includes: a data processing unit 1501, a feature extraction unit 1502, a clustering unit 1503, an anomaly analysis unit 1504, and an alarm processing unit 1505.
The data processing unit 1501 acquires log data of a historical time period. Each log in the log data includes a user identifier, an operation behavior of the user on the target device, and a log time. Based on the log data, the data processing unit 1501 determines a first log sequence of each user identifier and a second log sequence of each user identifier in each time interval. The time intervals are the result of dividing the historical time period by a unit time length. The first log sequence of each user identifier is the result of sorting that user identifier's logs by log time. The second log sequence of each user identifier in each time interval is the result of sorting that user identifier's logs within the time interval by log time.
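The partitioning performed by the data processing unit can be sketched as follows. This is a minimal illustration only: the log record layout (a `(user_id, behavior, timestamp)` tuple) and the function name are assumptions for this sketch, not taken from the patent.

```python
from collections import defaultdict

def build_sequences(logs, interval_seconds):
    """Split raw logs into per-user sequences (first log sequences) and
    per-user, per-time-interval sequences (second log sequences).

    Each log is assumed to be a (user_id, behavior, timestamp) tuple;
    this layout is an illustrative assumption.
    """
    by_user = defaultdict(list)
    for user_id, behavior, ts in logs:
        by_user[user_id].append((behavior, ts))

    first, second = {}, {}
    for user_id, entries in by_user.items():
        entries.sort(key=lambda e: e[1])          # order by log time
        first[user_id] = [b for b, _ in entries]  # first log sequence
        buckets = defaultdict(list)               # second log sequences
        for behavior, ts in entries:
            buckets[ts // interval_seconds].append(behavior)
        second[user_id] = dict(buckets)
    return first, second
```

Because the per-interval buckets are built from the already time-sorted entries, each second log sequence inherits the log-time ordering required by the description.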
The feature extraction unit 1502 determines first feature data of each user identifier and second feature data of each user identifier in each time interval based on the first log sequence and the second log sequence. The first feature data of each user identifier is used to characterize the sequence of operation behaviors in the first log sequence of that user identifier. The second feature data of each user identifier in each time interval is used to characterize the sequence of operation behaviors in the second log sequence of that user identifier in that time interval.
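As one hedged sketch of how a log sequence might be turned into the matrix the feature extractor consumes, the operation behaviors can be one-hot encoded and stacked row by row. The claims additionally pass these codes through a trained word-vector model and an LSTM autoencoder; those steps are omitted here, and the function name is an assumption.

```python
def one_hot_matrix(sequence, vocabulary):
    """Represent a sequence of operation behaviors as a matrix:
    one row per log entry, one column per known behavior type."""
    index = {behavior: i for i, behavior in enumerate(vocabulary)}
    matrix = []
    for behavior in sequence:
        row = [0] * len(vocabulary)
        row[index[behavior]] = 1   # mark the observed behavior
        matrix.append(row)
    return matrix
```

The resulting matrix preserves the temporal order of the sequence, which is what allows a downstream sequence model to capture the time-series characteristics the description refers to.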
The clustering unit 1503 clusters the first feature data to obtain a first clustering result. The first clustering result is used for representing behavior differences among different users. The clustering unit 1503 clusters the second feature data to obtain a second clustering result. The second clustering result is used for representing the behavior difference of the user in different time intervals.
The anomaly analysis unit 1504 determines corresponding abnormal users according to the first clustering result, and likewise determines corresponding abnormal users according to the second clustering result.
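A minimal sketch of the small-cluster rule that this kind of anomaly analysis applies: users falling into clusters that are both a small fraction of the population and small in absolute size are flagged. The threshold names and values are illustrative assumptions; the full rules in the claims additionally condition on role types and time period category labels.

```python
from collections import Counter

def flag_small_clusters(labels, ratio_threshold=0.1, count_threshold=3):
    """Flag members of clusters that are both a small fraction of all
    users and below an absolute size threshold as candidate abnormal users.

    labels: dict mapping user_id -> cluster label.
    """
    sizes = Counter(labels.values())
    total = len(labels)
    return sorted(
        user for user, cluster in labels.items()
        if sizes[cluster] / total < ratio_threshold
        and sizes[cluster] < count_threshold
    )
```

Requiring both a ratio threshold and a quantity threshold, as the claims do, prevents a large population from making a sizeable cluster look proportionally small.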
The alarm processing unit 1505 may generate alarm information for an abnormal user and perform account locking for the abnormal user.
In summary, the apparatus 1500 can generate warning information and perform account locking after determining an abnormal user, so as to improve the information security of the target device.
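The alarm-and-lock step of the alarm processing unit might look like the following sketch; the account store, the alarm message format, and the function name are assumptions for illustration only.

```python
def handle_abnormal_users(abnormal_users, accounts):
    """Generate an alarm message and lock the account of each detected
    abnormal user. `accounts` maps user_id -> mutable state dict."""
    alarms = []
    for user in abnormal_users:
        alarms.append(f"ALERT: abnormal behavior detected for user {user}")
        if user in accounts:
            accounts[user]["locked"] = True  # block further operations
    return alarms
```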
FIG. 16 illustrates a schematic diagram of a computing device according to some embodiments of the present application. As shown in fig. 16, the computing device includes one or more processors (CPUs) 1602, a communications module 1604, a memory 1606, a user interface 1610, and a communication bus 1608 for interconnecting these components.
The processor 1602 can receive and transmit data via the communication module 1604 to enable network communications and/or local communications.
The user interface 1610 includes one or more output devices 1612, including one or more speakers and one or more screens. The user interface 1610 also includes one or more input devices 1614, which may include, for example, buttons, but are not limited thereto.
Memory 1606 may be high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; or non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 1606 stores a set of instructions executable by the processor 1602, including:
an operating system 1616, including programs for handling various basic system services and for performing hardware related tasks;
applications 1618, including various programs for implementing the above-described schemes. Such a program can implement the process flow in the examples described above, and may include, for example, the abnormal user detection method 300.
In addition, each of the embodiments of the present application can be realized by a data processing program executed by a data processing apparatus such as a computer. Obviously, such a data processing program constitutes the present application. In addition, the data processing program is usually stored in a storage medium and is executed either by reading it directly out of the storage medium or by installing or copying it into a storage device (such as a hard disk and/or memory) of the data processing apparatus. Such a storage medium therefore also constitutes the present application. The storage medium may use any type of recording means, such as a paper storage medium (e.g., paper tape), a magnetic storage medium (e.g., a flexible disk, a hard disk, or flash memory), an optical storage medium (e.g., a CD-ROM), or a magneto-optical storage medium (e.g., an MO).
The present application thus also discloses a non-volatile storage medium in which a program is stored. The program comprises instructions which, when executed by a processor, cause a computing device to perform the abnormal user detection method according to the present application.
In addition, the method steps described in this application may be implemented not only by data processing programs but also by hardware, for example, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, embedded microcontrollers, and the like. Therefore, hardware capable of implementing the abnormal user detection method described in this application can also constitute the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of the present application.

Claims (11)

1. An abnormal user detection method, comprising:
acquiring log data of a historical time period, wherein each log in the log data comprises a user identifier, an operation behavior of a user on target equipment and log time;
determining a first log sequence of each user identifier and a second log sequence of each user identifier in each time interval based on the log data, wherein the time intervals are the result of dividing the historical time period according to unit time length, the first log sequence of each user identifier is the result of sequencing the logs of the user identifier according to log time, and the second log sequence of each user identifier in each time interval is the result of sequencing the logs of the user identifier in the time interval according to log time;
determining first characteristic data of each user identifier and second characteristic data of each user identifier in each time interval based on the first log sequence and the second log sequence, wherein the first characteristic data represents the operation behaviors of the user in a historical time interval and the time sequence characteristics of the operation behaviors of the user in the historical time interval, and the second characteristic data represents the operation behaviors of the user in the corresponding time interval and the time sequence characteristics of the operation behaviors of the user in the time interval;
clustering the first characteristic data to obtain a first clustering result, wherein the first clustering result is used for representing behavior differences among different users;
clustering the second characteristic data to obtain a second clustering result, wherein the second clustering result is used for representing behavior differences of the user in different time intervals;
determining corresponding abnormal users according to the first clustering result;
and determining corresponding abnormal users according to the second clustering result.
2. The abnormal user detection method of claim 1, wherein said determining a first log sequence for each user identification and a second log sequence for each user identification within each time interval based on said log data comprises:
dividing the log data according to the user identification, and determining the log of each user identification;
sequencing the logs of each user identifier according to the time sequence to obtain a first log sequence of each user identifier;
and dividing the first log sequence of each user identifier according to the time intervals, and determining a second log sequence of each user identifier in each time interval.
3. The abnormal user detection method of claim 1, wherein said determining first characteristic data for each user identification and second characteristic data for each user identification in each time interval based on the first log sequence and the second log sequence comprises:
representing the sequence of the operation behaviors in each second log sequence of each user identifier as a matrix by using the vector of the operation behaviors of the target device;
determining a matrix of a first log sequence of each user identifier according to the matrix of each second log sequence of each user identifier, wherein the matrix of the first log sequence is a vector representation of a sequence of operation behaviors in the first log sequence;
performing feature extraction on the matrix of the first log sequence of each user identifier to obtain first feature data of each user identifier;
and performing feature extraction on the matrix of each second log sequence of each user identifier to obtain second feature data of each user identifier in each time interval.
4. The abnormal user detection method of claim 3, wherein prior to said determining first characteristic data for each user identification and second characteristic data for each user identification within each time interval based on the first log sequence and the second log sequence, the abnormal user detection method further comprises:
carrying out one-hot coding on various operation behaviors of the target equipment to obtain a code of each operation behavior;
and processing the code of each operation behavior by using a word vector model to obtain a vector of each operation behavior, wherein the word vector model is obtained by training according to the operation behavior in the log data.
5. The abnormal user detection method of claim 3,
the feature extraction of the matrix of the first log sequence of each user identifier to obtain first feature data of each user identifier includes: performing feature extraction on a matrix of each first log sequence based on a first long-short time memory network self-coding model to obtain corresponding first feature data, wherein the first long-short time memory network self-coding model is used for extracting features of sequences formed by operation behaviors in the first log sequences;
the feature extraction of the matrix of each second log sequence of each user identifier to obtain second feature data of each user identifier in each time interval includes: and performing feature extraction on the matrix of each second log sequence based on a second long-short time memory network self-coding model to obtain corresponding second feature data, wherein the second long-short time memory network self-coding model is used for extracting features of a sequence formed by operation behaviors of the user identifier in the second log sequence in each time interval.
6. The abnormal user detection method of claim 5, wherein the first long-term memory network self-coding model comprises: a first coding model and a first decoding model; the training process of the first long-time memory network self-coding model comprises the following steps:
sequentially inputting the matrix of each first log sequence into a first coding model to obtain corresponding first log characteristics;
inputting the first log feature into a first decoding model to obtain a first decoding result;
training a first coding model and a first decoding model according to the difference between the first decoding result and the first log sequence to obtain a trained first long-short term memory network self-coding model;
the second long-short time memory network self-coding model comprises: a second coding model and a second decoding model; the training process of the second long-time memory network self-coding model comprises the following steps:
sequentially inputting the matrix of each second log sequence into a second coding model to obtain corresponding second log characteristics;
inputting the second log feature into a second decoding model to obtain a second decoding result;
and training a second coding model and a second decoding model according to the difference between the second decoding result and the second log sequence to obtain a trained second long-short time memory network self-coding model.
7. The abnormal user detection method of claim 1, wherein the determining the corresponding abnormal user according to the first clustering result comprises:
determining the role type of each class in the first clustering result according to the relationship between user identifiers and role types, wherein the role type of each class is the role type to which the largest number of user identifiers in the class correspond;
for any first category in the first clustering results, based on the role type of the first category, determining abnormal users by using at least one of the following modes:
when the role type of the user identifier corresponding to part of the first characteristic data of the first class is different from the role type of the first class, and the proportion of the part of the first characteristic data in the first class is smaller than a first proportion threshold value, and the quantity of the part of the first characteristic data is smaller than a first quantity threshold value, determining that the user corresponding to the part of the first characteristic data is an abnormal user;
when other classes which are the same as the role type of the first class do not exist in the first clustering result, the ratio of the quantity of the first characteristic data of the first class in the total number of the registered user identifications corresponding to the role type of the first class is smaller than a second ratio threshold, and the quantity of the first characteristic data of the first class is smaller than a second quantity threshold, determining that the user corresponding to the first characteristic data of the first class is an abnormal user;
and when the first clustering result has a second class which has the same role type as the first class, and the ratio of the quantity of the first characteristic data of the first class in the total quantity of the first characteristic data of the first class and the second class is less than a third ratio threshold, and the quantity of the first characteristic data of the first class is less than a third quantity threshold, determining that the user corresponding to the first characteristic data of the first class is an abnormal user.
8. The abnormal user detection method of claim 1, wherein the determining the corresponding abnormal user according to the second clustering result comprises:
acquiring a plurality of set time period category labels, wherein the time period category labels correspond to a plurality of time periods obtained by dividing the time of a day;
determining a time period category label of each class in the second clustering result based on the plurality of time period category labels, wherein the time period category label of each class is the label to which the largest number of second characteristic data in the class correspond;
according to the time period category label of each category in the second clustering result, determining the abnormal user by using at least one of the following modes:
in the second clustering result, when the frequency of the operation behaviors corresponding to a class is abnormal for the time period indicated by its time period category label, determining that the users corresponding to that class are abnormal users;
and when two classes with the same time period category label exist in the second clustering result, the ratio of the operation behavior frequency of the class with fewer operation behaviors to the total operation behavior frequency of the two classes is smaller than a fourth ratio threshold, and the number of second characteristic data in the two classes reaches a fourth quantity threshold, determining that the users corresponding to the class with fewer operation behaviors are abnormal users.
9. An abnormal user detection apparatus, comprising:
the data processing unit is used for acquiring log data of a historical time period, wherein each log in the log data comprises a user identifier, an operation behavior of a user on target equipment and log time; determining a first log sequence of each user identifier and a second log sequence of each user identifier in each time interval based on the log data, wherein the time intervals are the result of dividing the historical time period according to unit time length, the first log sequence of each user identifier is the result of sequencing the logs of the user identifier according to log time, and the second log sequence of each user identifier in each time interval is the result of sequencing the logs of the user identifier in the time interval according to log time;
the characteristic extraction unit is used for determining first characteristic data of each user identifier and second characteristic data of each user identifier in each time interval on the basis of the first log sequence and the second log sequence, wherein the first characteristic data represents the operation behaviors of the user in a historical time interval and the time sequence characteristics of the operation behaviors of the user in the historical time interval, and the second characteristic data represents the operation behaviors of the user in the corresponding time interval and the time sequence characteristics of the operation behaviors of the user in the time interval;
the clustering unit is used for clustering the first characteristic data to obtain a first clustering result, and the first clustering result is used for representing behavior differences among different users; clustering the second characteristic data to obtain a second clustering result, wherein the second clustering result is used for representing behavior differences of the user in different time intervals;
the abnormal analysis unit is used for determining corresponding abnormal users according to the first clustering result; and determining corresponding abnormal users according to the second clustering result.
10. A computing device, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the abnormal user detection method of any of claims 1-8.
11. A storage medium storing a program, the program comprising instructions that, when executed by a computing device, cause the computing device to perform the abnormal user detection method of any one of claims 1-8.
CN202011276015.2A 2020-11-16 2020-11-16 Abnormal user detection method and device, computing equipment and storage medium Active CN112306982B (en)

Publications (2)

Publication Number Publication Date
CN112306982A CN112306982A (en) 2021-02-02
CN112306982B true CN112306982B (en) 2021-07-16





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant