CN109495479A - Method and device for identifying abnormal user behavior - Google Patents
- Publication number
- CN109495479A
- Authority
- CN
- China
- Prior art keywords
- command
- word frequency
- keyword
- historical
- records
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
- H04L63/101—Access control lists [ACL]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Debugging And Monitoring (AREA)
Abstract
This application provides a method and a device for identifying abnormal user behavior, relating to the technical field of computer security. The method comprises: acquiring a plurality of different historical command records of users with the same behavior, each historical command record containing a plurality of commands; extracting keyword commands from the historical command records according to the behavior characteristics corresponding to the user; determining the word frequency feature of each keyword command from its word frequency in the historical command record to which it belongs and its inverse document word frequency across all historical command records; and determining abnormal historical command records according to the word frequency features of the keyword commands. Because abnormal behavior is identified from the keyword commands in the user's historical command records that characterize the user's behavior, abnormal user behavior that is variable and complex can be identified. In addition, because the word frequency features of the keyword commands are processed by dimensionality reduction and clustering, identification is fast and relatively accurate.
Description
Technical Field
The application relates to the technical field of computer security, in particular to a method and a device for identifying abnormal behaviors of a user.
Background
In an actual computer network system there are generally multiple legitimate users, each of whom can perform business operations in the system within his or her own operation permissions. The behavior of these legitimate users usually needs to be monitored to detect abnormal behavior, so as to prevent other people from impersonating legitimate users' accounts to perform illegal operations.
A current method for detecting abnormal behavior in a computer network system is as follows: a rule table containing many illegal and sensitive words is first formulated, and abnormal behaviors are identified by matching user behaviors against the rules in the table. However, user behavior is diverse and complex; it may vary with work content, user interests, working hours, and other uncertain factors. The traditional method based on a hard rule table therefore cannot accurately monitor abnormal user behavior and cannot meet practical requirements.
Disclosure of Invention
In view of this, an object of the embodiments of the present application is to provide a method and an apparatus for identifying abnormal user behavior, which can identify abnormal user behavior that is variable and complex, with high identification accuracy.
In a first aspect, an embodiment of the present application provides a method for identifying an abnormal behavior of a user, where the method is applied to a server, and the method includes:
acquiring a plurality of different historical command records of users with the same behavior, wherein each historical command record comprises a plurality of commands;
extracting a plurality of keyword commands in the historical command records according to the behavior characteristics corresponding to the user;
determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all historical command records;
and determining an abnormal historical command record according to the word frequency characteristics of the keyword command.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation manner of the first aspect, where determining the word frequency feature of the keyword command according to the word frequency of the keyword command in the corresponding historical command record and the inverse document word frequency of the keyword command in all historical command records includes:
for any keyword command, determining the standardized word frequency of the keyword command according to the word frequency of the keyword command in the historical command record and the total number of keyword commands in the historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
With reference to the first possible implementation manner of the first aspect, this application provides a second possible implementation manner of the first aspect, where the determining, according to the total number of the history command records and the number of the history command records including the keyword command, an inverse document word frequency of the keyword command includes:
calculating the inverse document word frequency of the keyword command according to the formula $\mathrm{IDF}(x)=\log\frac{K}{K(x)+1}$, where IDF(x) denotes the inverse document word frequency, K denotes the total number of historical command records, and K(x) denotes the number of historical command records, among all historical command records, that include the keyword command.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present application provides a third possible implementation manner of the first aspect, where the determining an abnormal historical command record according to the word frequency feature of the keyword command includes:
performing dimensionality reduction processing on the word frequency characteristics of the keyword command to obtain a comprehensive word frequency characteristic matrix;
and clustering the comprehensive word frequency characteristic matrix to obtain an abnormal historical command record.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present application provides a fourth possible implementation manner of the first aspect, where the performing dimension reduction processing on the word frequency features of the keyword command to obtain a comprehensive word frequency feature matrix includes:
normalizing the word frequency characteristics of the keyword command to obtain a word frequency characteristic matrix;
and performing dimensionality reduction on the word frequency feature matrix using a latent semantic analysis (LSA) dimensionality reduction algorithm with a preset hyperparameter value, to obtain a comprehensive word frequency feature matrix.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present application provides a fifth possible implementation manner of the first aspect, where the clustering the comprehensive word frequency feature matrix to obtain an abnormal historical command record includes:
clustering the comprehensive word frequency feature matrix with the DBSCAN algorithm using original input parameter values, to obtain candidate abnormal historical command records and a silhouette coefficient;
and if the silhouette coefficient is detected to be greater than a first preset threshold, determining the candidate abnormal historical command records to be abnormal historical command records.
In combination with the fifth possible implementation manner of the first aspect, the present application provides a sixth possible implementation manner of the first aspect, wherein,
clustering the comprehensive word frequency feature matrix to obtain an abnormal historical command record further comprises the following steps:
if the silhouette coefficient is detected to be smaller than the first preset threshold, updating the original input parameter values by a grid search method to obtain updated input parameter values;
clustering the comprehensive word frequency feature matrix with the DBSCAN algorithm using the updated input parameter values, to obtain candidate abnormal historical command records and a silhouette coefficient;
if the silhouette coefficient is still smaller than the first preset threshold, returning to the step of updating the input parameter values by the grid search method, until the obtained silhouette coefficient is greater than the first preset threshold or the number of updates reaches a second preset threshold.
In a second aspect, an embodiment of the present application further provides an apparatus for identifying an abnormal behavior of a user, where the apparatus includes:
the acquisition module is configured to acquire a plurality of different historical command records of users with the same behavior, wherein each historical command record comprises a plurality of commands;
the extraction module is used for extracting the keyword commands in the plurality of historical command records according to the behavior characteristics corresponding to the user;
the determining module is used for determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all the historical command records;
the determining module is further configured to determine an abnormal historical command record according to the word frequency feature of the keyword command.
With reference to the second aspect, an embodiment of the present application provides a first possible implementation manner of the second aspect, where the determining module is specifically configured to:
for any keyword command, determining the standardized word frequency of the keyword command according to the word frequency of the keyword command in the historical command record and the total number of keyword commands in the historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
With reference to the first possible implementation manner of the second aspect, an embodiment of the present application provides a second possible implementation manner of the second aspect, where the determining module is specifically configured to:
calculate the inverse document word frequency of the keyword command according to the formula $\mathrm{IDF}(x)=\log\frac{K}{K(x)+1}$, where IDF(x) denotes the inverse document word frequency, K denotes the total number of historical command records, and K(x) denotes the number of historical command records, among all historical command records, that contain the keyword command.
With the method and device for identifying abnormal user behavior provided by the embodiments of the present application, abnormal user behavior is identified from the user's historical command records, so abnormal user behavior that is variable and complex can be identified. Meanwhile, detection is performed by applying dimensionality reduction and clustering to the word frequency features of the keyword commands that characterize the user's behavior, so identification is fast and accurate.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 shows a flowchart of a method for identifying abnormal user behavior according to an embodiment of the present application.
Fig. 2 shows a flowchart of another method for identifying abnormal user behavior according to an embodiment of the present application.
Fig. 3 shows a flowchart of another method for identifying abnormal user behavior according to an embodiment of the present application.
Fig. 4 shows a flowchart of another method for identifying abnormal user behavior according to an embodiment of the present application.
Fig. 5 shows a schematic structural diagram of an apparatus for identifying an abnormal behavior of a user according to an embodiment of the present application.
Fig. 6 shows a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Because the conventional method based on a hard rule table cannot accurately monitor abnormal user behavior, the embodiments of the present application provide a method and a device for identifying abnormal user behavior, which are described below through embodiments.
As shown in fig. 1, a first embodiment of the present application provides a method for identifying an abnormal user behavior, which is applied to a server, and the method includes:
S101, acquiring a plurality of different historical command records of users with the same behavior, wherein each historical command record comprises a plurality of commands.
In the embodiment of the present application, when a user logs in, the server can detect whether the user currently logging in is legitimate; if so, the user is allowed to log in, and if not, the login is refused. As an optional implementation, the server checks whether the user's information is in a preset whitelist, which stores the user information of legitimate users; if it is, the user is determined to be legitimate.
As an optional implementation, the server is a Linux server. On the Linux platform, the Shell is the most important interface between end users and the operating system, and a large proportion of user activity is carried out through Shell commands, so Shell commands directly reflect the user's behavior characteristics.
In general, different services correspond to different user behaviors, and even for the same service, users with different operation permissions and operation habits exhibit different behavior characteristics. Therefore, the embodiment of the present application uses the historical command records of users who operate the same service, of users in the same department, or of the same user at different times; these users share the same operation behavior.
Here, each historical command record includes a plurality of commands, and the same command may be repeated. The commands are Shell commands, and the embodiment identifies abnormal operations in user behavior based on the user's Shell commands. The Shell commands in the historical command records and their corresponding word frequency features characterize the user's behavior.
S102, extracting a plurality of keyword commands from the historical command records according to the behavior characteristics corresponding to the user.
In the embodiment of the present application, all Shell history command records from each login of the user to the Linux server are used as sample data. Because these records are text commands, word segmentation is performed on them first; here, word segmentation refers to extracting keyword commands from the text commands.
As a specific implementation manner, for each user with the same behavior, a keyword command corresponding to the behavior feature of the user is extracted from a history command record corresponding to the user according to the behavior features such as the service information, the operation authority, the operation habit and the like operated by the user.
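The extraction step can be sketched as follows. This is a minimal illustration, not the patented implementation: the vocabulary `KEYWORDS` is a hypothetical stand-in for the behavior characteristics (services, permissions, habits) described above, and both the helper name `extract_keyword_commands` and the splitting rule (take the program name at the start of each pipeline or `;`/`&&`/`||` segment) are assumptions.

```python
import re

# Hypothetical keyword vocabulary standing in for the user's behavior
# characteristics (services, permissions, habits); not taken from the patent.
KEYWORDS = {"cd", "cp", "vim", "mkdir", "ssh", "touch", "top"}

# Split one shell line into segments at "||", "&&", "|" and ";".
_SEPARATORS = re.compile(r"\|\||&&|[|;]")


def extract_keyword_commands(history_lines, keywords=KEYWORDS):
    """Return, in order, the keyword commands found in one login's history."""
    record = []
    for line in history_lines:
        for segment in _SEPARATORS.split(line):
            tokens = segment.split()
            if not tokens:
                continue
            # The program name is the first token; strip any path prefix.
            name = tokens[0].rsplit("/", 1)[-1]
            if name in keywords:
                record.append(name)
    return record


history = [
    "cd /tmp",
    "cd project && cp a.txt b.txt",
    "vim notes.md",
    "ls -l | grep log",
    "mkdir build; cp a b",
    "ssh host",
]
record = extract_keyword_commands(history)
print(record)  # ['cd', 'cd', 'cp', 'vim', 'mkdir', 'cp', 'ssh']
```

On this made-up history, commands outside the vocabulary (`ls`, `grep`) are dropped, and the result matches the example record W used in step S201. A real deployment would also need to handle quoting, sub-shells and aliases, which this sketch ignores.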
S103, determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all the historical command records.
Here, the word frequency feature of a keyword command is its term frequency-inverse document frequency (TF-IDF). TF is the term (word) frequency, which indicates how often a keyword command appears in one login (i.e., one historical command record). If a keyword command appears frequently in a login (i.e., in the corresponding historical command record), it is relatively representative of that login's business pattern; for example, frequent occurrences of "vim" in a login suggest that files are likely being modified.
IDF is the inverse document word frequency, which indicates how rarely a keyword command occurs across all logins. Keyword commands that occur in almost every login, such as "cd" and "cp", are relatively unimportant, whereas keyword commands that occur in only a few logins, such as "ssh" and "vim", are relatively important because they better distinguish the different services performed in each login.
In the embodiment of the application, TF and IDF are respectively determined according to the user operation frequency corresponding to the keyword command, and then word frequency characteristics (namely TF-IDF) are determined according to TF and IDF.
S104, determining abnormal historical command records according to the word frequency features of the keyword commands.
In the embodiment of the present application, the keyword commands of each historical command record and their word frequency features are combined into a word frequency feature matrix, which is then clustered to obtain the abnormal historical command records. Because each historical command record also carries the user information, identifying the abnormal historical command records identifies the abnormal behavior of the corresponding users.
Here, the abnormal behavior of the user may refer to an attack behavior impersonated as a legitimate user, or may refer to a corresponding normal behavior after the legitimate user changes an operation habit or changes a corresponding service characteristic.
The method for identifying abnormal user behavior provided by the embodiment of the present application identifies abnormal behavior from the user's historical command records, so abnormal user behavior that is variable and complex can be identified. Meanwhile, detection is performed by applying dimensionality reduction and clustering to the word frequency features of the keyword commands that characterize the user's behavior, so identification is fast and accurate.
Further, as shown in Fig. 2, in the method for identifying abnormal user behavior provided in this embodiment, step S103 (determining the word frequency feature of a keyword command according to its word frequency in the corresponding historical command record and its inverse document word frequency in all historical command records) includes:
S201, for any keyword command, determining the standardized word frequency of the keyword command according to its word frequency in the historical command record and the total number of keyword commands in the historical command record.
In the embodiment of the present application, suppose that one login (i.e., one historical command record) contains n keyword commands, of which keyword command x appears m times. The raw word frequency of x is then m; to remove the influence of scale (record length), it is normalized by formula (1), $\mathrm{TF}(x)=\frac{m}{n}$.
For example, let W = {cd, cd, cp, vim, mkdir, cp, ssh}, where W represents one historical command record and each element is a specific command. The keyword command cd appears 2 times among the 7 commands, so its normalized word frequency is 2/7.
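The normalization TF(x) = m/n described above can be sketched in a few lines. The helper name `normalized_tf` is hypothetical, and exact fractions are used only so that the 2/7 result is visible.

```python
from collections import Counter
from fractions import Fraction


def normalized_tf(record):
    """TF(x) = m / n: occurrences m of x divided by the n commands in the record."""
    n = len(record)
    return {cmd: Fraction(m, n) for cmd, m in Counter(record).items()}


W = ["cd", "cd", "cp", "vim", "mkdir", "cp", "ssh"]
tf = normalized_tf(W)
print(tf["cd"])  # 2/7
```

Because every value is divided by the same n, the TF values of one record always sum to 1, which is what makes records of different lengths comparable.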
S202, determining the inverse document word frequency of the keyword command according to the total number of the history command records and the number of the history command records including the keyword command.
In the embodiment of the present application, the inverse document word frequency of the keyword command is calculated by formula (2), $\mathrm{IDF}(x)=\log\frac{K}{K(x)+1}$, where IDF(x) denotes the inverse document word frequency, K denotes the total number of historical command records, and K(x) denotes the number of historical command records, among all historical command records, that include the keyword command.
K is the total number of logins (i.e., there are K historical command records), and K(x) is the number of logins that include the keyword command x (i.e., K(x) historical command records include x); the IDF of x is then calculated by formula (2). Here, 1 is added to the denominator to avoid a zero denominator (i.e., Laplace smoothing is applied).
Examples are:
W1={cd,cd,cp,vim,mkdir,cp,ssh};
W2={cd,touch,cp,vim,mkdir,cp,ssh};
W3={cd,top,cp,vim,mkdir,cp,ssh};
Here W1, W2 and W3 denote different historical command records, each containing several specific commands. Among these three login samples (i.e., three historical command records), "touch" appears only in the second login (W2), so K(x) = 1 for "touch" and its inverse document word frequency is $\mathrm{IDF}(\text{touch})=\log\frac{3}{1+1}=\log\frac{3}{2}$. Similarly, "cd" appears in all three records, so its inverse document word frequency is $\mathrm{IDF}(\text{cd})=\log\frac{3}{3+1}=\log\frac{3}{4}$.
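A sketch of the smoothed IDF on the three example records W1, W2 and W3. It assumes the natural logarithm (the base is not stated) and the form IDF(x) = log(K / (K(x) + 1)) implied by the Laplace-smoothing remark; `smoothed_idf` is a hypothetical name.

```python
import math


def smoothed_idf(records):
    """IDF(x) = log(K / (K(x) + 1)); the +1 is the Laplace smoothing term."""
    K = len(records)
    doc_sets = [set(r) for r in records]  # presence only, counts ignored
    vocab = set().union(*doc_sets)
    return {
        x: math.log(K / (sum(x in d for d in doc_sets) + 1))  # natural log assumed
        for x in sorted(vocab)
    }


W1 = ["cd", "cd", "cp", "vim", "mkdir", "cp", "ssh"]
W2 = ["cd", "touch", "cp", "vim", "mkdir", "cp", "ssh"]
W3 = ["cd", "top", "cp", "vim", "mkdir", "cp", "ssh"]
idf = smoothed_idf([W1, W2, W3])
print(round(idf["touch"], 3), round(idf["cd"], 3))  # 0.405 -0.288
```

With this smoothing, a command present in every record (such as cd) gets a slightly negative IDF, log(3/4), which still ranks it below a rare command such as touch, log(3/2).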
S203, determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
As shown in Table 1, Table 2 and Table 3, in the embodiment of the present application, after the normalized word frequency TF and the inverse document word frequency IDF are obtained, the TF-IDF value of a keyword command x is the product of its normalized word frequency and its inverse document word frequency, that is, $\mathrm{TF\text{-}IDF}(x)=\mathrm{TF}(x)\times\mathrm{IDF}(x)$.
Table 1 Raw data table
Record | cd | cp | touch | ssh | mkdir |
W1 | 2 | 1 | 0 | 0 | 1 |
W2 | 3 | 4 | 8 | 0 | 1 |
W3 | 2 | 3 | 0 | 1 | 7 |
W4 | 1 | 0 | 0 | 1 | 0 |
W5 | 5 | 7 | 1 | 2 | 0 |
Table 2 TF table
Table 3 IDF table
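Tables 2 and 3 are not reproduced in this text, so the sketch below only shows how TF, IDF and TF-IDF could be computed from the counts in Table 1; it does not claim to reproduce the patent's table values. It assumes that n for each record is the row total over the five listed commands, the natural logarithm, and the smoothed IDF described above. `tfidf_matrix` is a hypothetical helper.

```python
import math

COMMANDS = ["cd", "cp", "touch", "ssh", "mkdir"]
# Raw counts from Table 1: one row per record W1..W5, columns as in COMMANDS.
COUNTS = [
    [2, 1, 0, 0, 1],
    [3, 4, 8, 0, 1],
    [2, 3, 0, 1, 7],
    [1, 0, 0, 1, 0],
    [5, 7, 1, 2, 0],
]


def tfidf_matrix(counts):
    """Return (TF-IDF matrix, IDF vector) for a records-by-commands count table."""
    K = len(counts)
    n_cols = len(counts[0])
    # K(x): number of records in which command x occurs at least once.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_cols)]
    idf = [math.log(K / (df + 1)) for df in doc_freq]  # smoothed, natural log
    matrix = []
    for row in counts:
        n = sum(row) or 1  # assumed n: total listed keyword commands in the record
        matrix.append([(m / n) * idf[j] for j, m in enumerate(row)])
    return matrix, idf


tfidf, idf = tfidf_matrix(COUNTS)
for name, row in zip(["W1", "W2", "W3", "W4", "W5"], tfidf):
    print(name, [round(v, 3) for v in row])
```

For example, touch occurs 8 times among the 16 listed commands of W2 and in 2 of the 5 records, so its TF-IDF in W2 is 0.5 × log(5/3) under these assumptions. The rows of the resulting matrix are the samples that the next step normalizes and reduces.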
Further, as shown in fig. 3, in the method for identifying an abnormal user behavior provided in the embodiment of the present application, in step 104, determining an abnormal historical command record according to the word frequency feature of the keyword command includes:
S301, performing dimensionality reduction on the word frequency features of the keyword commands to obtain a comprehensive word frequency feature matrix.
As an embodiment, the dimensionality reduction process includes the following steps: normalizing the word frequency features of the keyword commands to obtain a word frequency feature matrix; and performing dimensionality reduction on the word frequency feature matrix using a latent semantic analysis (LSA) dimensionality reduction algorithm with a preset hyperparameter value, to obtain a comprehensive word frequency feature matrix.
In the embodiment of the present application, TF-IDF (term frequency-inverse document frequency) is used instead of simple word frequency so as to reflect how important each command is to the user's behavior pattern; the word frequency features are then normalized so that the features are not biased by differences in scale, which yields the word frequency feature matrix.
After the normalized word frequency feature matrix of user behavior is obtained (each row is a sample and each column is a feature), LSA is used to extract the singular vectors corresponding to the K largest singular values, compressing the original feature matrix into a K-column feature matrix. Here K is a hyperparameter that sets the feature dimension of the comprehensive word frequency feature matrix after dimensionality reduction; it is distinct from K, the total number of historical command records, in the IDF formula.
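The normalization and LSA step can be sketched with NumPy's SVD. Two assumptions are made here: "normalization" is taken to mean scaling each row (sample) to unit L2 norm, and LSA is implemented as a truncated SVD of that matrix. `lsa_reduce` is a hypothetical name and the 4×3 matrix `M` is made-up toy data.

```python
import numpy as np


def lsa_reduce(tfidf, k):
    """Project samples (rows) onto the k largest singular directions (LSA)."""
    X = np.asarray(tfidf, dtype=float)
    # Assumed normalization: scale each row to unit L2 norm (zero rows stay zero).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)
    # Thin SVD, X = U @ diag(S) @ Vt, with S sorted in descending order.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Keeping the top-k singular values gives a k-column matrix equal to X @ Vt[:k].T.
    return U[:, :k] * S[:k]


M = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
reduced = lsa_reduce(M, k=2)
print(reduced.shape)  # (4, 2)
```

Here k plays the role of the hyperparameter K above. For large sparse command matrices, scikit-learn's `TruncatedSVD` performs the same truncation without forming the full SVD.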
S302, clustering the comprehensive word frequency characteristic matrix to obtain an abnormal historical command record.
In this embodiment of the application, the DBSCAN algorithm is used to cluster the comprehensive word frequency feature matrix. DBSCAN has two important input parameters: the scanning radius eps and the minimum number of points minPts that a neighborhood must contain.
Further, as shown in fig. 4, in the method for identifying abnormal user behavior provided in this embodiment of the application, clustering the comprehensive word frequency feature matrix in step S302 to obtain abnormal historical command records includes:
S401, clustering the comprehensive word frequency feature matrix with the DBSCAN algorithm based on the original input parameter values, to obtain candidate abnormal historical command records and a silhouette coefficient.
In this embodiment of the application, the preset input parameters of the DBSCAN algorithm are the scanning radius eps and the minimum number of points minPts. The comprehensive word frequency feature matrix is clustered with the preset eps and minPts values to obtain the candidate abnormal historical command records and the silhouette coefficient.
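The silhouette coefficient used as the acceptance criterion in S402 to S405 is usually defined as below. This assumes that the coefficient in this embodiment is the standard silhouette coefficient averaged over all n historical command records; the text does not state how records labeled -1 are treated in the computation. Here a(i) is the mean distance from record i to the other records of its own cluster, and b(i) is the smallest mean distance from record i to the records of any other cluster.

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
S = \frac{1}{n}\sum_{i=1}^{n} s(i), \qquad -1 \le S \le 1
$$

Values of S close to 1 indicate compact, well-separated clusters, which is why a larger coefficient is taken as a sign that the clustering, and hence the set of outliers, is reliable.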
S402, if the silhouette coefficient is detected to be greater than a first preset threshold, determining the candidate abnormal historical command records to be abnormal historical command records.
S403, if the silhouette coefficient is detected to be smaller than the first preset threshold, updating the original input parameter values by grid search to obtain updated input parameter values.
The original input parameter values may be updated by grid search, random search, genetic-algorithm-based parameter optimization, or similar methods; in this embodiment of the application, grid search is used.
S404, clustering the comprehensive word frequency feature matrix with the DBSCAN algorithm based on the updated input parameter values, to obtain candidate abnormal historical command records and a silhouette coefficient.
S405, if the silhouette coefficient is detected to be smaller than the first preset threshold, returning to the step of updating the input parameter values by grid search, until the obtained silhouette coefficient is greater than the first preset threshold or the number of updates reaches a second preset threshold.
In this embodiment of the application, once the expected clustering effect is achieved, a cluster label is obtained for each historical command record. Records labeled -1 are outliers in the data, and the historical command records labeled -1 are determined to be abnormal historical command records.
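Steps S401 to S405 can be sketched with scikit-learn's `DBSCAN` and `silhouette_score`. This is a hedged sketch, not the applicant's code: the helper name `find_abnormal_records`, the default thresholds, treating the first grid entry as the "original input parameter values", and returning `None` when the update limit is reached are all assumptions. Note that `silhouette_score` scores label -1 as if it were an ordinary cluster; the text does not say whether that is intended.

```python
from itertools import islice, product

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score


def find_abnormal_records(features, eps_grid, min_pts_grid,
                          first_threshold=0.5, max_updates=20):
    """Hypothetical sketch of S401-S405.

    The first (eps, minPts) pair of the grid stands in for the original input
    parameter values; each later pair is one grid-search update (at most
    max_updates of them, i.e. the "second preset threshold").
    Returns (outlier_indices, eps, min_pts, silhouette) or None.
    """
    candidates = islice(product(eps_grid, min_pts_grid), max_updates + 1)
    for eps, min_pts in candidates:
        labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(features)
        # silhouette_score requires 2 <= number of labels <= n_samples - 1.
        n_labels = len(set(labels))
        if not 2 <= n_labels <= len(features) - 1:
            continue
        score = silhouette_score(features, labels)
        if score > first_threshold:
            # Records labeled -1 are reported as abnormal historical command records.
            return np.flatnonzero(labels == -1), eps, min_pts, score
    return None
```

In use, `features` would be the comprehensive word frequency feature matrix produced by the LSA step; a coarse grid such as several eps values crossed with a few minPts values keeps the number of DBSCAN runs bounded by `max_updates + 1`.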
In the user abnormal behavior identification method provided by this embodiment of the application, abnormal user behavior is identified from the users' historical command records, so that abnormal behavior can be recognized even when it is variable and complex. In addition, abnormal behavior is detected by applying dimensionality reduction and clustering to the word frequency features of the keyword commands that characterize user behavior, which gives fast and accurate identification.
This embodiment of the application can be applied to user behavior analysis, text mining, prediction and evaluation, and similar fields. In user behavior prediction models it greatly improves the accuracy of unsupervised algorithms, and it also has good practical value across different prediction models.
A second embodiment of the present application further provides a device for identifying abnormal user behavior, configured to execute the method for identifying abnormal user behavior of the first embodiment. As shown in fig. 5, the device includes:
an obtaining module 11, configured to obtain a plurality of different historical command records of users having the same behavior, wherein each historical command record comprises a plurality of commands;
an extracting module 12, configured to extract keyword commands from the plurality of historical command records according to the behavior features corresponding to the users;
a determining module 13, configured to determine the word frequency feature of each keyword command according to the word frequency of the keyword command in the corresponding historical command record and the inverse document word frequency of the keyword command across all historical command records;
the determining module 13 being further configured to determine abnormal historical command records according to the word frequency features of the keyword commands.
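The text does not say how keyword commands are chosen from the behavior features, so the following is only a hedged illustration of the extracting module's role. It assumes each historical command record is a list of shell command lines, that the command name is the first whitespace-separated token, and that the behavior features have already been turned into a set of keyword command names; `extract_keyword_commands` and the example keyword set are hypothetical.

```python
def extract_keyword_commands(records, keyword_set):
    """Hypothetical helper: per record, keep the command names found in keyword_set.

    records     -- list of historical command records, each a list of command lines
    keyword_set -- set of command names assumed to reflect the users' behavior features
    """
    extracted = []
    for record in records:
        # Assumption: the first token of a non-empty line is the command name.
        names = (line.split()[0] for line in record if line.strip())
        extracted.append([name for name in names if name in keyword_set])
    return extracted
```

For instance, with the hypothetical keyword set `{"sudo", "cat", "vim"}`, the record `["ls -la", "sudo su", "cat /etc/shadow"]` reduces to `["sudo", "cat"]`; these per-record keyword lists are the input to the word frequency computations that follow.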
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
for each keyword command, determining the standardized word frequency of the keyword command according to its word frequency in the historical command record and the number of keyword commands in that historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
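A minimal sketch of the standardized word frequency follows. It reads "the number of keyword commands in the historical command record" as the total count of keyword commands in that record, repeats included; that reading, and the name `standardized_tf`, are assumptions.

```python
from collections import Counter


def standardized_tf(keyword_commands):
    """Hypothetical helper: tf(x) = count of x in the record / keyword commands in the record."""
    total = len(keyword_commands)
    if total == 0:
        return {}  # a record with no keyword commands contributes no features
    counts = Counter(keyword_commands)
    return {cmd: n / total for cmd, n in counts.items()}
```

For a record whose keyword commands are `["sudo", "cat", "sudo", "vim"]`, this gives `{"sudo": 0.5, "cat": 0.25, "vim": 0.25}`, so records of different lengths produce comparable frequencies.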
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
calculating the inverse document word frequency of the keyword command according to a formula for idf(x) (not reproduced in this text), where idf(x) denotes the inverse document word frequency, K denotes the total number of historical command records, and K(x) denotes the number of historical command records, among all of them, that include the keyword command.
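Because the exact formula is not reproduced here, the sketch below uses the common textbook form idf(x) = log(K / K(x)) with the symbols defined above, and multiplies it by the standardized word frequency to obtain the word frequency feature. The choice of formula and the names `inverse_document_frequency` and `tf_idf` are assumptions.

```python
import math
from collections import Counter


def inverse_document_frequency(records):
    """Hypothetical helper using the assumed form idf(x) = log(K / K(x)).

    K    -- total number of historical command records
    K(x) -- number of records that contain keyword command x
    """
    k_total = len(records)
    k_x = Counter()
    for record in records:
        k_x.update(set(record))  # each record counts at most once per command
    return {cmd: math.log(k_total / n) for cmd, n in k_x.items()}


def tf_idf(records):
    """Word frequency feature of each keyword command in each record: tf(x) * idf(x)."""
    idf = inverse_document_frequency(records)
    features = []
    for record in records:
        counts = Counter(record)
        total = len(record)
        features.append({cmd: (n / total) * idf[cmd] for cmd, n in counts.items()})
    return features
```

Under this form, a command that appears in every record (such as `sudo` in `[["sudo", "cat"], ["sudo", "vim"]]`) gets idf = 0 and therefore contributes nothing, while rarer commands are weighted up. Some variants smooth the ratio, for example by adding 1 to K(x); whether the formula in this application does so cannot be determined from the text.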
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
performing dimensionality reduction processing on the word frequency characteristics of the keyword command to obtain a comprehensive word frequency characteristic matrix;
and clustering the comprehensive word frequency characteristic matrix to obtain an abnormal historical command record.
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
normalizing the word frequency characteristics of the keyword command to obtain a word frequency characteristic matrix;
and performing dimensionality reduction on the word frequency feature matrix with a latent semantic analysis (LSA) dimensionality reduction algorithm, based on a preset hyperparameter value, to obtain a comprehensive word frequency feature matrix.
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
clustering the comprehensive word frequency feature matrix with the DBSCAN algorithm based on the original input parameter values, to obtain candidate abnormal historical command records and a silhouette coefficient;
and if the silhouette coefficient is detected to be greater than a first preset threshold, determining the candidate abnormal historical command records to be abnormal historical command records.
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
if the silhouette coefficient is detected to be smaller than the first preset threshold, updating the original input parameter values by grid search to obtain updated input parameter values;
clustering the comprehensive word frequency feature matrix with the DBSCAN algorithm based on the updated input parameter values, to obtain candidate abnormal historical command records and a silhouette coefficient;
if the silhouette coefficient is detected to be smaller than the first preset threshold, returning to the step of updating the input parameter values by grid search, until the obtained silhouette coefficient is greater than the first preset threshold or the number of updates reaches a second preset threshold.
In the user abnormal behavior recognition device provided by this embodiment of the application, abnormal user behavior is recognized from the users' historical command records, so that abnormal behavior can be recognized even when it is variable and complex. In addition, abnormal behavior is detected from the keyword commands that characterize user behavior and the corresponding user operation frequencies, by applying dimensionality reduction and clustering to them, which gives fast and accurate identification.
Fig. 6 is a schematic structural diagram of a computer device 40 according to an embodiment of the present application. As shown in fig. 6, the computer device 40 according to the third embodiment of the present application includes a processor 402, a memory 401 and a bus. The memory 401 stores execution instructions; when the computer device 40 is running, the processor 402 and the memory 401 communicate via the bus, and the processor 402 executes the execution instructions so that the computer device 40 performs the user abnormal behavior recognition method.
Specifically, the memory 401 and the processor 402 may be a general-purpose memory and processor, which are not specifically limited here; the user abnormal behavior identification method can be executed when the processor 402 runs a computer program stored in the memory 401.
Corresponding to the above method for identifying abnormal user behavior, a computer storage medium provided in the fourth embodiment of the present application stores computer executable instructions, and the computer executable instructions can execute the method for identifying abnormal user behavior in the first embodiment of the present application.
The user abnormal behavior recognition device provided by this embodiment of the application may be specific hardware on a device, or software or firmware installed on a device. The device has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, for anything not mentioned in the device embodiments, reference may be made to the corresponding content of the method embodiments. Those skilled in the art will understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the method embodiments, and are not repeated here.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited to them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the present disclosure and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A user abnormal behavior identification method, applied to a server, comprising the following steps:
acquiring a plurality of different historical command records of users with the same behavior, wherein each historical command record comprises a plurality of commands;
extracting a plurality of keyword commands from the historical command records according to the behavior features corresponding to the users;
determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all historical command records;
and determining an abnormal historical command record according to the word frequency characteristics of the keyword command.
2. The method for identifying abnormal user behavior according to claim 1, wherein determining the word frequency features of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all historical command records comprises:
for each keyword command, determining the standardized word frequency of the keyword command according to its word frequency in the historical command record and the number of keyword commands in that historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
3. The method for identifying abnormal user behavior according to claim 2, wherein determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command comprises:
calculating the inverse document word frequency of the keyword command according to a formula for idf(x) (not reproduced in this text), where idf(x) denotes the inverse document word frequency, K denotes the total number of historical command records, and K(x) denotes the number of historical command records, among all of them, that include the keyword command.
4. The method for identifying abnormal behaviors of users according to claim 2, wherein the determining an abnormal historical command record according to the word frequency characteristics of the keyword command comprises:
performing dimensionality reduction processing on the word frequency characteristics of the keyword command to obtain a comprehensive word frequency characteristic matrix;
and clustering the comprehensive word frequency characteristic matrix to obtain an abnormal historical command record.
5. The method according to claim 4, wherein the performing a dimension reduction process on the word frequency feature of the keyword command to obtain a comprehensive word frequency feature matrix comprises:
normalizing the word frequency characteristics of the keyword command to obtain a word frequency characteristic matrix;
and performing dimensionality reduction on the word frequency feature matrix with a latent semantic analysis (LSA) dimensionality reduction algorithm, based on a preset hyperparameter value, to obtain a comprehensive word frequency feature matrix.
6. The method according to claim 4, wherein the clustering the comprehensive word frequency feature matrix to obtain an abnormal historical command record comprises:
clustering the comprehensive word frequency feature matrix with the DBSCAN algorithm based on the original input parameter values, to obtain candidate abnormal historical command records and a silhouette coefficient;
and if the silhouette coefficient is detected to be greater than a first preset threshold, determining the candidate abnormal historical command records to be abnormal historical command records.
7. The method according to claim 6, wherein the clustering the comprehensive word frequency feature matrix to obtain an abnormal historical command record further comprises:
if the silhouette coefficient is detected to be smaller than the first preset threshold, updating the original input parameter values by grid search to obtain updated input parameter values;
clustering the comprehensive word frequency feature matrix with the DBSCAN algorithm based on the updated input parameter values, to obtain candidate abnormal historical command records and a silhouette coefficient;
if the silhouette coefficient is detected to be smaller than the first preset threshold, returning to the step of updating the input parameter values by grid search, until the obtained silhouette coefficient is greater than the first preset threshold or the number of updates reaches a second preset threshold.
8. An apparatus for recognizing abnormal user behavior, the apparatus comprising:
the acquisition module is used for acquiring a plurality of different historical command records of users with the same behavior, wherein each historical command record comprises a plurality of commands;
the extraction module is used for extracting the keyword commands in the plurality of historical command records according to the behavior characteristics corresponding to the user;
the determining module is used for determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all the historical command records;
the determining module is further configured to determine an abnormal historical command record according to the word frequency feature of the keyword command.
9. The apparatus for recognizing abnormal user behavior according to claim 8, wherein the determining module is specifically configured to:
for each keyword command, determining the standardized word frequency of the keyword command according to its word frequency in the historical command record and the number of keyword commands in that historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
10. The apparatus for recognizing abnormal user behavior according to claim 9, wherein the determining module is specifically configured to:
calculating the inverse document word frequency of the keyword command according to a formula for idf(x) (not reproduced in this text), where idf(x) denotes the inverse document word frequency, K denotes the total number of historical command records, and K(x) denotes the number of historical command records, among all of them, that include the keyword command.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811386060.6A CN109495479B (en) | 2018-11-20 | 2018-11-20 | User abnormal behavior identification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109495479A true CN109495479A (en) | 2019-03-19 |
CN109495479B CN109495479B (en) | 2021-12-24 |
Family ID: 65697127
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811386060.6A Active CN109495479B (en) | 2018-11-20 | 2018-11-20 | User abnormal behavior identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109495479B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7454790B2 (en) * | 2005-05-23 | 2008-11-18 | Ut-Battelle, Llc | Method for detecting sophisticated cyber attacks |
CN105871630A (en) * | 2016-05-30 | 2016-08-17 | 国家计算机网络与信息安全管理中心 | Method for determining Internet surfing behavior categories of network users |
CN107426199A (en) * | 2017-07-05 | 2017-12-01 | 浙江鹏信信息科技股份有限公司 | A kind of method and system of Network anomalous behaviors detection and analysis |
CN107579956A (en) * | 2017-08-07 | 2018-01-12 | 北京奇安信科技有限公司 | The detection method and device of a kind of user behavior |
CN108427669A (en) * | 2018-02-27 | 2018-08-21 | 华青融天(北京)技术股份有限公司 | Abnormal behaviour monitoring method and system |
CN108632097A (en) * | 2018-05-14 | 2018-10-09 | 平安科技(深圳)有限公司 | Recognition methods, terminal device and the medium of abnormal behaviour object |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020211251A1 (en) * | 2019-04-16 | 2020-10-22 | 平安科技(深圳)有限公司 | Monitoring method and apparatus for operating system |
CN110075524A (en) * | 2019-05-10 | 2019-08-02 | 腾讯科技(深圳)有限公司 | Anomaly detection method and device |
CN110493221A (en) * | 2019-08-19 | 2019-11-22 | 四川大学 | A kind of network anomaly detection method based on the profile that clusters |
CN110493221B (en) * | 2019-08-19 | 2020-04-28 | 四川大学 | Network anomaly detection method based on clustering contour |
CN110866114A (en) * | 2019-10-16 | 2020-03-06 | 平安科技(深圳)有限公司 | Object behavior identification method and device and terminal equipment |
CN110866114B (en) * | 2019-10-16 | 2023-05-26 | 平安科技(深圳)有限公司 | Object behavior identification method and device and terminal equipment |
CN112288561A (en) * | 2020-05-25 | 2021-01-29 | 百维金科(上海)信息科技有限公司 | Internet financial fraud behavior detection method based on DBSCAN algorithm |
CN111857097A (en) * | 2020-07-27 | 2020-10-30 | 中国南方电网有限责任公司超高压输电公司昆明局 | Industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency |
CN111857097B (en) * | 2020-07-27 | 2023-10-31 | 中国南方电网有限责任公司超高压输电公司昆明局 | Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency |
CN113761133A (en) * | 2021-09-10 | 2021-12-07 | 未鲲(上海)科技服务有限公司 | System abnormity monitoring method and device based on artificial intelligence and related equipment |
CN115442156A (en) * | 2022-11-03 | 2022-12-06 | 联通(广东)产业互联网有限公司 | User terminal use condition identification method, system, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109495479B (en) | 2021-12-24 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |