CN109495479A - User abnormal behavior identification method and device - Google Patents

User abnormal behavior identification method and device

Info

Publication number
CN109495479A
Authority
CN
China
Prior art keywords
command
word frequency
keyword
historical
records
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811386060.6A
Other languages
Chinese (zh)
Other versions
CN109495479B (en)
Inventor
张佳
苏禹磨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hua Qing Rong Tian (beijing) Software Ltd By Share Ltd
Original Assignee
Hua Qing Rong Tian (beijing) Software Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hua Qing Rong Tian (beijing) Software Ltd By Share Ltd filed Critical Hua Qing Rong Tian (beijing) Software Ltd By Share Ltd
Priority to CN201811386060.6A priority Critical patent/CN109495479B/en
Publication of CN109495479A publication Critical patent/CN109495479A/en
Application granted granted Critical
Publication of CN109495479B publication Critical patent/CN109495479B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/101 Access control lists [ACL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

This application provides a method and a device for identifying abnormal user behavior, and relates to the technical field of computer security. The method comprises: obtaining a plurality of different historical command records of users with the same behavior, where each historical command record includes a plurality of commands; extracting the keyword commands in the historical command records according to the behavior characteristics of the users; determining the word frequency feature of each keyword command from its word frequency in the historical command record it belongs to and its inverse document word frequency across all historical command records; and determining the abnormal historical command records according to the word frequency features of the keyword commands. Because identification relies on the keyword commands that characterize user behavior in the historical command records, abnormal user behavior that is variable and complex can be identified. In addition, the word frequency features of the keyword commands are processed by dimensionality reduction and clustering, so that recognition is fast and recognition accuracy is high.

Description

User abnormal behavior identification method and device
Technical Field
The application relates to the technical field of computer security, in particular to a method and a device for identifying abnormal behaviors of a user.
Background
In an actual computer network system, there are generally a plurality of legitimate users, and legitimate users can perform business operations in the computer network system based on their own operation authorities. In general, it is necessary to monitor the behaviors of legitimate users in a computer network system and detect whether there is an abnormal behavior, so as to prevent other users from falsely using the accounts of the legitimate users to perform illegal operations.
At present, a method for detecting whether there is an abnormal behavior in a computer network system is as follows: usually, a rule table containing many illegal and sensitive words is first formulated, and abnormal behaviors in user behaviors are identified by matching the user behaviors with rules of the rule table. However, user behavior is diverse and complex, i.e., user behavior may vary with work content, user interests, work hours, and other uncertainty factors. Therefore, the traditional method based on the hard rule table cannot accurately monitor the abnormal behavior of the user and cannot meet the actual requirement.
Disclosure of Invention
In view of this, an object of the embodiments of the present application is to provide a method and an apparatus for identifying an abnormal user behavior, which can identify an abnormal user behavior with variability and complexity, and have high identification accuracy.
In a first aspect, an embodiment of the present application provides a method for identifying an abnormal behavior of a user, where the method is applied to a server, and the method includes:
acquiring a plurality of different historical command records of users with the same behavior; wherein any of the historical command records comprises a plurality of commands;
extracting a plurality of keyword commands in the historical command records according to the behavior characteristics corresponding to the user;
determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all historical command records;
and determining an abnormal historical command record according to the word frequency characteristics of the keyword command.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation manner of the first aspect, where the determining, according to the word frequency of the keyword command in the corresponding history command record and the word frequency of the inverse document of the keyword command in all history command records, the word frequency feature of the keyword command includes:
aiming at any keyword command, determining the standardized word frequency of the keyword command according to the word frequency of the keyword command in the historical command record and the number of the keyword commands in the historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
With reference to the first possible implementation manner of the first aspect, this application provides a second possible implementation manner of the first aspect, where the determining, according to the total number of the history command records and the number of the history command records including the keyword command, an inverse document word frequency of the keyword command includes:
calculating the inverse document word frequency of the keyword command according to the formula $\mathrm{IDF}(x) = \log\frac{K}{K(x)+1}$; wherein IDF(x) represents the inverse document word frequency, K represents the total number of the historical command records, and K(x) represents the number of historical command records, among all the historical command records, that include the keyword command.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present application provides a third possible implementation manner of the first aspect, where the determining an abnormal historical command record according to the word frequency feature of the keyword command includes:
performing dimensionality reduction processing on the word frequency characteristics of the keyword command to obtain a comprehensive word frequency characteristic matrix;
and clustering the comprehensive word frequency characteristic matrix to obtain an abnormal historical command record.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present application provides a fourth possible implementation manner of the first aspect, where the performing dimension reduction processing on the word frequency features of the keyword command to obtain a comprehensive word frequency feature matrix includes:
normalizing the word frequency characteristics of the keyword command to obtain a word frequency characteristic matrix;
and performing dimensionality reduction processing on the word frequency characteristic matrix based on a preset hyperparameter value in the latent semantic analysis (LSA) dimensionality reduction algorithm to obtain a comprehensive word frequency characteristic matrix.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present application provides a fifth possible implementation manner of the first aspect, where the clustering the comprehensive word frequency feature matrix to obtain an abnormal historical command record includes:
clustering the comprehensive word frequency characteristic matrix based on original input parameter values in the DBSCAN algorithm to obtain candidate abnormal historical command records and a silhouette coefficient;
and if the silhouette coefficient is detected to be larger than a first preset threshold value, determining that the candidate abnormal historical command records are abnormal historical command records.
In combination with the fifth possible implementation manner of the first aspect, the present application provides a sixth possible implementation manner of the first aspect, wherein,
the clustering processing is performed on the comprehensive word frequency characteristic matrix to obtain an abnormal historical command record, and the method further comprises the following steps:
if the silhouette coefficient is detected to be smaller than the first preset threshold value, updating the original input parameter values through a grid search method to obtain updated input parameter values;
clustering the comprehensive word frequency characteristic matrix based on the updated input parameter values in the DBSCAN algorithm to obtain candidate abnormal historical command records and a silhouette coefficient;
if the silhouette coefficient is detected to be smaller than the first preset threshold value, returning to the step of updating the original input parameter values through the grid search method, until the obtained silhouette coefficient is larger than the first preset threshold value or the number of updates reaches a second preset threshold value.
In a second aspect, an embodiment of the present application further provides an apparatus for identifying an abnormal behavior of a user, where the apparatus includes:
the acquisition module is used for acquiring a plurality of different historical command records of users with the same behavior; wherein any of the historical command records comprises a plurality of commands;
the extraction module is used for extracting the keyword commands in the plurality of historical command records according to the behavior characteristics corresponding to the user;
the determining module is used for determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all the historical command records;
the determining module is further configured to determine an abnormal historical command record according to the word frequency feature of the keyword command.
With reference to the second aspect, an embodiment of the present application provides a first possible implementation manner of the second aspect, where the determining module is specifically configured to:
aiming at any keyword command, determining the standardized word frequency of the keyword command according to the word frequency of the keyword command in the historical command record and the number of the keyword commands in the historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
With reference to the first possible implementation manner of the second aspect, an embodiment of the present application provides a second possible implementation manner of the second aspect, where the determining module is specifically configured to:
calculating the inverse document word frequency of the keyword command according to the formula $\mathrm{IDF}(x) = \log\frac{K}{K(x)+1}$; wherein IDF(x) represents the inverse document word frequency, K represents the total number of the historical command records, and K(x) represents the number of historical command records, among all the historical command records, that include the keyword command.
According to the method and the device for identifying the abnormal user behaviors, the abnormal user behaviors are identified through the historical command records of the user, and the abnormal user behaviors with variability and complexity can be identified. Meanwhile, in the embodiment of the application, the abnormal behavior detection of the user is realized by performing dimension reduction processing and clustering processing on the word frequency characteristics of the keyword command for characterizing the behavior characteristics of the user, and the identification speed is high and the identification accuracy is high.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 shows a flowchart of a method for identifying abnormal user behavior according to an embodiment of the present application.
Fig. 2 shows a flowchart of another method for identifying abnormal user behavior according to an embodiment of the present application.
Fig. 3 shows a flowchart of another method for identifying abnormal user behavior according to an embodiment of the present application.
Fig. 4 shows a flowchart of another method for identifying abnormal user behavior according to an embodiment of the present application.
Fig. 5 shows a schematic structural diagram of an apparatus for identifying an abnormal behavior of a user according to an embodiment of the present application.
Fig. 6 shows a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
In consideration of the fact that the conventional hard rule table-based method cannot accurately monitor the abnormal behavior of the user, the embodiment of the application provides a method and a device for identifying the abnormal behavior of the user, which are described below by an embodiment.
As shown in fig. 1, a first embodiment of the present application provides a method for identifying an abnormal user behavior, which is applied to a server, and the method includes:
S101, acquiring a plurality of different historical command records of users with the same behavior; wherein any of the historical command records comprises a plurality of commands.
In the embodiment of the application, when a user logs in, the server can detect whether the user currently logged in is legal or not, and if so, the legal user is allowed to log in; if the user is illegal, the illegal user is refused to log in. As an optional implementation, the method for the server to detect whether the currently logged-in user is legal is as follows: the server detects whether the user information of the user is in a preset white list or not, and if so, the user is determined to be legal. Here, the white list stores user information of a legitimate user.
As an optional implementation manner, the server is a Linux server. On a Linux platform, the Shell is the most important communication interface between a terminal user and the operating system, and a large proportion of user activities are completed through Shell commands. Thus, Shell commands can directly reflect the behavior characteristics of the user.
In general, the user behaviors corresponding to different services are different. For the same service, different operation authorities and operation habits of users also lead to different behavior characteristics. Therefore, the embodiment of the application uses historical command records of users who operate the same service, of users in the same department, or of the same user at different times; these users have the same operation behavior.
Here, the above-mentioned history command record includes a plurality of commands and the same command may be repeated. The command is a Shell command, and abnormal operation in user behavior is identified based on the Shell command of the user in the embodiment of the application. The Shell command in the historical command record and the word frequency characteristics corresponding to the Shell command can represent the behavior characteristics of the user.
S102, extracting a plurality of keyword commands in the historical command records according to the behavior characteristics corresponding to the user.
In the embodiment of the application, all the Shell historical command records produced each time a user logs in to the Linux server are used as sample data. Because these records are text commands, word segmentation is first performed on them. Here, word segmentation refers to extracting the keyword commands from the text commands.
As a specific implementation manner, for each user with the same behavior, a keyword command corresponding to the behavior feature of the user is extracted from a history command record corresponding to the user according to the behavior features such as the service information, the operation authority, the operation habit and the like operated by the user.
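The text does not say how the keyword commands are actually extracted from the Shell history. The following is only a minimal sketch under stated assumptions: each history line is one Shell command or pipeline, the keyword command is taken to be the program name of each pipeline stage, and the function name `extract_keyword_commands` and the `WRAPPERS` skip list are hypothetical names introduced for illustration.

```python
import shlex

# Hypothetical wrapper commands that are skipped so the real program name is kept.
WRAPPERS = {"sudo", "nohup", "time"}

def extract_keyword_commands(history_lines):
    """Return the program name of every pipeline stage in a Shell history."""
    keywords = []
    for line in history_lines:
        # Treat '&&', '|' and ';' as stage separators (a simplification that
        # ignores separators appearing inside quoted strings).
        for sep in ("&&", "|", ";"):
            line = line.replace(sep, "\n")
        for stage in line.splitlines():
            try:
                tokens = shlex.split(stage)
            except ValueError:
                # Unbalanced quotes: fall back to plain whitespace splitting.
                tokens = stage.split()
            # Drop wrapper commands and VAR=value prefixes.
            while tokens and (tokens[0] in WRAPPERS or "=" in tokens[0]):
                tokens = tokens[1:]
            if tokens:
                keywords.append(tokens[0])
    return keywords

history = [
    "cd /var/log",
    "sudo vim app.conf",
    "cat access.log | grep 500",
    "ssh admin@10.0.0.5",
]
print(extract_keyword_commands(history))  # ['cd', 'vim', 'cat', 'grep', 'ssh']
```

A real deployment would presumably filter this list further according to the service information, operation authority and operation habits mentioned above; that filtering is not modeled here.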
S103, determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all the historical command records.
Here, the word frequency feature of a keyword command is its term frequency-inverse document frequency (TF-IDF). TF refers to the word frequency, which indicates how frequently a keyword command appears in one login (i.e., in one historical command record). If a keyword command appears frequently in a login (that is, in the corresponding historical command record), it is comparatively representative of the business pattern of that login (e.g., the word "vim" appearing frequently in a login indicates that files are likely being modified).
IDF refers to the inverse document word frequency, which indicates how rarely a keyword command occurs across all logins. If a keyword command occurs in nearly every login, such as "cd" or "cp", it is relatively unimportant; if a keyword command occurs in only a few logins, such as "ssh" or "vim", it is relatively more important, because it better distinguishes the different services carried out in each login.
In the embodiment of the application, TF and IDF are respectively determined according to the user operation frequency corresponding to the keyword command, and then word frequency characteristics (namely TF-IDF) are determined according to TF and IDF.
S104, determining an abnormal historical command record according to the word frequency characteristics of the keyword command.
In the embodiment of the application, the keyword commands of each historical command record and the word frequency feature of each keyword command are combined into a word frequency feature matrix, and the word frequency feature matrix is then clustered to obtain the abnormal historical command records. Because each abnormal historical command record also includes user information, the abnormal behavior of the user is identified through the abnormal historical command record.
Here, the abnormal behavior of the user may refer to an attack behavior impersonated as a legitimate user, or may refer to a corresponding normal behavior after the legitimate user changes an operation habit or changes a corresponding service characteristic.
According to the user abnormal behavior identification method provided by the embodiment of the application, the user abnormal behavior is identified through the historical command record of the user, and the identification of the user abnormal behavior with variability and complexity can be realized. Meanwhile, in the embodiment of the application, the abnormal behavior detection of the user is realized by performing dimension reduction processing and clustering processing on the word frequency characteristics of the keyword command for characterizing the behavior characteristics of the user, and the identification speed is high and the identification accuracy is high.
Further, as shown in fig. 2, in the method for identifying abnormal user behavior provided in this embodiment of the present application, in step S103, determining the word frequency characteristic of the keyword command according to the word frequency of the keyword command in the corresponding historical command record and the inverse document word frequency of the keyword command in all historical command records includes:
S201, for any keyword command, determining the standardized word frequency of the keyword command according to the word frequency of the keyword command in the historical command record and the number of the keyword commands in the historical command record.
In the embodiment of the present application, it is assumed that one login (i.e., one historical command record) contains n keyword commands in total, among which the keyword command x appears m times; the raw word frequency of the keyword command x is then m. To remove the effect of record length, the word frequency of the keyword command x is normalized by formula (1): $\mathrm{TF}(x) = \frac{m}{n}$.
For example, let W = {cd, cd, cp, vim, mkdir, cp, ssh}, where W represents a historical command record and cd, cp, vim, mkdir and ssh are specific commands. The keyword command cd appears 2 times among the 7 commands, so its standardized word frequency is 2/7.
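The normalization in step S201 can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `normalized_tf` is hypothetical, and exact fractions are used so the 2/7 from the example is visible.

```python
from collections import Counter
from fractions import Fraction

def normalized_tf(record):
    """Map each keyword command to m/n: its count divided by the record length."""
    n = len(record)
    counts = Counter(record)
    return {cmd: Fraction(m, n) for cmd, m in counts.items()}

W = ["cd", "cd", "cp", "vim", "mkdir", "cp", "ssh"]
tf = normalized_tf(W)
print(tf["cd"])  # 2/7
```

Because every count is divided by the same n, the standardized word frequencies of one record always sum to 1, which makes records of different lengths comparable.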
S202, determining the inverse document word frequency of the keyword command according to the total number of the history command records and the number of the history command records including the keyword command.
In the embodiment of the application, the inverse document word frequency of the keyword command is calculated according to formula (2): $\mathrm{IDF}(x) = \log\frac{K}{K(x)+1}$; wherein IDF(x) represents the inverse document word frequency, K represents the total number of the historical command records, and K(x) represents the number of historical command records, among all the historical command records, that include the keyword command.
K represents the total number of logins (i.e., the total number of historical command records is K), and K(x) represents the number of logins that include the keyword command x (i.e., the number of historical command records including the keyword command x is K(x)); the IDF of the word x is then calculated according to formula (2). Here, to avoid a denominator of 0, 1 is added to the denominator (i.e., Laplace smoothing is performed).
For example:
W1 = {cd, cd, cp, vim, mkdir, cp, ssh};
W2 = {cd, touch, cp, vim, mkdir, cp, ssh};
W3 = {cd, top, cp, vim, mkdir, cp, ssh};
wherein W1, W2 and W3 indicate different historical command records, each of which includes a plurality of specific commands. Of the three login samples above (i.e., three historical command records), "touch" appears only in the second login (i.e., in W2), so K(x) = 1 for "touch", and the inverse document word frequency of "touch" is $\log\frac{3}{1+1} = \log\frac{3}{2}$. Similarly, "cd" appears in all three records, so its inverse document word frequency is $\log\frac{3}{3+1} = \log\frac{3}{4}$.
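The three-record example can be checked with a short sketch. It assumes the smoothed form IDF(x) = log(K / (K(x) + 1)) described above and uses the natural logarithm; the base of the logarithm is not stated in the text, so that choice is an assumption, as is the function name `inverse_document_frequency`.

```python
import math

def inverse_document_frequency(records):
    """IDF(x) = log(K / (K(x) + 1)) for every command x seen in the records."""
    K = len(records)
    doc_count = {}
    for record in records:
        # set() so a command repeated inside one record is counted once.
        for cmd in set(record):
            doc_count[cmd] = doc_count.get(cmd, 0) + 1
    return {cmd: math.log(K / (kx + 1)) for cmd, kx in doc_count.items()}

W1 = ["cd", "cd", "cp", "vim", "mkdir", "cp", "ssh"]
W2 = ["cd", "touch", "cp", "vim", "mkdir", "cp", "ssh"]
W3 = ["cd", "top", "cp", "vim", "mkdir", "cp", "ssh"]

idf = inverse_document_frequency([W1, W2, W3])
print(round(idf["touch"], 3))  # log(3/2)
print(round(idf["cd"], 3))     # log(3/4), which is negative
```

One consequence of adding 1 to the denominator is that a command present in every record gets a slightly negative IDF, since K / (K + 1) is below 1. Rare commands such as "touch" still receive the larger value, which is the ordering the method relies on.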
S203, determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
As shown in Table 1, Table 2 and Table 3, in the embodiment of the present application, after the standardized word frequency TF and the inverse document word frequency IDF are obtained, the TF-IDF value of a keyword command x is the product of its standardized word frequency and its inverse document word frequency, that is, $\text{TF-IDF}(x) = \mathrm{TF}(x) \times \mathrm{IDF}(x)$.
Table 1 Raw data table
      cd   cp   touch   ssh   mkdir
W1    2    1    0       0     1
W2    3    4    8       0     1
W3    2    3    0       1     7
W4    1    0    0       1     0
W5    5    7    1       2     0
Table 2 TF table
Table 3 IDF table
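As an illustration of how TF, IDF and TF-IDF values follow from the raw counts in Table 1, the sketch below recomputes them in Python. It rests on assumptions that the text does not confirm: the five listed commands are taken to be all the keyword commands of each record (so n is the row sum), the IDF uses the smoothed form log(K / (K(x) + 1)) with the natural logarithm, and the variable names are hypothetical. The printed values are not claimed to reproduce the original Tables 2 and 3.

```python
import math

commands = ["cd", "cp", "touch", "ssh", "mkdir"]
counts = {
    "W1": [2, 1, 0, 0, 1],
    "W2": [3, 4, 8, 0, 1],
    "W3": [2, 3, 0, 1, 7],
    "W4": [1, 0, 0, 1, 0],
    "W5": [5, 7, 1, 2, 0],
}

K = len(counts)
# K(x): number of records in which command x occurs at least once.
k_x = [sum(1 for row in counts.values() if row[j] > 0) for j in range(len(commands))]
idf = [math.log(K / (kx + 1)) for kx in k_x]

tfidf = {}
for name, row in counts.items():
    n = sum(row)  # assumed total number of keyword commands in the record
    tfidf[name] = [(m / n) * idf[j] for j, m in enumerate(row)]

print("K(x):", dict(zip(commands, k_x)))
for name, row in tfidf.items():
    print(name, [round(v, 3) for v in row])
```

Each row of `tfidf` is the feature vector of one historical command record; stacking the rows gives the word frequency feature matrix used in the later steps.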
Further, as shown in fig. 3, in the method for identifying an abnormal user behavior provided in the embodiment of the present application, in step S104, determining an abnormal historical command record according to the word frequency feature of the keyword command includes:
S301, performing dimensionality reduction processing on the word frequency characteristics of the keyword command to obtain a comprehensive word frequency characteristic matrix.
As an embodiment, the dimensionality reduction process includes the following steps: normalizing the word frequency characteristics of the keyword command to obtain a word frequency characteristic matrix; and performing dimensionality reduction on the word frequency characteristic matrix based on a preset hyperparameter value in the latent semantic analysis (LSA) dimensionality reduction algorithm to obtain a comprehensive word frequency characteristic matrix.
In the embodiment of the application, TF-IDF (term frequency-inverse document frequency) is used instead of the simple word frequency to distinguish the importance of each word frequency feature to the user behavior pattern. The word frequency features are then normalized to keep the data features unbiased, which yields the word frequency feature matrix.
After the normalized word frequency feature matrix of user behavior is obtained (each row of the matrix is a sample and each column is a feature), the LSA dimensionality reduction technique extracts the singular vectors corresponding to the K largest singular values, so that the original feature matrix is compressed into a feature matrix with K columns. Here K is a hyperparameter (not to be confused with the total number of historical command records in the IDF formula) that sets the feature dimension of the comprehensive word frequency feature matrix after dimensionality reduction.
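The LSA step can be sketched with a truncated singular value decomposition. This is a minimal sketch, not the patented implementation: it assumes NumPy is available, assumes the unspecified normalization is an L2 normalization of each row, and uses the hypothetical function name `lsa_reduce`. The toy matrix reuses the raw counts of Table 1 purely as stand-in values for TF-IDF features.

```python
import numpy as np

def lsa_reduce(feature_matrix, k):
    """Project each row (one record) onto the k leading singular directions."""
    X = np.asarray(feature_matrix, dtype=float)
    # Assumed normalization: scale every row to unit L2 norm;
    # all-zero rows are left unchanged.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the k largest singular values; equivalent to X @ Vt[:k].T.
    return U[:, :k] * s[:k]

# Stand-in feature values (rows: records W1..W5, columns: commands).
X = np.array([
    [2, 1, 0, 0, 1],
    [3, 4, 8, 0, 1],
    [2, 3, 0, 1, 7],
    [1, 0, 0, 1, 0],
    [5, 7, 1, 2, 0],
])
Z = lsa_reduce(X, k=2)
print(Z.shape)  # (5, 2)
```

The value of k plays the role of the hyperparameter K in the text: a smaller k compresses more aggressively, while a larger k preserves more of the original variation between records.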
S302, clustering the comprehensive word frequency characteristic matrix to obtain an abnormal historical command record.
In the embodiment of the application, the DBSCAN algorithm is used to cluster the comprehensive word frequency feature matrix. The DBSCAN clustering algorithm has two important input parameters: the scanning radius eps and the minimum number of contained points minPts.
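To make the roles of eps and minPts concrete, here is a small pure-Python sketch of the DBSCAN idea on toy two-dimensional points. It is a simplified illustration rather than the implementation used in the application: distances are Euclidean, a point counts as its own neighbor, and the function name `dbscan` and the sample points are made up for the example.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Neighborhoods include the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
print(dbscan(points, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

A point with fewer than minPts neighbors within eps that is not reachable from any dense region keeps the label -1, which is how an isolated record ends up flagged as an outlier.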
Further, as shown in fig. 4, in the method for identifying an abnormal user behavior provided in the embodiment of the present application, in step 302, the clustering the comprehensive word frequency feature matrix to obtain an abnormal historical command record includes:
S401, clustering the comprehensive word frequency feature matrix based on the original input parameter values of the DBSCAN algorithm to obtain candidate abnormal historical command records and a silhouette coefficient.
In the embodiment of the application, the preset input parameters of the DBSCAN algorithm are the scanning radius eps and the minimum number of contained points minPts. The comprehensive word frequency feature matrix is clustered according to the preset eps value and the preset minPts value to obtain the candidate abnormal historical command records and the silhouette coefficient.
S402, if the silhouette coefficient is detected to be larger than a first preset threshold value, determining the candidate abnormal historical command records to be abnormal historical command records.
S403, if the silhouette coefficient is detected to be smaller than the first preset threshold value, updating the original input parameter values by a grid search method to obtain updated input parameter values.
The original input parameter values can be updated by grid parameter search, random parameter search, genetic algorithm parameter optimization, and the like. In the embodiment of the application, the grid search method is used.
S404, clustering the comprehensive word frequency feature matrix based on the updated input parameter values of the DBSCAN algorithm to obtain candidate abnormal historical command records and a silhouette coefficient.
S405, if the silhouette coefficient is detected to be smaller than the first preset threshold value, returning to the step of updating the input parameter values by the grid search method, until the obtained silhouette coefficient is larger than the first preset threshold value or the number of updates reaches a second preset threshold value.
In the embodiment of the application, after the expected clustering effect is achieved, the cluster label of each historical command record is obtained. The class labeled -1 contains the outliers in the data, and the historical command records labeled -1 are determined to be abnormal historical command records.
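The loop of steps S401 to S405 can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function names (`dbscan`, `silhouette`, `find_abnormal`), the use of Euclidean distance, the choice to leave noise points out of the silhouette score, the score of -1.0 when fewer than two clusters are found, and the sample grid values are all assumptions made for the example.

```python
import math
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # A point counts as its own neighbor (distance 0 <= eps).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbors[j])
    return [lab if lab is not None else -1 for lab in labels]


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette over non-noise points (assumption); -1.0 if < 2 clusters."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            a = sum(math.dist(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


def find_abnormal(points: Sequence[Point], eps_grid: Sequence[float],
                  min_pts_grid: Sequence[int], score_threshold: float,
                  max_updates: int) -> Tuple[List[int], float]:
    """Recluster over the grid until the score passes or the updates run out."""
    labels: List[int] = []
    score = -1.0
    # Grid entry 0 plays the role of the original input parameter values;
    # every later entry counts as one update.
    for updates, (eps, min_pts) in enumerate(product(eps_grid, min_pts_grid)):
        if updates > max_updates:
            break
        labels = dbscan(points, eps, min_pts)
        score = silhouette(points, labels)
        if score > score_threshold:
            break
    # Records labeled -1 are the outliers, i.e. the abnormal records.
    return [i for i, lab in enumerate(labels) if lab == -1], score


# Two tight groups of records plus one isolated record (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 30)]
abnormal, score = find_abnormal(points, eps_grid=[0.5, 1.5], min_pts_grid=[3],
                                score_threshold=0.5, max_updates=5)
print(abnormal, round(score, 3))
```

In this example the first parameter pair marks every record as noise, so the score stays below the threshold and the loop moves on to the next grid entry, which separates the two groups and leaves only the isolated record labeled -1.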
According to the user abnormal behavior identification method provided by the embodiment of the application, abnormal user behavior is identified from the user's historical command records, so that abnormal behavior that is variable and complex can be identified. Moreover, in the embodiment of the application, abnormal behavior is detected by applying dimension reduction and clustering to the word frequency features of the keyword commands that characterize the user's behavior, so identification is fast and accurate.
The embodiment of the application can be applied to fields such as user behavior analysis, text mining, and prediction evaluation. In a prediction model of user behavior, it greatly improves the accuracy of the unsupervised algorithm, and it also has good practical value in other prediction models.
A second embodiment of the present application further provides a device for identifying an abnormal user behavior, where the device is configured to execute the method for identifying an abnormal user behavior in the first embodiment, and as shown in fig. 5, the device includes:
an obtaining module 11, configured to obtain a plurality of different historical command records of users having the same behavior; wherein any of the historical command records comprises a plurality of commands;
the extracting module 12 is configured to extract a keyword command from the plurality of history command records according to the behavior feature corresponding to the user;
the determining module 13 is configured to determine a word frequency feature of the keyword command according to the word frequency of the keyword command in the corresponding historical command record and the inverse document word frequency of the keyword command in all historical command records;
the determining module 13 is further configured to determine an abnormal historical command record according to the word frequency feature of the keyword command.
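As an illustration of what the extracting module 12 does, the sketch below keeps only the commands whose program name belongs to a keyword set. The function name `extract_keyword_commands`, the rule of matching on the first token of each command, and the sample keyword set are hypothetical; the text above does not specify how the behavior features map to keyword commands.

```python
from typing import Dict, List, Set


def extract_keyword_commands(records: Dict[str, List[str]],
                             keywords: Set[str]) -> Dict[str, List[str]]:
    """Keep, per record, the program names that belong to the keyword set."""
    extracted: Dict[str, List[str]] = {}
    for record_id, commands in records.items():
        kept = []
        for command in commands:
            tokens = command.split()
            # Assumption: the first token of a command line is the keyword.
            if tokens and tokens[0] in keywords:
                kept.append(tokens[0])
        extracted[record_id] = kept
    return extracted


# Hypothetical historical command records, one list of commands per record.
records = {
    "user_a": ["ls -l", "cd /tmp", "scp dump.sql host:/tmp"],
    "user_b": ["ls", "vim notes.txt"],
}
keywords = {"scp", "vim", "cd"}
print(extract_keyword_commands(records, keywords))
```

The output of this step is what the determining module 13 would consume when computing word frequency features.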
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
for any keyword command, determining the standardized word frequency of the keyword command according to the word frequency of the keyword command in the historical command record and the number of the keyword commands in the historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
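The three steps above can be sketched as follows. The function name `tfidf_features` is hypothetical, and two details are assumptions rather than statements of the application: the inverse document word frequency is taken as log(K / K(x)), and the word frequency feature is taken as the product of the standardized word frequency and the inverse document word frequency.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_features(records: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Compute a word frequency feature per keyword command per record."""
    total_records = len(records)  # K: total number of historical command records
    containing: Counter = Counter()  # K(x): records that include keyword command x
    for commands in records.values():
        containing.update(set(commands))

    features: Dict[str, Dict[str, float]] = {}
    for record_id, commands in records.items():
        counts = Counter(commands)
        n = len(commands)  # number of keyword commands in this record
        features[record_id] = {
            # standardized word frequency * assumed idf = log(K / K(x))
            cmd: (c / n) * math.log(total_records / containing[cmd])
            for cmd, c in counts.items()
        }
    return features


# Hypothetical keyword commands already extracted from three records.
records = {
    "r1": ["cd", "scp", "scp"],
    "r2": ["cd", "vim"],
    "r3": ["cd"],
}
features = tfidf_features(records)
print(features["r1"])
```

Under these assumptions a keyword command that appears in every record, such as `cd` here, receives a feature value of zero, while a command concentrated in a single record receives a larger value.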
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
according to the formula, calculating the inverse document word frequency of the keyword command; wherein idf(x) represents the inverse document word frequency, K represents the total number of the historical command records, and K(x) represents the number of historical command records, among all the historical command records, that include the keyword command.
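The formula itself is not reproduced in the text above. One commonly used form that is consistent with the symbol definitions is shown below; it is given only as an assumption, and the application may use a variant (for example, with smoothing terms).

```latex
\mathrm{idf}(x) = \log \frac{K}{K(x)}
```

Under this form, a keyword command that appears in every historical command record has K(x) = K and therefore idf(x) = 0, while a keyword command found in few records receives a larger value.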
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
performing dimensionality reduction processing on the word frequency characteristics of the keyword command to obtain a comprehensive word frequency characteristic matrix;
and clustering the comprehensive word frequency characteristic matrix to obtain an abnormal historical command record.
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
normalizing the word frequency characteristics of the keyword command to obtain a word frequency characteristic matrix;
and performing dimensionality reduction processing on the word frequency feature matrix based on a preset hyperparameter value in the latent semantic analysis (LSA) dimensionality reduction algorithm to obtain a comprehensive word frequency feature matrix.
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
clustering the comprehensive word frequency feature matrix based on the original input parameter values of the DBSCAN algorithm to obtain candidate abnormal historical command records and a silhouette coefficient;
and if the silhouette coefficient is detected to be larger than a first preset threshold value, determining the candidate abnormal historical command records to be abnormal historical command records.
Further, as shown in fig. 5, in the device for identifying an abnormal user behavior provided in the embodiment of the present application, the determining module 13 is specifically configured to:
if the silhouette coefficient is detected to be smaller than the first preset threshold value, updating the original input parameter values by a grid search method to obtain updated input parameter values;
clustering the comprehensive word frequency feature matrix based on the updated input parameter values of the DBSCAN algorithm to obtain candidate abnormal historical command records and a silhouette coefficient;
if the silhouette coefficient is detected to be smaller than the first preset threshold value, returning to the step of updating the input parameter values by the grid search method, until the obtained silhouette coefficient is larger than the first preset threshold value or the number of updates reaches a second preset threshold value.
According to the user abnormal behavior recognition device provided by the embodiment of the application, abnormal user behavior is recognized from the user's historical command records, so that abnormal behavior that is variable and complex can be recognized. In addition, in the embodiment of the application, abnormal user behavior is detected by applying dimension reduction and clustering to the keyword commands that characterize the user's behavior and their corresponding operation frequencies, so identification is fast and accurate.
Fig. 6 is a schematic structural diagram of a computer device 40 according to an embodiment of the present application, and as shown in fig. 6, a computer device 40 according to a third embodiment of the present application includes: a processor 402, a memory 401 and a bus, the memory 401 storing execution instructions, the processor 402 and the memory 401 communicating via the bus when the computer device 40 is running, the processor 402 executing the execution instructions to make the computer device 40 execute the user abnormal behavior recognition method.
Specifically, the memory 401 and the processor 402 can be a general-purpose memory and a general-purpose processor, which are not specifically limited here, and the user abnormal behavior identification method can be executed when the processor 402 runs a computer program stored in the memory 401.
Corresponding to the above method for identifying abnormal user behavior, a computer storage medium provided in the fourth embodiment of the present application stores computer executable instructions, and the computer executable instructions can execute the method for identifying abnormal user behavior in the first embodiment of the present application.
The user abnormal behavior recognition device provided by the embodiment of the application can be specific hardware on a device, or software or firmware installed on a device. The device provided by the embodiment of the present application has the same implementation principle and technical effect as the foregoing method embodiments; for brevity, where the device embodiments do not mention a detail, reference may be made to the corresponding content in the foregoing method embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program code.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art can, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A user abnormal behavior identification method is applied to a server and comprises the following steps:
acquiring a plurality of different historical command records of users with the same behavior; wherein any of the historical command records comprises a plurality of commands;
extracting a plurality of keyword commands in the historical command records according to the behavior characteristics corresponding to the user;
determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all historical command records;
and determining an abnormal historical command record according to the word frequency characteristics of the keyword command.
2. The method for identifying abnormal behaviors of users according to claim 1, wherein the step of determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all historical command records comprises the steps of:
for any keyword command, determining the standardized word frequency of the keyword command according to the word frequency of the keyword command in the historical command record and the number of the keyword commands in the historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
3. The method for identifying abnormal behaviors of users according to claim 2, wherein the determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command comprises:
according to the formula, calculating the inverse document word frequency of the keyword command; wherein idf(x) represents the inverse document word frequency, K represents the total number of the historical command records, and K(x) represents the number of historical command records, among all the historical command records, that include the keyword command.
4. The method for identifying abnormal behaviors of users according to claim 2, wherein the determining an abnormal historical command record according to the word frequency characteristics of the keyword command comprises:
performing dimensionality reduction processing on the word frequency characteristics of the keyword command to obtain a comprehensive word frequency characteristic matrix;
and clustering the comprehensive word frequency characteristic matrix to obtain an abnormal historical command record.
5. The method according to claim 4, wherein the performing a dimension reduction process on the word frequency feature of the keyword command to obtain a comprehensive word frequency feature matrix comprises:
normalizing the word frequency characteristics of the keyword command to obtain a word frequency characteristic matrix;
and performing dimensionality reduction processing on the word frequency feature matrix based on a preset hyperparameter value in the latent semantic analysis (LSA) dimensionality reduction algorithm to obtain a comprehensive word frequency feature matrix.
6. The method according to claim 4, wherein the clustering the comprehensive word frequency feature matrix to obtain an abnormal historical command record comprises:
clustering the comprehensive word frequency feature matrix based on the original input parameter values of the DBSCAN algorithm to obtain candidate abnormal historical command records and a silhouette coefficient;
and if the silhouette coefficient is detected to be larger than a first preset threshold value, determining the candidate abnormal historical command records to be abnormal historical command records.
7. The method according to claim 6, wherein the clustering the comprehensive word frequency feature matrix to obtain an abnormal historical command record further comprises:
if the silhouette coefficient is detected to be smaller than the first preset threshold value, updating the original input parameter values by a grid search method to obtain updated input parameter values;
clustering the comprehensive word frequency feature matrix based on the updated input parameter values of the DBSCAN algorithm to obtain candidate abnormal historical command records and a silhouette coefficient;
if the silhouette coefficient is detected to be smaller than the first preset threshold value, returning to the step of updating the input parameter values by the grid search method, until the obtained silhouette coefficient is larger than the first preset threshold value or the number of updates reaches a second preset threshold value.
8. An apparatus for recognizing abnormal user behavior, the apparatus comprising:
the acquisition module is used for acquiring a plurality of different historical command records of users with the same behavior; wherein any of the historical command records comprises a plurality of commands;
the extraction module is used for extracting the keyword commands in the plurality of historical command records according to the behavior characteristics corresponding to the user;
the determining module is used for determining the word frequency characteristics of the keyword command according to the word frequency of the keyword command in the historical command record and the inverse document word frequency of the keyword command in all the historical command records;
the determining module is further configured to determine an abnormal historical command record according to the word frequency feature of the keyword command.
9. The apparatus for recognizing abnormal user behavior according to claim 8, wherein the determining module is specifically configured to:
for any keyword command, determining the standardized word frequency of the keyword command according to the word frequency of the keyword command in the historical command record and the number of the keyword commands in the historical command record;
determining the inverse document word frequency of the keyword command according to the total number of the historical command records and the number of the historical command records including the keyword command;
and determining the word frequency characteristics of the keyword command according to the standardized word frequency and the inverse document word frequency of the keyword command.
10. The apparatus for recognizing abnormal user behavior according to claim 9, wherein the determining module is specifically configured to:
according to the formula, calculating the inverse document word frequency of the keyword command; wherein idf(x) represents the inverse document word frequency, K represents the total number of the historical command records, and K(x) represents the number of historical command records, among all the historical command records, that include the keyword command.
CN201811386060.6A 2018-11-20 2018-11-20 User abnormal behavior identification method and device Active CN109495479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811386060.6A CN109495479B (en) 2018-11-20 2018-11-20 User abnormal behavior identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811386060.6A CN109495479B (en) 2018-11-20 2018-11-20 User abnormal behavior identification method and device

Publications (2)

Publication Number Publication Date
CN109495479A true CN109495479A (en) 2019-03-19
CN109495479B CN109495479B (en) 2021-12-24

Family

ID=65697127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811386060.6A Active CN109495479B (en) 2018-11-20 2018-11-20 User abnormal behavior identification method and device

Country Status (1)

Country Link
CN (1) CN109495479B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110075524A (en) * 2019-05-10 2019-08-02 腾讯科技(深圳)有限公司 Anomaly detection method and device
CN110493221A (en) * 2019-08-19 2019-11-22 四川大学 A kind of network anomaly detection method based on the profile that clusters
CN110866114A (en) * 2019-10-16 2020-03-06 平安科技(深圳)有限公司 Object behavior identification method and device and terminal equipment
WO2020211251A1 (en) * 2019-04-16 2020-10-22 平安科技(深圳)有限公司 Monitoring method and apparatus for operating system
CN111857097A (en) * 2020-07-27 2020-10-30 中国南方电网有限责任公司超高压输电公司昆明局 Industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency
CN112288561A (en) * 2020-05-25 2021-01-29 百维金科(上海)信息科技有限公司 Internet financial fraud behavior detection method based on DBSCAN algorithm
CN113761133A (en) * 2021-09-10 2021-12-07 未鲲(上海)科技服务有限公司 System abnormity monitoring method and device based on artificial intelligence and related equipment
CN115442156A (en) * 2022-11-03 2022-12-06 联通(广东)产业互联网有限公司 User terminal use condition identification method, system, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7454790B2 (en) * 2005-05-23 2008-11-18 Ut-Battelle, Llc Method for detecting sophisticated cyber attacks
CN105871630A (en) * 2016-05-30 2016-08-17 国家计算机网络与信息安全管理中心 Method for determining Internet surfing behavior categories of network users
CN107426199A (en) * 2017-07-05 2017-12-01 浙江鹏信信息科技股份有限公司 A kind of method and system of Network anomalous behaviors detection and analysis
CN107579956A (en) * 2017-08-07 2018-01-12 北京奇安信科技有限公司 The detection method and device of a kind of user behavior
CN108427669A (en) * 2018-02-27 2018-08-21 华青融天(北京)技术股份有限公司 Abnormal behaviour monitoring method and system
CN108632097A (en) * 2018-05-14 2018-10-09 平安科技(深圳)有限公司 Recognition methods, terminal device and the medium of abnormal behaviour object

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020211251A1 (en) * 2019-04-16 2020-10-22 平安科技(深圳)有限公司 Monitoring method and apparatus for operating system
CN110075524A (en) * 2019-05-10 2019-08-02 腾讯科技(深圳)有限公司 Anomaly detection method and device
CN110493221A (en) * 2019-08-19 2019-11-22 四川大学 A kind of network anomaly detection method based on the profile that clusters
CN110493221B (en) * 2019-08-19 2020-04-28 四川大学 Network anomaly detection method based on clustering contour
CN110866114A (en) * 2019-10-16 2020-03-06 平安科技(深圳)有限公司 Object behavior identification method and device and terminal equipment
CN110866114B (en) * 2019-10-16 2023-05-26 平安科技(深圳)有限公司 Object behavior identification method and device and terminal equipment
CN112288561A (en) * 2020-05-25 2021-01-29 百维金科(上海)信息科技有限公司 Internet financial fraud behavior detection method based on DBSCAN algorithm
CN111857097A (en) * 2020-07-27 2020-10-30 中国南方电网有限责任公司超高压输电公司昆明局 Industrial control system abnormity diagnosis information identification method based on word frequency and inverse document frequency
CN111857097B (en) * 2020-07-27 2023-10-31 中国南方电网有限责任公司超高压输电公司昆明局 Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency
CN113761133A (en) * 2021-09-10 2021-12-07 未鲲(上海)科技服务有限公司 System abnormity monitoring method and device based on artificial intelligence and related equipment
CN115442156A (en) * 2022-11-03 2022-12-06 联通(广东)产业互联网有限公司 User terminal use condition identification method, system, device and storage medium

Also Published As

Publication number Publication date
CN109495479B (en) 2021-12-24

Similar Documents

Publication Publication Date Title
CN109495479B (en) User abnormal behavior identification method and device
CN110020422B (en) Feature word determining method and device and server
Magu et al. Determining code words in euphemistic hate speech using word embedding networks
US20200081899A1 (en) Automated database schema matching
WO2022051663A1 (en) Domain name processing systems and methods
US10452627B2 (en) Column weight calculation for data deduplication
US20210250327A1 (en) Domain name processing systems and methods
CN108269122B (en) Advertisement similarity processing method and device
CN110929525B (en) Network loan risk behavior analysis and detection method, device, equipment and storage medium
JPWO2018159337A1 (en) Profile generation device, attack detection device, profile generation method, and profile generation program
CN112395881B (en) Material label construction method and device, readable storage medium and electronic equipment
WO2019061664A1 (en) Electronic device, user's internet surfing data-based product recommendation method, and storage medium
WO2016130374A1 (en) Method and apparatus for assigning device fingerprints to internet devices
CN111090807A (en) Knowledge graph-based user identification method and device
CN108763961B (en) Big data based privacy data grading method and device
CN110570199A (en) User identity detection method and system based on user input behaviors
CN105164676A (en) Query features and questions
CN113807073B (en) Text content anomaly detection method, device and storage medium
Karkali et al. Using temporal IDF for efficient novelty detection in text streams
CN110457707B (en) Method and device for extracting real word keywords, electronic equipment and readable storage medium
CN113688240A (en) Threat element extraction method, device, equipment and storage medium
CN111988327B (en) Threat behavior detection and model establishment method and device, electronic equipment and storage medium
CN111427883A (en) Data processing method and device based on AeroPike, computer equipment and storage medium
WO2019235074A1 (en) Generation method, generation device, and generation program
CN109242690A (en) Finance product recommended method, device, computer equipment and readable storage medium storing program for executing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant