CN116932345A - User operation behavior detection method and device - Google Patents


Info

Publication number
CN116932345A
Authority
CN
China
Prior art keywords
user
feature
operation behavior
sample
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210374083.5A
Other languages
Chinese (zh)
Inventor
冮凯旋
刘冬岩
徐金阳
高琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Liaoning Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Liaoning Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Liaoning Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202210374083.5A priority Critical patent/CN116932345A/en
Publication of CN116932345A publication Critical patent/CN116932345A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a method and a device for detecting user operation behaviors. The method comprises the following steps: acquiring operation behavior data of a user operation terminal; determining a feature vector of the user according to the operation behavior data, the feature vector being a numerical vector; and inputting the feature vector into a user operation behavior classification model to obtain the operation behavior type of the user. The user operation behavior classification model is obtained through training on a mapping relation between a second feature matrix sample and user operation behavior type labels. The second feature matrix sample is obtained by combining a plurality of first user feature vector samples into a first feature matrix sample and calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample. The second feature matrix sample comprises a plurality of second user feature vector samples, and its dimension is lower than that of the first feature matrix sample. According to the embodiments of the application, both the accuracy of the detection result and the detection efficiency can be ensured.

Description

User operation behavior detection method and device
Technical Field
The application belongs to the technical field of information security, and particularly relates to a method and a device for detecting user operation behaviors.
Background
With the development of information technology, both the number of Internet systems and the number of their users keep growing. User operations such as hacking and malicious privilege escalation pose great security risks to Internet systems. To ensure the security of an Internet system, it is important to assess the operation behavior habits of its users.
In the prior art, a free proxy internet protocol (Internet Protocol, IP) database can be constructed to detect abnormal operation of a user, that is, the login IP of an account is matched with the free proxy IP database, and if the matching is successful, the abnormal login can be determined. In the prior art, an association rule algorithm can be adopted to screen user operation behavior vectors which are larger than a correlation threshold in a data set, record the user operation behavior vectors as normal operation behavior vectors, update the normal operation behavior vectors to a user normal behavior library, match the user operation behavior vectors to be detected with vectors in the user normal behavior library, and treat the user operation behavior vectors as abnormal behavior output if the user operation behavior vectors are not successfully matched. In the prior art, a cluster analysis method can be adopted to divide a user group into a plurality of groups, and the user information with the number of people in the groups less than a preset threshold value is output as abnormal behaviors.
However, the detection of the user operation behavior by adopting the above method cannot ensure the accuracy of the detection result and the detection efficiency.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment, a computer readable storage medium and a computer program product for detecting user operation behaviors, which can ensure the accuracy of detection results and the detection efficiency.
In a first aspect, an embodiment of the present application provides a method for detecting a user operation behavior, where the method includes:
acquiring operation behavior data of a user operation terminal;
determining a feature vector of the user according to the operation behavior data; the feature vector is a numerical vector;
inputting the feature vector into a user operation behavior classification model to obtain the operation behavior type of the user;
the user operation behavior classification model is obtained through training of a mapping relation between the second feature matrix sample and the user operation behavior type label;
the second feature matrix sample is obtained by combining a plurality of first user feature vector samples into a first feature matrix sample and calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample; the second feature matrix samples comprise a plurality of second user feature vector samples; the second feature matrix has a dimension that is lower than the dimension of the first feature matrix.
In one possible implementation manner, the inputting the feature vector into a user operation behavior classification model to obtain the operation behavior type of the user specifically includes:
calculating the distances between the feature vector and a plurality of second user feature vector samples in the user operation behavior classification model to obtain a plurality of distance values; the second user feature vector sample is a numerical vector;
selecting feature vectors corresponding to the first K distance values from the distance values according to the sequence from small to large; the K is a positive integer;
and determining, among the feature vectors corresponding to the first K distance values, the operation behavior type that occurs most often as the operation behavior type of the user.
In one possible implementation manner, before the feature vector is input to the user operation behavior classification model to obtain the operation behavior type of the user, the method further includes:
acquiring a plurality of first user feature vector samples;
combining the plurality of first user feature vector samples into a first feature matrix sample;
calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample to obtain a second feature matrix sample; the second feature matrix samples comprise a plurality of second user feature vector samples; the dimension of the second feature matrix sample is lower than that of the first feature matrix sample;
and performing model training according to the mapping relation between the second feature matrix sample and the user operation behavior type label to obtain a user operation behavior classification model.
In one possible implementation manner, the determining the feature vector of the user according to the operation behavior data specifically includes:
constructing a feature set of the user according to the operation behavior data;
traversing the features in the feature set through a third user feature vector sample; the third user feature vector sample is a text type vector;
when the features are consistent, the value of the features in the feature set is 1;
and when the features are inconsistent, taking the value of the features in the feature set as 0 to obtain the feature vector of the user.
In a possible implementation manner, before traversing the features in the feature set through a third user feature vector sample, the method further includes:
acquiring an operation behavior data sample of a user operation terminal;
constructing a user feature set sample according to the operation behavior data sample;
and performing de-duplication processing on the features in the user feature set sample to obtain a third user feature vector sample.
In a second aspect, an embodiment of the present application provides a device for detecting a user operation behavior, where the device includes:
the first acquisition module is used for acquiring operation behavior data of the user operation terminal;
the determining module is used for determining a first feature vector of the user according to the operation behavior data; the first feature vector is a numerical vector;
the input module is used for inputting the feature vector into a user operation behavior classification model to obtain the operation behavior type of the user;
the user operation behavior classification model is obtained through training of a mapping relation between the second feature matrix sample and the user operation behavior type label;
the second feature matrix sample is obtained by combining a plurality of first user feature vector samples into a first feature matrix sample and calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample; the second feature matrix samples comprise a plurality of second user feature vector samples; the second feature matrix has a dimension that is lower than the dimension of the first feature matrix.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the method of any one of the possible implementation methods of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method according to any one of the possible implementation methods of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, instructions in which, when executed by a processor of an electronic device, cause the electronic device to perform a method as in any of the possible implementation methods of the first aspect described above.
According to the detection method, device, equipment, computer-readable storage medium and computer program product for user operation behavior of the embodiments of the application, the second feature matrix sample is obtained by calculating the product of the projection matrix of the first feature matrix sample and the first feature matrix sample. The first feature matrix sample is thereby mapped into the feature space carrying the most information, which reduces the dimension of the feature matrix sample while preserving the validity of the information it contains. On this basis, training the model on the mapping relation between the second feature matrix sample and the user operation behavior type labels can improve both the detection efficiency of the model and the accuracy of the detection result, compared with training on the mapping relation between the first feature matrix sample and the labels. Therefore, obtaining the operation behavior type of the user by inputting the feature vector of the user into the user operation behavior classification model can ensure both the accuracy of the detection result and the detection efficiency.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present application, the drawings that are needed to be used in the embodiments of the present application will be briefly described, and it is possible for a person skilled in the art to obtain other drawings according to these drawings without inventive effort.
Fig. 1 is a flow chart of a method for detecting user operation behavior according to an embodiment of the present application;
Fig. 2 is a flow chart of another method for detecting user operation behavior according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a detection device for user operation behavior according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings and the detailed embodiments. It should be understood that the particular embodiments described herein are meant to be illustrative of the application only and not limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the application by showing examples of the application.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
As described in the background section, in order to ensure the security of an Internet system, it is important to assess the operation behavior habits of its users. However, anomaly detection of user operation behavior by the prior-art approaches cannot guarantee the accuracy of the detection result or the detection efficiency.
Constructing a free proxy IP database to detect abnormal user operation behaviors is essentially a white-box anomaly detection method: the detection rules are transparent, can easily be bypassed by malicious users, and thus lose their effectiveness. When an association rule algorithm is used to detect abnormal user operation behaviors, the effect may not be obvious if the data set is small; and as the data set to be detected grows, the user normal behavior library grows with it, every detection has to match against all vectors in the library, and detection takes longer and longer. Moreover, detecting abnormal user operation behaviors by cluster analysis depends on the threshold setting; if the threshold is set unreasonably, the expected effect is difficult to achieve.
To solve the problems in the prior art, embodiments of the present application provide a method, an apparatus, a device, a computer readable storage medium, and a computer program product for detecting user operation behavior.
The following first describes a method for detecting user operation behavior provided by the embodiment of the present application.
Fig. 1 is a flow chart illustrating a method for detecting user operation behavior according to an embodiment of the present application. As shown in fig. 1, the method for detecting user operation behavior provided by the embodiment of the application includes the following steps:
S110, acquiring operation behavior data of a user operation terminal;
S120, determining a feature vector of the user according to the operation behavior data; the feature vector is a numerical vector;
S130, inputting the feature vector into a user operation behavior classification model to obtain the operation behavior type of the user; the user operation behavior classification model is obtained through training of a mapping relation between the second feature matrix sample and the user operation behavior type label; the second feature matrix sample is obtained by combining a plurality of first user feature vector samples into a first feature matrix sample and calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample; the second feature matrix samples comprise a plurality of second user feature vector samples; the second feature matrix has a dimension lower than the dimension of the first feature matrix.
According to the method for detecting user operation behavior of the embodiment of the application, the second feature matrix sample is obtained by calculating the product of the projection matrix of the first feature matrix sample and the first feature matrix sample. The first feature matrix sample is thereby mapped into the feature space carrying the most information, which reduces the dimension of the feature matrix sample while preserving the validity of the information it contains. On this basis, training the model on the mapping relation between the second feature matrix sample and the user operation behavior type labels can improve both the detection efficiency of the model and the accuracy of the detection result, compared with training on the mapping relation between the first feature matrix sample and the labels. Therefore, obtaining the operation behavior type of the user by inputting the feature vector of the user into the user operation behavior classification model can ensure both the accuracy of the detection result and the detection efficiency.
A specific implementation of each of the above steps is described below.
In some embodiments, in S110, the user may be any user of the operation terminal, for example an ordinary user, an operation and maintenance person, or an administrator; that is, the operation behavior may be initiated by any of these. The operation terminal may be a desktop computer, a notebook computer, a tablet computer, or the like. Since the operation terminal can be connected to a host, the operation behavior data of the user operation terminal may be the series of operation behavior data generated after the user logs in to the host.
As an example, since a series of operation behaviors generated by a user logging into a host can be collected in a log server in real time, operation behavior data of the user can be obtained through the log server. Wherein, the user operation behavior data including information such as the host operation log and the database operation log can be obtained from the log server. In addition, the log server may be a syslog log server.
For example, when an ordinary user logs in to the host at 10.×.×.× and views the contents of a specified working directory, the syslog log server may record an operation log entry such as the following:
User name: class="host_command" type="3" time="2020-12-04 11:39:05" src_ip="10.×.×.×" dst_ip="10.×.×.×" src_port="" dst_port="" protocol="" start_time="" end_time="" primary_user="user name" second_user="user name" operation="ls" content="COMMAND" dev_ip="" dev_port="" dev_mac="" authen_status="" log_level="1" session_id="" param_len="" param=""
For another example, when an administrator logs in to the host at 10.×.×.× and changes a user's password, the syslog log server may record a log entry such as the following:
User name: class="host_command" type="3" time="2020-12-04 15:09:08" src_ip="10.×.×.×" dst_ip="10.×.×.×" src_port="" dst_port="" protocol="" start_time="" end_time="" primary_user="user name" second_user="user name" operation="2020-12-04 15:08:59passwd" content="COMMAND" dev_ip="" dev_port="" dev_mac="" authen_status="" log_level="1" session_id="" param_len="" param=""
On this basis, the operation behavior data of the user may refer specifically to the content in the "operation" field in the operation log described above.
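As a minimal sketch of how the "operation" field might be pulled out of such an entry, the following assumes that each entry is a sequence of key="value" pairs and that values contain no embedded double quotes. The function names `parse_fields` and `extract_operation` and the sample line are hypothetical and are not part of the application.

```python
import re

# Assumption: a log entry is a run of key="value" pairs, optionally with
# spaces around "=", and values never contain a double quote.
FIELD_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_fields(line):
    """Return a dict mapping every key="value" pair found in one log entry."""
    return dict(FIELD_RE.findall(line))


def extract_operation(line):
    """Return the content of the "operation" field, or "" if it is absent."""
    return parse_fields(line).get("operation", "")


# Hypothetical entry shaped like the examples above (most fields left empty).
sample_line = (
    'user1: class="host_command" type="3" time="2020-12-04 11:39:05" '
    'src_ip="" dst_ip="" operation="ls" content="COMMAND" log_level="1"'
)

if __name__ == "__main__":
    print(extract_operation(sample_line))
```

A dictionary of all fields is built first so that other fields, such as "time", remain available if a later step needs them.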
In some embodiments, in S120, the operation behavior data of the user may be text-type data, and the feature vector may be a numeric-type vector.
In some embodiments, determining the feature vector of the user according to the operation behavior data may specifically include:
constructing a feature set of a user according to the operation behavior data;
traversing the features in the feature set through a third user feature vector sample; the third user feature vector sample is a text type vector;
when the features are consistent, the value of the features in the feature set is 1;
and when the features are inconsistent, the feature value in the feature set is 0, so that the feature vector of the user is obtained.
Here, the operation behavior data may refer specifically to the content in the "operation" field in the operation log described above. The content in the operation field in all operation logs of the user can be extracted by utilizing the big data technology, and the characteristic set of the current user is formed.
For example, a feature set of a user may be:
User: (['ls', 'test', 'lp', 'cp', 'sh', 'date', 'cat', 'lp', 'chmod', 'mkdir', 'ls', 'term', …])
In addition, the third user feature vector sample is a feature vector sample constructed in advance, and every feature in it is unique. Each feature in the feature set is compared with the features in the third user feature vector sample: when the features are consistent, the feature takes the value "1"; when they are inconsistent, it takes the value "0". The feature vector of the user is thereby obtained, and its elements take only the two values 0 and 1.
In this way, by converting text-type data into numerical-type data, a numerical vector corresponding to the operation behavior data of the user, that is, a feature vector can be constructed. Further, by inputting the feature vector to the user operation behavior classification model, the operation behavior type of the user can be obtained.
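The text-to-numeric conversion described above can be sketched as follows. This is an illustration under one reading of the text, namely that the resulting vector has one position per feature of the third user feature vector sample (called `vocabulary` here) and that a position is 1 when that feature also appears in the user's feature set. The function name `to_feature_vector` and the sample values are hypothetical.

```python
def to_feature_vector(feature_set, vocabulary):
    """Encode a user's feature set as a 0/1 vector indexed by the vocabulary.

    Assumption: position i is 1 if vocabulary[i] occurs in feature_set,
    otherwise 0. Features absent from the vocabulary are ignored.
    """
    present = set(feature_set)
    return [1 if feature in present else 0 for feature in vocabulary]


# Hypothetical de-duplicated vocabulary and one user's operations.
vocabulary = ["ls", "test", "lp", "cp", "sh", "date", "cat", "chmod", "mkdir"]
user_features = ["ls", "cat", "ls", "passwd"]

if __name__ == "__main__":
    print(to_feature_vector(user_features, vocabulary))
```

Indexing by the vocabulary gives every user a vector of the same length, which the distance calculation in S130 requires.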
In some embodiments, before traversing the features in the feature set through the third user feature vector sample, further comprising:
acquiring an operation behavior data sample of a user operation terminal;
constructing a user feature set sample according to the operation behavior data sample;
and performing de-duplication processing on the features in the user feature set sample to obtain a third user feature vector sample.
Here, the process of constructing the user feature set sample from the operation behavior data sample may be identical to the process of constructing the user feature set from the operation behavior data described above, and will not be described in detail herein. After the user feature set sample is obtained, the features in the user feature set sample can be subjected to de-duplication processing to obtain a third user feature vector sample. Wherein the features in the third user feature vector sample are all unique.
For example, the user feature set sample may be:
User: (['ls', 'test', 'lp', 'cp', 'sh', 'date', 'cat', 'lp', 'chmod', 'mkdir', 'ls', 'term', …])
Here the features 'ls' and 'lp' each occur more than once. Thus, after performing the de-duplication processing on the features in the user feature set sample, the third user feature vector sample may be:
User: (['ls', 'test', 'lp', 'cp', 'sh', 'date', 'cat', 'chmod', 'mkdir', 'term', …])
In this way, by performing deduplication processing on the features in the user feature set samples, a large vector space can be formed, thereby providing a sample basis for determining the feature vectors of the user.
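A minimal sketch of the de-duplication step follows. Keeping the first occurrence of each feature and preserving the original order is an assumption made so that the result matches the example above; the function name `build_vocabulary` is hypothetical.

```python
def build_vocabulary(feature_set_sample):
    """Drop repeated features, keeping first occurrences in their original order."""
    # dict preserves insertion order, so fromkeys de-duplicates stably.
    return list(dict.fromkeys(feature_set_sample))


# The listed portion of the user feature set sample from the example above.
feature_set_sample = [
    "ls", "test", "lp", "cp", "sh", "date",
    "cat", "lp", "chmod", "mkdir", "ls", "term",
]

if __name__ == "__main__":
    print(build_vocabulary(feature_set_sample))
```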
In some embodiments, in S130, inputting the feature vector into the user operation behavior classification model to obtain the operation behavior type of the user may specifically include:
calculating the distances between the feature vectors and a plurality of second user feature vector samples in the user operation behavior classification model to obtain a plurality of distance values; the second user feature vector sample is a numerical vector;
selecting feature vectors corresponding to the first K distance values from the distance values according to the sequence from small to large; k is a positive integer;
and determining, among the feature vectors corresponding to the first K distance values, the operation behavior type that occurs most often as the operation behavior type of the user.
Here, the calculated distance value may be manhattan distance, euclidean distance, minkowski distance, or the like.
As an example, the feature vector may be represented as $X_{1N} = (x_{11}, x_{12}, \ldots, x_{1N})$, and the second user feature vector sample may be represented as $Y_{1N} = (y_{11}, y_{12}, \ldots, y_{1N})$. The formula for calculating the distance between the feature vector and a second user feature vector sample in the user operation behavior classification model may be:
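The formula is not reproduced in this text. For reference, the standard definitions of the three distances named above, written in the same notation, are sketched below. Which of them the embodiment uses is not stated here, and the exponent $p \ge 1$ of the Minkowski distance is a free parameter introduced only for illustration.

```latex
\begin{align*}
d_{\text{Manhattan}}(X_{1N}, Y_{1N}) &= \sum_{i=1}^{N} \lvert x_{1i} - y_{1i} \rvert \\
d_{\text{Euclidean}}(X_{1N}, Y_{1N}) &= \sqrt{\sum_{i=1}^{N} (x_{1i} - y_{1i})^{2}} \\
d_{\text{Minkowski}}(X_{1N}, Y_{1N}) &= \Bigl( \sum_{i=1}^{N} \lvert x_{1i} - y_{1i} \rvert^{p} \Bigr)^{1/p}
\end{align*}
```

Setting $p = 1$ in the Minkowski distance gives the Manhattan distance, and $p = 2$ gives the Euclidean distance.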
As an example, after the plurality of distance values is obtained, the distance values may be saved and sorted from small to large, and the first K distance values selected. Among the feature vectors corresponding to the first K distance values, the operation behavior type that occurs most often is determined as the operation behavior type of the user.
In this way, because the operation behavior type of the user is decided by the most frequent type among the K nearest feature vectors, the operation behavior type of the user can be determined accurately.
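The three steps above amount to a K-nearest-neighbor vote, which can be sketched as follows. The Euclidean distance is used only as one of the options named above. The function names, the labels "normal" and "abnormal", and the sample points are hypothetical, and tie-breaking between equally frequent types is not specified by the text (here the type encountered first among the nearest samples wins).

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two numerical vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, samples, labels, k=3, distance=euclidean):
    """Return the most frequent label among the k samples nearest to query."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    # Pair each distance with its label, then sort from small to large.
    ranked = sorted(
        ((distance(query, s), label) for s, label in zip(samples, labels)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical second user feature vector samples and their type labels.
samples = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
labels = ["normal", "normal", "normal", "abnormal", "abnormal"]

if __name__ == "__main__":
    print(knn_classify((0.2, 0.1), samples, labels, k=3))
    print(knn_classify((5.5, 5.0), samples, labels, k=3))
```

Passing the distance as a parameter makes it straightforward to swap in the Manhattan or Minkowski distance without touching the voting logic.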
To obtain the user operation behavior classification model, the application further provides another implementation of the method for detecting user operation behavior; see the following embodiments for details.
Referring to fig. 2, the method for detecting user operation behavior according to the embodiment of the present application may further include the following steps before S130 shown in the foregoing embodiment:
S210, acquiring a plurality of first user feature vector samples;
S220, combining a plurality of first user feature vector samples into a first feature matrix sample;
S230, calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample to obtain a second feature matrix sample; the second feature matrix samples comprise a plurality of second user feature vector samples; the dimension of the second feature matrix sample is lower than that of the first feature matrix sample;
S240, performing model training according to the mapping relation between the second feature matrix sample and the user operation behavior type label to obtain a user operation behavior classification model.
In this way, the second feature matrix sample is obtained by calculating the product of the projection matrix of the first feature matrix sample and the first feature matrix sample. The first feature matrix sample is thereby mapped into the feature space carrying the most information, which reduces the dimension of the feature matrix sample while preserving the validity of the information it contains. On this basis, training the model on the mapping relation between the second feature matrix sample and the user operation behavior type labels can improve both the detection efficiency of the model and the accuracy of the detection result, compared with training on the mapping relation between the first feature matrix sample and the labels.
A specific implementation of each of the above steps is described below.
In some embodiments, in S210, the first user feature vector sample may be a numeric vector sample.
As an example, the process of obtaining the plurality of first user feature vector samples may be the same as the process of determining the feature vector of the user from the operational behavior data hereinabove, and will not be described in detail herein.
In some embodiments, in S220, a plurality of first user feature vector samples may be combined as row vectors and form a first feature matrix sample.
As an example, the first feature matrix sample may be denoted as $X^{train}_{n \times N}$,
where n may represent the number of first user feature vector samples and N may represent the number of features in each first user feature vector sample.
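The combination step in S220 can be sketched as follows. This is a minimal illustration, assuming that each sample is a plain list of numbers, that the matrix has n rows (samples) and N columns (features), and that `build_feature_matrix` is a hypothetical helper name not found in the application.

```python
from typing import List

def build_feature_matrix(samples: List[List[float]]) -> List[List[float]]:
    """Stack per-user feature vectors as the rows of an n x N matrix."""
    if not samples:
        raise ValueError("at least one feature vector sample is required")
    width = len(samples[0])
    for row in samples:
        if len(row) != width:
            raise ValueError("all feature vectors must have the same length")
    # Copy each row so later edits to the inputs do not alter the matrix.
    return [list(row) for row in samples]

# Three hypothetical users, four features each.
X_train = build_feature_matrix([[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]])
n, N = len(X_train), len(X_train[0])  # n = number of samples, N = number of features
```

The length check is a design choice of this sketch: it rejects ragged input early, before any projection is attempted.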
In some embodiments, in S230, in order to reduce the dimension of the first feature matrix sample, the first feature matrix sample may be mapped into a feature space having the largest amount of information. That is, the projection matrix of the first feature matrix sample may be calculated first. After obtaining the projection matrix of the first feature matrix sample, the product of the projection matrix and the first feature matrix sample may be calculated to obtain a mapped feature matrix sample, i.e., a second feature matrix sample.
As an example, the projection matrix of $X^{train}_{n \times N}$ may be calculated as follows:
(1) Centering, i.e., subtracting from each feature dimension its mean value.
(2) Calculating the covariance matrix of the centered data, together with its eigenvalues and eigenvectors.
(3) Sorting the eigenvalues from large to small, selecting the leading eigenvalues whose cumulative sum accounts for more than 95% of the total information, and forming the projection matrix P with the corresponding eigenvectors as row vectors.
As an example, the second feature matrix sample may be denoted as Z. Transforming $X^{train}_{n \times N}$ into the new space to obtain the second feature matrix sample may be written as $Z = P X^{train}_{n \times N}$.
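Steps (1) to (3) and the final projection can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the application's implementation: the covariance is normalised by the number of samples, the eigen-decomposition uses a textbook cyclic Jacobi rotation, each sample is stored as a row (so P is applied to every centered row, which is the row form of Z = PX), and the names `jacobi_eigen`, `fit_projection` and `transform` are hypothetical.

```python
import math

def jacobi_eigen(A, sweeps=50, tol=1e-18):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V); eigenvalue i belongs to column i of V."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):      # M <- M J (rotate columns p and q)
                    for k in range(n):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):    # A <- J^T A (rotate rows p and q)
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(n)], V

def fit_projection(X, keep=0.95):
    """Center, build the covariance matrix, keep eigenvectors covering > keep of the total."""
    n, N = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(N)]
    Xc = [[row[j] - means[j] for j in range(N)] for row in X]
    # Covariance normalised by n; this normalisation is an assumption of the sketch.
    cov = [[sum(r[i] * r[j] for r in Xc) / n for j in range(N)] for i in range(N)]
    vals, V = jacobi_eigen(cov)
    order = sorted(range(N), key=lambda i: vals[i], reverse=True)
    total, acc, P = sum(vals), 0.0, []
    for i in order:
        P.append([V[k][i] for k in range(N)])  # eigenvector i becomes a row of P
        acc += vals[i]
        if acc > keep * total:
            break
    return P, means

def transform(X, P, means):
    """Apply P to every centered sample (row form of Z = P X)."""
    return [[sum(p[j] * (row[j] - means[j]) for j in range(len(p))) for p in P] for row in X]

# Hypothetical data lying on one line, so a single component carries all the variance.
X_train = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
P, means = fit_projection(X_train)
Z = transform(X_train, P, means)
```

In practice a numerical library would normally replace the hand-written eigen solver; it is written out here only to keep the sketch dependency-free.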
In some embodiments, in S240, after the second feature matrix sample is obtained, model training may be performed according to a mapping relationship between the second feature matrix sample and the user operation behavior type label, to obtain a user operation behavior classification model.
It should be noted that, in the embodiment of the present application, training the user operation behavior classification model may consist of constructing the model based on the PCKNN algorithm. The PCKNN algorithm is a novel algorithm provided by the embodiment of the application. It may be regarded as an optimization of the K-nearest neighbor (KNN) algorithm, but differs from it substantially.
The differences between the two are illustrated below by mathematical formulas:
The KNN algorithm calculates the distance between the unknown input and each known input and, following the majority voting rule (the minority obeys the majority), assigns the unknown input to the class that occurs most often among its K nearest samples; it performs no other judgment or processing on the unknown input. The PCKNN algorithm first calculates a projection matrix for the unknown input and maps the input into the feature space with the maximum amount of information, then calculates the distance between the unknown input and each known input and, following the same majority voting rule, assigns the unknown input to the class that occurs most often among its K nearest samples. In terms of implementation, the PCKNN algorithm adds the concepts of the projection matrix and the mapping to the KNN algorithm. Thus, the PCKNN algorithm is essentially different from the KNN algorithm, and the former performs better in terms of classification accuracy and efficiency.
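The contrast between the two procedures can be sketched as follows. This is a constructed illustration only: the Euclidean distance, the labels "normal" and "abnormal", the sample values, and the fixed projection matrix `P` (which simply discards a noisy third feature) are all assumptions, and the function names are hypothetical. Centering is omitted because subtracting a common mean does not change distances between projected vectors.

```python
import math
from collections import Counter

def project(P, x):
    """Map a feature vector into the lower-dimensional space: z = P x."""
    return [sum(p_j * x_j for p_j, x_j in zip(row, x)) for row in P]

def knn_classify(train_vectors, train_labels, query, k):
    """Plain KNN: majority vote among the k nearest training samples."""
    nearest = sorted(
        (math.dist(v, query), label) for v, label in zip(train_vectors, train_labels)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def pcknn_classify(P, train_vectors, train_labels, query, k):
    """Projection first, then the same KNN vote in the projected space."""
    z_train = [project(P, v) for v in train_vectors]
    return knn_classify(z_train, train_labels, project(P, query), k)

# Hypothetical samples: the third feature is large-amplitude noise.
train = [[0, 0, 9], [0, 1, -9], [5, 5, 0], [5, 6, 1]]
labels = ["normal", "normal", "abnormal", "abnormal"]
P = [[1, 0, 0], [0, 1, 0]]   # hypothetical projection keeping the first two features
query = [0.5, 0.5, 0]

raw_result = knn_classify(train, labels, query, k=3)
projected_result = pcknn_classify(P, train, labels, query, k=3)
```

With these made-up numbers the two calls disagree, which shows only that projection can change the neighbor set; it is not evidence about accuracy on real data.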
Based on the method for detecting the user operation behavior provided by the embodiment, correspondingly, the application further provides a specific implementation mode of the device for detecting the user operation behavior. Please refer to the following examples.
As shown in fig. 3, the detection device 300 for user operation behavior provided in the embodiment of the present application includes the following modules:
a first obtaining module 310, configured to obtain operation behavior data of a user operation terminal;
a determining module 320, configured to determine a first feature vector of the user according to the operation behavior data; the first feature vector is a numerical vector;
the input module 330 is configured to input the feature vector to the user operation behavior classification model, so as to obtain an operation behavior type of the user; the user operation behavior classification model is obtained through training of a mapping relation between the second feature matrix sample and the user operation behavior type label; the second feature matrix sample is obtained by combining a plurality of first user feature vector samples into a first feature matrix sample and calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample; the second feature matrix samples include a plurality of second user feature vector samples, and the second feature matrix has a dimension lower than the dimension of the first feature matrix.
As an implementation of the present application, the input module 330 may specifically include:
the computing sub-module is used for computing the distances between the feature vector and a plurality of second user feature vector samples in the user operation behavior classification model to obtain a plurality of distance values; the second user feature vector sample is a numerical vector;
the selecting sub-module is used for selecting the feature vectors corresponding to the first K distance values from the distance values according to the sequence from small to large; k is a positive integer;
the determining submodule is used for determining, among the feature vectors corresponding to the first K distance values, the operation behavior type that occurs most often as the operation behavior type of the user.
As an implementation manner of the present application, the apparatus may further include:
the second acquisition module is used for acquiring a plurality of first user feature vector samples;
a combining module, configured to combine the plurality of first user feature vector samples into a first feature matrix sample;
the computing module is used for computing the product of the projection matrix of the first feature matrix sample and the first feature matrix sample to obtain a second feature matrix sample; the second feature matrix samples comprise a plurality of second user feature vector samples; the dimension of the second feature matrix sample is lower than that of the first feature matrix sample;
and the training module is used for carrying out model training according to the mapping relation between the second feature matrix sample and the user operation behavior type label to obtain a user operation behavior classification model.
As an implementation manner of the present application, the determining module 320 may specifically include:
the first construction submodule is used for constructing a characteristic set of a user according to the operation behavior data;
a traversing sub-module, configured to traverse the features in the feature set through a third user feature vector sample; the third user feature vector sample is a text type vector;
the first value sub-module is used for setting the value of the feature in the feature set to 1 when the features are consistent;
and the second value sub-module is used for setting the value of the feature in the feature set to 0 when the features are inconsistent, so as to obtain the feature vector of the user.
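The traversal and 1/0 assignment performed by these sub-modules can be sketched as below. This is a minimal sketch, assuming the third user feature vector sample is an ordered list of text features and the user's feature set is a collection of strings; the feature names and the function name `encode_user` are hypothetical.

```python
def encode_user(feature_set, text_vector):
    """Return 1 where an entry of the text vector occurs in the user's feature set, else 0."""
    present = set(feature_set)
    return [1 if feature in present else 0 for feature in text_vector]

# Hypothetical text-type vector and one user's feature set.
text_vector = ["login", "query_bill", "export_data", "delete_record"]
user_vector = encode_user({"login", "export_data"}, text_vector)
# user_vector == [1, 0, 1, 0]
```

The output length always equals the length of the text vector, so every user is encoded into the same numeric space regardless of how many operations that user performed.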
As an implementation manner of the present application, the determining module 320 may specifically further include:
the acquisition sub-module is used for acquiring an operation behavior data sample of the user operation terminal;
the second construction submodule is used for constructing a user characteristic set sample according to the operation behavior data sample;
and the de-duplication sub-module is used for de-duplication processing the features in the user feature set sample to obtain a third user feature vector sample.
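The de-duplication step can be sketched as follows. This is an illustration under assumptions: the user feature set sample is taken to be a list of per-user feature lists, first-seen order is kept (the application does not state an ordering), and `build_text_vector` is a hypothetical name.

```python
def build_text_vector(feature_set_samples):
    """Flatten per-user feature lists and drop duplicates, keeping first-seen order."""
    merged = [feature for sample in feature_set_samples for feature in sample]
    return list(dict.fromkeys(merged))  # dict preserves insertion order

# Hypothetical operation behavior features collected from three users.
samples = [
    ["login", "query_bill"],
    ["login", "export_data"],
    ["export_data", "delete_record"],
]
third_sample = build_text_vector(samples)
# third_sample == ["login", "query_bill", "export_data", "delete_record"]
```

Keeping a stable order matters because the position of each feature in this vector determines the position of the corresponding 1/0 value in every encoded user vector.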
According to the detection device for the user operation behavior, the second feature matrix sample is obtained by calculating the product of the projection matrix of the first feature matrix sample and the first feature matrix sample, the first feature matrix sample can be mapped into the feature space with the maximum information amount, the dimension of the feature matrix sample can be further reduced, and the validity of information in the feature matrix sample can be guaranteed. On the basis, through carrying out model training according to the mapping relation between the second feature matrix sample and the user operation behavior type label, the detection efficiency of the model and the accuracy of the detection result can be improved relative to carrying out model training according to the mapping relation between the first feature matrix sample and the user operation behavior type label. Therefore, the operation behavior type of the user is obtained by inputting the characteristic vector of the user into the user operation behavior classification model, so that the accuracy of the detection result can be ensured, and the detection efficiency can be ensured.
Based on the detection method of the user operation behavior provided by the embodiment, the embodiment of the application also provides a specific implementation mode of the electronic equipment. Fig. 4 shows a schematic diagram of an electronic device 400 according to an embodiment of the application.
The electronic device 400 may include a processor 410 and a memory 420 storing computer program instructions.
In particular, the processor 410 may include a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present application.
Memory 420 may include mass storage for data or instructions. By way of example, and not limitation, memory 420 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of the foregoing. Memory 420 may include removable or non-removable (or fixed) media, where appropriate. Memory 420 may be internal or external to the electronic device 400, where appropriate. In a particular embodiment, the memory 420 is a non-volatile solid state memory.
The memory may include read-only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, or electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions, and when the software is executed (e.g., by one or more processors) it is operable to perform the operations described with reference to a method in accordance with an aspect of the application.
The processor 410 implements the detection method of any one of the user operation behaviors of the above embodiments by reading and executing the computer program instructions stored in the memory 420.
In one example, electronic device 400 may also include communication interface 430 and bus 440. As shown in fig. 4, the processor 410, the memory 420, and the communication interface 430 are connected and communicate with each other through a bus 440.
The communication interface 430 is mainly used to implement communication between each module, device, unit and/or apparatus in the embodiment of the present application.
Bus 440 includes hardware, software, or both that couple components of the electronic device to one another. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of the above. Bus 440 may include one or more buses, where appropriate. Although embodiments of the application have been described and illustrated with respect to a particular bus, the application contemplates any suitable bus or interconnect.
The electronic device can execute the method for detecting the user operation behaviors in the embodiment of the application based on the operation behavior data of the user operation terminal which is acquired currently, thereby realizing the method and the device for detecting the user operation behaviors described in connection with fig. 1 to 3.
In addition, in combination with the method for detecting the user operation behavior in the above embodiment, the embodiment of the present application may be implemented by providing a computer storage medium. The computer storage medium has stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement a method of detecting any of the user's operational behaviors of the above embodiments.
It should be understood that the application is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the Internet or intranets.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to being, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware which performs the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the foregoing, only the specific embodiments of the present application are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present application is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present application, and they should be included in the scope of the present application.

Claims (9)

1. A method for detecting user operation behavior, comprising:
acquiring operation behavior data of a user operation terminal;
determining a feature vector of the user according to the operation behavior data; the characteristic vector is a numerical vector;
inputting the feature vector into a user operation behavior classification model to obtain the operation behavior type of the user;
the user operation behavior classification model is obtained through training of a mapping relation between the second feature matrix sample and the user operation behavior type label;
the second feature matrix sample is obtained by combining a plurality of first user feature vector samples into a first feature matrix sample and calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample; the second feature matrix samples comprise a plurality of second user feature vector samples; the second feature matrix has a dimension that is lower than the dimension of the first feature matrix.
2. The method for detecting a user operation behavior according to claim 1, wherein the inputting the feature vector into a user operation behavior classification model to obtain the operation behavior type of the user specifically includes:
calculating the distances between the feature vector and a plurality of second user feature vector samples in the user operation behavior classification model to obtain a plurality of distance values; the second user feature vector sample is a numerical vector;
selecting feature vectors corresponding to the first K distance values from the distance values according to the sequence from small to large; the K is a positive integer;
and determining, among the feature vectors corresponding to the first K distance values, the operation behavior type that occurs most often as the operation behavior type of the user.
3. The method for detecting operation behaviors of a user according to claim 1, wherein before inputting the feature vector into a classification model of operation behaviors of the user to obtain the operation behavior type of the user, the method further comprises:
acquiring a plurality of first user feature vector samples;
combining the plurality of first user feature vector samples into a first feature matrix sample;
calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample to obtain a second feature matrix sample; the second feature matrix samples comprise a plurality of second user feature vector samples; the dimension of the second feature matrix sample is lower than that of the first feature matrix sample;
and performing model training according to the mapping relation between the second feature matrix sample and the user operation behavior type label to obtain a user operation behavior classification model.
4. The method for detecting operation behaviors of a user according to claim 1, wherein the determining the feature vector of the user according to the operation behavior data specifically includes:
constructing a feature set of the user according to the operation behavior data;
traversing the features in the feature set through a third user feature vector sample; the third user feature vector sample is a text type vector;
when the features are consistent, the value of the features in the feature set is 1;
and when the features are inconsistent, taking the value of the features in the feature set as 0 to obtain the feature vector of the user.
5. The method of claim 4, wherein before traversing the features in the feature set through a third user feature vector sample, the method further comprises:
acquiring an operation behavior data sample of a user operation terminal;
constructing a user characteristic set sample according to the operation behavior data sample;
and performing de-duplication processing on the features in the user feature set sample to obtain a third user feature vector sample.
6. A device for detecting user operation behavior, the device comprising:
the first acquisition module is used for acquiring operation behavior data of the user operation terminal;
the determining module is used for determining a first feature vector of the user according to the operation behavior data; the first feature vector is a numerical vector;
the input module is used for inputting the feature vector into a user operation behavior classification model to obtain the operation behavior type of the user;
the user operation behavior classification model is obtained through training of a mapping relation between the second feature matrix sample and the user operation behavior type label;
the second feature matrix sample is obtained by combining a plurality of first user feature vector samples into a first feature matrix sample and calculating the product of a projection matrix of the first feature matrix sample and the first feature matrix sample; the second feature matrix samples comprise a plurality of second user feature vector samples; the second feature matrix has a dimension that is lower than the dimension of the first feature matrix.
7. An electronic device, the device comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a method for detecting user operation behavior according to any one of claims 1-5.
8. A computer readable storage medium, wherein computer program instructions are stored on the computer readable storage medium, which when executed by a processor implement a method of detecting user operation behavior according to any of claims 1-5.
9. A computer program product, characterized in that instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the method of detecting user operation behavior according to any of claims 1-5.
CN202210374083.5A 2022-04-11 2022-04-11 User operation behavior detection method and device Pending CN116932345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210374083.5A CN116932345A (en) 2022-04-11 2022-04-11 User operation behavior detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210374083.5A CN116932345A (en) 2022-04-11 2022-04-11 User operation behavior detection method and device

Publications (1)

Publication Number Publication Date
CN116932345A true CN116932345A (en) 2023-10-24

Family

ID=88391376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210374083.5A Pending CN116932345A (en) 2022-04-11 2022-04-11 User operation behavior detection method and device

Country Status (1)

Country Link
CN (1) CN116932345A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination