CN109918279B - Electronic device, method for identifying abnormal operation of user based on log data and storage medium - Google Patents
- Publication number
- CN109918279B (application CN201910065654.5A)
- Authority
- CN
- China
- Prior art keywords
- user
- abnormal
- users
- characteristic data
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Testing And Monitoring For Control Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an electronic device, a method for identifying abnormal operation of a user based on log data, and a storage medium. The method comprises: first, collecting log data of a plurality of predetermined users and performing statistical analysis on the collected log data to obtain operation characteristic data for each of the users; then, analyzing the obtained operation characteristic data with a pre-established classification model for abnormal user identification to determine the abnormal users among the plurality of predetermined users; and finally, sending the identity identification information of each determined abnormal user to a predetermined abnormal user monitoring center so that the abnormal user can be monitored or checked. In this way, abnormal user operations can be identified quickly and accurately, and the accuracy of abnormal user identification is improved.
Description
Technical Field
The present invention relates to the field of abnormal operation identification, and in particular, to an electronic device, a method for identifying abnormal operation of a user based on log data, and a storage medium.
Background
At present, there are many approaches to user behavior pattern recognition. At the application system operation layer, most of them rely on configured rules that monitor a user's operations on a specific object or the total number of operations, so the analysis has only a single dimension. Meanwhile, application system operation logs are usually used only to monitor system health, and a complete, systematic method for applying them at the user operation level is lacking.
Disclosure of Invention
In view of the above, in order to solve the above technical problem, the present invention first provides an electronic device, which includes a memory and a processor connected to the memory, wherein the processor is configured to execute a program stored in the memory and used for identifying a user abnormal operation based on log data, and when the program is executed by the processor, the following steps are implemented:
A1, collecting log data of a plurality of predetermined users, and performing statistical analysis on the collected log data to obtain the operation characteristic data of each of the plurality of predetermined users;
A2, analyzing the obtained operation characteristic data according to a pre-established classification model for abnormal user identification to determine an abnormal user from the plurality of predetermined users;
A3, sending the identity identification information of the determined abnormal user to a predetermined abnormal user monitoring center to monitor or check the abnormal user.
Preferably, in step A2, the process of building the pre-established classification model for abnormal user identification includes the following steps:
analyzing the acquired operation characteristic data according to an unsupervised machine learning algorithm to determine abnormal users from the predetermined plurality of users;
based on the determined operation characteristic data of the abnormal user, selecting key characteristic parameters for constructing a classification model from a plurality of characteristic parameters of the abnormal user in a supervised learning mode, and generating key characteristic data containing the key characteristic parameters;
and constructing a decision tree model by using the key characteristic data, wherein the decision tree model is a classification model for identifying abnormal users.
Preferably, the step of analyzing the acquired operational characteristic data according to an unsupervised machine learning algorithm to determine an abnormal user from the predetermined plurality of users comprises:
clustering the operation characteristic data of a plurality of users, and aggregating the operation characteristic data of the users with high association degree to obtain a plurality of clusters;
respectively judging the distribution of each operation characteristic data in each cluster, and if the operation characteristic data contained in one cluster is less than a first preset quantity, considering the user in the cluster as an abnormal user;
if the number of the operation characteristic data contained in one cluster is greater than or equal to the first preset number and the number of the operation characteristic data with the distance from the predefined central data greater than the predefined distance threshold is greater than or equal to a second preset number, the user in the cluster is considered to be an abnormal user;
or if the operation characteristic data included in one cluster is greater than or equal to the first preset number, and the number of the operation characteristic data with the distance from the predefined central data greater than the predefined distance threshold is less than the second preset number, the user corresponding to the operation characteristic data with the distance from the central data greater than the predefined distance threshold in the cluster is considered as an abnormal user.
Preferably, the supervised learning manner is a decision tree algorithm or a naive bayes algorithm.
Preferably, the operation characteristic data includes data information such as a user name, a login IP, time, an operation event, and a parameter of the operation user.
In addition, in order to solve the above technical problem, the present invention further provides a method for identifying an abnormal operation of a user based on log data, wherein the method includes the following steps:
S1, collecting log data of a plurality of predetermined users, and performing statistical analysis on the collected log data to obtain the operation characteristic data of each of the plurality of predetermined users;
S2, analyzing the obtained operation characteristic data according to a pre-established classification model for abnormal user identification to determine an abnormal user from the plurality of predetermined users;
S3, sending the identity identification information of the determined abnormal user to a predetermined abnormal user monitoring center to monitor or check the abnormal user.
Preferably, in the step S2, the process of building the pre-established abnormal user identification classification model includes the following steps:
analyzing the acquired operation characteristic data according to an unsupervised machine learning algorithm to determine abnormal users from the predetermined plurality of users;
based on the determined operation characteristic data of the abnormal user, selecting key characteristic parameters for constructing a classification model from a plurality of characteristic parameters of the abnormal user in a supervised learning mode, and generating key characteristic data containing the key characteristic parameters;
and constructing a decision tree model by using the key characteristic data, wherein the decision tree model is a classification model for identifying abnormal users.
Preferably, the step of analyzing the acquired operational characteristic data according to an unsupervised machine learning algorithm to determine abnormal users from the predetermined plurality of users comprises:
clustering the operation characteristic data of a plurality of users, and aggregating the operation characteristic data of the users with high association degree to obtain a plurality of clusters;
respectively judging the distribution of each operation characteristic data in each cluster, and if the operation characteristic data contained in one cluster is less than a first preset number, determining that the user in the cluster is an abnormal user;
if the operation characteristic data contained in one cluster is greater than or equal to the first preset number, and the number of the operation characteristic data with the distance from the predefined central data greater than the predefined distance threshold is greater than or equal to the second preset number, the user in the cluster is considered as an abnormal user;
or if the operation characteristic data contained in one cluster is greater than or equal to the first preset number, and the number of the operation characteristic data with the distance from the predefined central data greater than the predefined distance threshold is less than the second preset number, determining that the user corresponding to the operation characteristic data with the distance from the central data greater than the predefined distance threshold in the cluster is an abnormal user.
Preferably, the supervised learning manner is a decision tree algorithm or a naive bayes algorithm.
In addition, to solve the above technical problem, the present invention further provides a computer-readable storage medium storing a program for identifying abnormal operation of a user based on log data, the program being executable by at least one processor so that the at least one processor performs the steps of the method for identifying abnormal operation of a user based on log data according to any one of the above.
According to the electronic device, the method for identifying the abnormal operation of the user based on the log data and the storage medium, firstly, the log data of a plurality of predetermined users are collected, and the collected log data are subjected to statistical analysis to respectively obtain the operation characteristic data of the plurality of predetermined users; then, analyzing the acquired operation characteristic data according to a pre-established classification model for identifying abnormal users so as to determine the abnormal users from the plurality of pre-determined users; and finally, sending the determined identity identification information of the abnormal user to a predetermined abnormal user monitoring center so as to monitor or check the abnormal user. The abnormal operation of the user can be quickly and accurately identified, and the accuracy of identifying the abnormal user is improved.
Drawings
FIG. 1 is a diagram of an alternative hardware architecture of an electronic device according to the present invention;
FIG. 2 is a block diagram of a program for identifying abnormal operation of a user based on log data according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating an embodiment of a method for identifying abnormal operation of a user based on log data.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between the embodiments may be combined with each other, but must be based on the realization of the technical solutions by a person skilled in the art, and when the technical solutions are contradictory to each other or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Fig. 1 is a schematic diagram of an alternative hardware architecture of the electronic device according to the present invention. In this embodiment, the electronic device 10 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which may be communicatively connected to each other by a communication bus 14. It is noted that fig. 1 only shows the electronic device 10 with components 11-14, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 11 includes at least one type of computer-readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 10, such as a hard disk or a memory of the electronic device 10. In other embodiments, the memory 11 may also be an external storage device of the electronic device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the electronic device 10. Of course, the memory 11 may also include both internal and external storage devices of the electronic device 10. In the present embodiment, the memory 11 is generally used for storing an operating system installed in the electronic device 10 and various application software, such as a program for identifying abnormal operation of a user based on log data. Further, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
The network interface 13 may include a wireless network interface or a wired network interface, and the network interface 13 is generally used to establish a communication connection between the electronic apparatus 10 and other electronic devices.
The communication bus 14 is used to enable communication connections between the components 11-13.
Fig. 1 only shows the electronic device 10 with components 11-14 and a program for identifying user abnormal operation based on log data, but it should be understood that not all of the shown components are required to be implemented, and more or less components may be implemented instead.
Optionally, the electronic device 10 may further comprise a user interface (not shown in fig. 1), which may comprise a display, an input unit such as a keyboard, wherein the user interface may further comprise a standard wired interface, a wireless interface, etc.
Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED touch display, and the like. The display may also be referred to as a display screen or display unit, and is used to display information processed in the electronic device 10 and to present a visual user interface.
Optionally, in some embodiments, the electronic device 10 may further include an audio unit (audio unit not shown in fig. 1) that may convert received or stored audio data into an audio signal when the electronic device 10 is in a call signal reception mode, a talk mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like; further, the electronic device 10 may further include an audio output unit that outputs the audio signal converted by the audio unit, and the audio output unit may also provide audio output related to a specific function performed by the electronic device 10 (e.g., a call signal receiving sound, a message receiving sound, etc.), and the audio output unit may include a speaker, a buzzer, etc.
Optionally, in some embodiments, the electronic device 10 may further include an alarm unit (not shown in the figures) that may provide an output to notify the user that an event has occurred on the electronic device 10. Typical events may include call reception, message reception, key signal input, touch input, and the like. In addition to audio or visual output, the alarm unit may provide output in other ways to notify the occurrence of an event. For example, the alarm unit may provide an output in the form of a vibration: when a call, a message, or some other input that may cause the electronic device 10 to enter a communication mode is received, the alarm unit may provide a tactile output (i.e., a vibration) to notify the user.
In one embodiment, the program stored in the memory 11 for identifying the abnormal operation of the user based on the log data when executed by the processor 12 implements the following operations:
A. Collecting log data of a plurality of predetermined users, and performing statistical analysis on the collected log data to obtain operation characteristic data of the users;
Specifically, the collected log data of each user includes data such as the user name, login IP, time, operation event, and parameters of the operating user. Because abnormal operations are analyzed on the basis of the user's operation characteristics in the log data, the log data must first be collected and the user's operation characteristic data derived from it. The operation characteristic data of a user is a set of characteristic parameters that identify or record the user's operation behavior. These parameters may be counted from the user's log data using, as dimensions, the number of times a given user performs operations within a predefined time period (for example, each hour of a working day or each hour of a non-working day), the number of times a given operation is performed within a predefined time period, the number of IPs used within a predefined time period, and so on.
B. Analyzing the obtained operation characteristic data according to a pre-established classification model for identifying abnormal users so as to determine the abnormal users from the predetermined multiple users;
C. Sending the determined identity identification information of the abnormal user to a predetermined abnormal user monitoring center so as to monitor or check the abnormal user.
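As an illustration of step A only, the following minimal sketch derives two of the characteristic parameters mentioned above (operation counts per hour on working and non-working days, and the number of distinct IPs) from log records. The record layout, the sample data, and the rule that Monday to Friday are working days are assumptions made for the example; the patent does not specify them.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (user name, login IP, timestamp, operation event).
LOGS = [
    ("alice", "10.0.0.1", "2019-01-21 09:15:00", "query"),
    ("alice", "10.0.0.1", "2019-01-21 09:40:00", "export"),
    ("alice", "10.0.0.2", "2019-01-26 23:05:00", "export"),
    ("bob", "10.0.0.7", "2019-01-21 14:30:00", "query"),
]


def extract_features(logs):
    """Return {user: {"hourly": {(day_type, hour): count}, "ip_count": n}}."""
    hourly = defaultdict(lambda: defaultdict(int))
    ips = defaultdict(set)
    for user, ip, ts, _event in logs:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # Assumption: Monday-Friday are working days; public holidays are ignored.
        day_type = "workday" if t.weekday() < 5 else "non_workday"
        hourly[user][(day_type, t.hour)] += 1
        ips[user].add(ip)
    return {
        user: {"hourly": dict(hourly[user]), "ip_count": len(ips[user])}
        for user in hourly
    }


features = extract_features(LOGS)
```

Each user ends up with a sparse table of hourly counts plus an IP count, which can be flattened into a fixed-length vector before clustering or classification.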
Specifically, in some optional implementations of this embodiment, the process of building the pre-established classification model for the abnormal user recognition includes the following steps:
E1, analyzing the obtained operation characteristic data according to an unsupervised machine learning algorithm to determine abnormal users from the predetermined plurality of users;
Specifically, a clustering algorithm is used to cluster the characteristic data of the plurality of users into a plurality of clusters. When a cluster contains characteristic data that lies far from the overall data center or forms discrete scattered points, the users corresponding to that data are determined to be abnormal users.
In this embodiment, the unsupervised learning manner may be a clustering algorithm, for example, a distance-based clustering algorithm.
A clustering algorithm can be used to cluster the operation characteristic data of the plurality of users, aggregating the operation characteristic data of highly associated users into a plurality of clusters. Each cluster may contain the operation characteristic data of multiple highly associated users. In this embodiment, the distribution of the operation characteristic data within each cluster is examined separately, and three cases are distinguished. If a cluster contains fewer operation characteristic data than a first preset number, for example only 2 scattered points, the operation characteristic data in that cluster are regarded as scattered points, and the corresponding users are abnormal users. If a cluster contains at least the first preset number of operation characteristic data and most of the data in the cluster lie far from the central data, that is, the number of operation characteristic data whose distance from the predefined central data exceeds the predefined distance threshold is greater than or equal to a second preset number, the whole cluster is regarded as a cluster of abnormal users. Otherwise, if a cluster contains at least the first preset number of operation characteristic data and the number of operation characteristic data whose distance from the predefined central data exceeds the predefined distance threshold is less than the second preset number, only the users corresponding to those distant operation characteristic data are regarded as abnormal users.
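The three cases can be sketched as a small function applied to clusters that have already been formed. This is a minimal sketch, not the patented implementation: the clustering step itself is omitted, the "central data" is assumed to be the cluster centroid, the distance is assumed to be Euclidean, and the names `first_n`, `second_n`, and `dist_threshold` are hypothetical stand-ins for the first preset number, the second preset number, and the distance threshold.

```python
import math


def flag_abnormal(clusters, first_n, second_n, dist_threshold):
    """clusters: list of {user: feature_vector}; returns the set of abnormal users."""
    abnormal = set()
    for cluster in clusters:
        if len(cluster) < first_n:
            # Case 1: a very small cluster is treated as scattered points.
            abnormal.update(cluster)
            continue
        # Assumption: the "central data" is the cluster centroid (mean vector).
        dims = len(next(iter(cluster.values())))
        center = [sum(v[i] for v in cluster.values()) / len(cluster) for i in range(dims)]
        far = {u for u, v in cluster.items() if math.dist(v, center) > dist_threshold}
        if len(far) >= second_n:
            # Case 2: many distant points, so the whole cluster is abnormal.
            abnormal.update(cluster)
        else:
            # Case 3: only the distant points are abnormal.
            abnormal.update(far)
    return abnormal


clusters = [
    {"u1": (1.0, 1.0), "u2": (1.2, 0.9), "u3": (0.9, 1.1), "u4": (9.0, 9.0)},
    {"u5": (50.0, 50.0)},
]
result = flag_abnormal(clusters, first_n=2, second_n=2, dist_threshold=5.0)
```

With these sample values, `u5` is flagged by the first case (a single-point cluster) and `u4` by the third case (the only point farther than the threshold from its cluster centroid).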
F1, based on the determined operation characteristic data of the abnormal user, selecting key characteristic parameters for constructing a classification model from the characteristic parameters of the abnormal user in a supervised learning mode, and generating key characteristic data containing the key characteristic parameters;
specifically, in this embodiment, in order to construct a classification model for identifying an abnormal user, feature data of the abnormal user in the plurality of determined users may be analyzed in a supervised learning manner, and key feature parameters for constructing the classification model, that is, parameters important in identifying the abnormal user, may be selected from the feature parameters.
In this embodiment, the supervised learning approach may use a decision tree. Before the decision tree is used to select the key feature parameters for constructing the classification model, it is built from the feature data of the determined abnormal users: the decision tree is trained with the feature data of a plurality of abnormal users as training samples, so that it learns how important each feature parameter is in identifying abnormal users. The resulting decision tree comprises a plurality of nodes, each corresponding to one feature parameter, and the closer a node is to the root of the decision tree, the more important its feature parameter is in identifying abnormal users. The feature parameters corresponding to nodes whose depth is less than a depth threshold, namely the more important feature parameters, can therefore be selected as the key feature parameters for constructing the classification model. For example, when the feature data of a user consists of the number of operations the user performs in different preset time periods, the decision tree built from the feature data of the abnormal users contains a node for the operation count of each preset time period, and the depth of each such node differs according to how important the operation count of that time period is in identifying abnormal users.
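Since nodes nearer the root are described as more important, one way to read the selection step is to keep the features used at nodes whose depth is below a threshold. The sketch below illustrates that reading with a small Gini-based binary tree written from scratch. The tree-building procedure, the sample data, and the presence of both abnormal (1) and normal (0) labels from the unsupervised step are assumptions for illustration; the patent does not name a specific tree algorithm.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a binary tree; leaves hold the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"leaf": max(set(labels), key=labels.count)}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f, "threshold": t, "depth": depth,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def key_features(node, depth_threshold):
    """Collect feature indices used at nodes shallower than depth_threshold."""
    if "leaf" in node or node["depth"] >= depth_threshold:
        return set()
    return ({node["feature"]}
            | key_features(node["left"], depth_threshold)
            | key_features(node["right"], depth_threshold))


# Hypothetical features: operation counts at 02:00, 10:00 and 15:00.
rows = [(0, 20, 18), (1, 25, 22), (0, 18, 30), (30, 22, 19), (45, 19, 25), (28, 24, 21)]
labels = [0, 0, 0, 1, 1, 1]  # 1 = abnormal (assumed to come from the clustering step)
tree = build_tree(rows, labels)
selected = key_features(tree, depth_threshold=1)
```

In this toy data the 02:00 operation count separates the two groups on its own, so it sits at the root and is the only feature selected.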
In this embodiment, after the key feature parameters for constructing the classification model (that is, the operation counts of the key time periods) have been selected through the decision tree, the feature data of abnormal users meeting the following condition may be selected from the feature data of the determined abnormal users: the decision tree classifies the feature data as belonging to an abnormal user. In other words, the decision tree is used to classify the feature data of the identified abnormal users again. When the decision tree classifies the feature data of an abnormal user as abnormal, the key feature parameters in that feature data (that is, the operation counts of the key time periods) are combined to obtain the key feature data, which is then used to construct the classification model.
In this embodiment, the supervised learning approach may also use a naive Bayes algorithm. Based on the feature data of the determined abnormal users, the naive Bayes algorithm is used to calculate the anomaly probability corresponding to each feature parameter, where the anomaly probability of a feature parameter is the probability that a user is an abnormal user given that the value of that feature parameter is abnormal. The anomaly probability thus represents how important the feature parameter is in identifying abnormal users: the higher the anomaly probability, the more important the feature parameter. After the anomaly probabilities have been calculated, the feature parameters whose anomaly probability is greater than a probability threshold can be used as the key feature parameters for constructing the classification model. After the key feature parameters have been selected, the feature data of abnormal users meeting the following condition can be selected from the feature data of the determined abnormal users: the naive Bayes algorithm classifies the feature data as belonging to an abnormal user. In other words, the naive Bayes algorithm is used to classify the feature data of the identified abnormal users again. When it classifies the feature data of an abnormal user as abnormal, the key feature parameters in that feature data can be combined to obtain the key feature data, which is then used to construct the classification model.
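The anomaly probability defined above, P(abnormal user | feature value abnormal), can be estimated per feature by applying Bayes' rule to simple counts. The sketch below shows only this per-feature calculation and the thresholding step, not a full naive Bayes classifier. It assumes each feature has already been reduced to a binary "value looks abnormal" flag and that both abnormal and normal users are available; neither assumption is stated in the patent, and the function names are hypothetical.

```python
def anomaly_probabilities(flags, labels):
    """Estimate P(user abnormal | feature j abnormal) for each feature j.

    flags[i][j] is 1 if feature j of user i has an abnormal value, else 0.
    labels[i] is 1 if user i is abnormal, else 0.
    """
    n = len(labels)
    n_abnormal = sum(labels)
    prior = n_abnormal / n  # P(A)
    probs = []
    for j in range(len(flags[0])):
        flagged_total = sum(row[j] for row in flags)
        flagged_abnormal = sum(row[j] for row, y in zip(flags, labels) if y == 1)
        if flagged_total == 0 or n_abnormal == 0:
            probs.append(0.0)  # no evidence for this feature
            continue
        likelihood = flagged_abnormal / n_abnormal  # P(F_j | A)
        evidence = flagged_total / n  # P(F_j)
        probs.append(likelihood * prior / evidence)  # Bayes' rule
    return probs


def select_key_features(probs, prob_threshold):
    """Keep the indices of features whose anomaly probability exceeds the threshold."""
    return [j for j, p in enumerate(probs) if p > prob_threshold]


flags = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
labels = [1, 1, 1, 0, 0, 0]
probs = anomaly_probabilities(flags, labels)
key = select_key_features(probs, prob_threshold=0.6)
```

Here the first feature is flagged only for abnormal users, so its anomaly probability is the highest and it is the only one kept at a threshold of 0.6.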
It should be noted that, in this embodiment, the key feature data is the number of operations performed in the key time periods. In other embodiments, the key feature data may also be the number of IPs used in the key time periods or the number of times the user logs in to the operating system, or basic information about the user such as age, educational background, and occupation; this embodiment imposes no limitation in this respect.
G1, constructing a decision tree model by using the key feature data, wherein the decision tree model is a classification model for identifying the abnormal user.
Specifically, in the present embodiment, the classification model may be a decision tree model. A decision tree model may be created, and the generated key feature data including the key feature parameters is used as a training sample to train the decision tree model to obtain a trained classification model for identifying the abnormal user.
In view of the above, the electronic device provided by the present invention first collects log data of a plurality of predetermined users and performs statistical analysis on the collected log data to obtain the operation characteristic data of each of the users; then analyzes the obtained operation characteristic data according to a pre-established classification model for abnormal user identification to determine the abnormal users among the plurality of predetermined users; and finally sends the identity identification information of the determined abnormal users to a predetermined abnormal user monitoring center so that the abnormal users can be monitored or checked.
The abnormal operation of the user can be quickly and accurately identified, and the accuracy of identifying the abnormal user is improved.
In addition, the program for identifying abnormal operation of a user based on log data can be described in terms of program modules, divided according to the functions implemented by its parts. Fig. 2 is a schematic diagram of the program modules of the program for identifying abnormal operation of a user based on log data according to an embodiment of the invention. In this embodiment, the program may be divided into an acquisition module 201, an analysis module 202, and a sending module 203 according to the functions its parts implement. A program module, as referred to in the present invention, is a series of computer program instruction segments capable of performing a specific function, and is better suited than the program as a whole to describing how the program for identifying abnormal operation of a user based on log data executes in the electronic device 10. The functions or operation steps implemented by modules 201-203 are similar to those described above and are not detailed again here. For example:
the acquisition module 201 is configured to acquire log data of a plurality of predetermined users, and perform statistical analysis on the acquired log data to respectively acquire operation feature data of the plurality of predetermined users;
the analysis module 202 is configured to analyze the obtained operation feature data according to a classification model for identifying abnormal users, which is established in advance, so as to determine abnormal users from the predetermined multiple users;
the sending module 203 is configured to send the determined identity information of the abnormal user to a predetermined abnormal user monitoring center, so as to monitor or check the abnormal user.
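The division into modules 201-203 can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the class names, the `op_count` feature, and the `is_abnormal` and `send` callables are hypothetical stand-ins for the statistical analysis, the pre-established classification model, and the link to the monitoring center, none of which the text specifies at this level.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


class AcquisitionModule:
    """Module 201: turns raw log records into per-user feature data."""

    def run(self, logs: List[dict]) -> Dict[str, Features]:
        features: Dict[str, Features] = {}
        for rec in logs:
            # Hypothetical single feature: total number of logged operations.
            user = features.setdefault(rec["user"], {"op_count": 0.0})
            user["op_count"] += 1
        return features


class AnalysisModule:
    """Module 202: applies a pre-built classifier to each user's features."""

    def __init__(self, is_abnormal: Callable[[Features], bool]):
        self.is_abnormal = is_abnormal

    def run(self, features: Dict[str, Features]) -> List[str]:
        return sorted(u for u, f in features.items() if self.is_abnormal(f))


class SendingModule:
    """Module 203: forwards abnormal user IDs to the monitoring center."""

    def __init__(self, send: Callable[[List[str]], None]):
        self.send = send

    def run(self, abnormal_users: List[str]) -> None:
        self.send(abnormal_users)


def run_pipeline(logs, is_abnormal, send):
    """Chains the three modules in the order 201 -> 202 -> 203."""
    features = AcquisitionModule().run(logs)
    abnormal = AnalysisModule(is_abnormal).run(features)
    SendingModule(send).run(abnormal)
    return abnormal
```

Passing the classifier and the sender in as callables keeps each module testable on its own, which mirrors the text's point that modules are a convenient unit for describing the program.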
In addition, the present invention further provides a method for identifying abnormal operation of a user based on log data. Referring to Fig. 3, the method includes the following steps:
S100, collecting log data of a plurality of predetermined users, and performing statistical analysis on the collected log data to obtain operation characteristic data of the users;
Specifically, the collected log data of a user includes information such as the user name, login IP, time, operation event, and parameters of the operation. Because abnormal operation is analyzed on the basis of the operation characteristics found in the user's log data, the log data must first be collected and the user's operation characteristic data then derived from it. The operation characteristic data of a user is a set of characteristic parameters that identify or record the user's operation behavior. These parameters may be obtained by counting, from the log data, the number of operations the user performs in each predefined time period (for example, each hour of a working day or each hour of a non-working day), with each time period serving as one dimension; they may also be the number of times a given operation is performed in each predefined time period, the number of IPs used within a predefined time period, and so on.
S200, analyzing the acquired operation characteristic data according to a pre-established classification model for identifying abnormal users so as to determine the abnormal users from the predetermined multiple users;
S300, sending the identity identification information of the determined abnormal user to a predetermined abnormal user monitoring center so that the abnormal user can be monitored or checked.
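The statistical analysis in step S100 could look roughly like the following Python sketch, which counts operations per hour on working and non-working days and the number of distinct IPs per user. The record keys (`user`, `ip`, `time`) and the feature names are assumptions made for illustration, and treating Monday to Friday as working days ignores public holidays.

```python
from collections import defaultdict
from datetime import datetime


def extract_features(records):
    """Builds per-user operation feature data from log records.

    Each record is assumed to be a dict with the keys 'user', 'ip' and
    'time' (an ISO 8601 timestamp string); these key names are hypothetical.
    """
    counts = defaultdict(lambda: defaultdict(int))
    ips = defaultdict(set)
    for rec in records:
        ts = datetime.fromisoformat(rec["time"])
        # Assumption: Monday-Friday count as working days.
        day_type = "workday" if ts.weekday() < 5 else "non_workday"
        # One dimension per (day type, hour) pair, e.g. "workday_h09".
        counts[rec["user"]][f"{day_type}_h{ts.hour:02d}"] += 1
        ips[rec["user"]].add(rec["ip"])
    return {
        user: {**dict(counts[user]), "distinct_ips": len(ips[user])}
        for user in counts
    }
```

Each returned dictionary is one user's operation characteristic data; hours with no activity are simply absent and can be treated as zero when the data is vectorized.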
Specifically, in some optional implementations of this embodiment, the process of building the pre-established classification model for the abnormal user recognition includes the following steps:
E2, analyzing the acquired operation characteristic data according to an unsupervised machine learning algorithm to determine abnormal users from the predetermined multiple users;
Specifically, a clustering algorithm is applied to the characteristic data of the plurality of users to obtain a plurality of clusters; when a cluster contains characteristic data that lies far from the center of the data as a whole or that forms discrete scattered points, the users corresponding to that data are determined to be abnormal users.
In this embodiment, the unsupervised learning manner may be a clustering algorithm, for example, a distance-based clustering algorithm.
A clustering algorithm may be used to cluster the operation characteristic data of the plurality of users, aggregating the data of users with a high degree of association into clusters. Each cluster may therefore contain the operation characteristic data of several closely associated users. In this embodiment, the distribution of the operation characteristic data within each cluster is examined, and three cases are distinguished. If a cluster contains fewer items of operation characteristic data than a first preset number (for example, 2), the data in that cluster are regarded as scattered points and the corresponding users are abnormal users. If a cluster contains at least the first preset number of items and most of them lie far from the central data, that is, the number of items whose distance from the predefined central data exceeds a predefined distance threshold is greater than or equal to a second preset number, the whole cluster is regarded as a cluster of abnormal users. Otherwise, if a cluster contains at least the first preset number of items and the number of items whose distance from the predefined central data exceeds the predefined distance threshold is smaller than the second preset number, only the users corresponding to those distant items are regarded as abnormal users.
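The three-case rule can be sketched in Python as follows. The sketch assumes the clusters have already been produced by some clustering algorithm, takes the "predefined central data" to be the cluster centroid, and uses Euclidean distance; all three are assumptions, as are the function and parameter names.

```python
import math


def centroid(vectors):
    """Component-wise mean of a collection of equal-length vectors."""
    vectors = list(vectors)
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def flag_abnormal(clusters, first_preset, second_preset, dist_threshold):
    """Applies the three-case rule to already-formed clusters.

    clusters: list of dicts mapping user ID -> feature vector (tuple).
    Returns the set of user IDs regarded as abnormal.
    """
    abnormal = set()
    for cluster in clusters:
        # Case 1: too few members, so the cluster is a set of scattered points.
        if len(cluster) < first_preset:
            abnormal.update(cluster)
            continue
        center = centroid(cluster.values())
        far = {
            user for user, vec in cluster.items()
            if math.dist(vec, center) > dist_threshold
        }
        if len(far) >= second_preset:
            # Case 2: many distant members, so the whole cluster is abnormal.
            abnormal.update(cluster)
        else:
            # Case 3: only the distant members are abnormal.
            abnormal.update(far)
    return abnormal
```

In practice the two preset numbers and the distance threshold would need tuning against the scale of the feature data; the sketch leaves them as plain arguments for that reason.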
F2, based on the determined operation characteristic data of the abnormal user, selecting key characteristic parameters for constructing a classification model from the characteristic parameters of the abnormal user in a supervised learning mode, and generating key characteristic data containing the key characteristic parameters;
Specifically, in this embodiment, in order to construct a classification model for identifying abnormal users, the feature data of the abnormal users determined among the plurality of users may be analyzed in a supervised learning manner, and the key feature parameters for constructing the classification model, that is, the parameters that matter most in identifying abnormal users, may be selected from the feature parameters.
In this embodiment, the supervised learning approach may employ a decision tree. Before the decision tree is used to select the key feature parameters for constructing the classification model, the decision tree is first constructed from the determined feature data of the abnormal users: the feature data of a plurality of abnormal users serve as training samples, so that the decision tree learns how important each feature parameter in that data is for identifying abnormal users. The resulting decision tree comprises a plurality of nodes, each corresponding to one feature parameter, and the closer a node is to the root of the decision tree, the more important its feature parameter is for identifying abnormal users. The feature parameters corresponding to nodes whose depth falls within the depth threshold, namely the more important feature parameters, can be selected as the key feature parameters for constructing the classification model. For example, where the feature data of a user contains the number of operations the user performs in each of several preset time periods, the decision tree constructed from the feature data of the abnormal users contains one node for the operation count of each preset time period, and the depth of each such node varies with how important the operation count of that time period is for identifying abnormal users.
In this embodiment, after the key feature parameters for constructing the classification model (that is, the operation counts for the key time periods) have been selected through the decision tree, the decision tree is used to classify the feature data of the identified abnormal users once more. From the determined feature data of the abnormal users, only the feature data that the decision tree again classifies as belonging to an abnormal user is retained. For each such item, the key feature parameters (namely, the execution counts for the key time periods) are combined to obtain the key feature data, which is then used to construct the classification model.
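One way to picture depth-based selection is the short Python sketch below. It grows an information-gain tree over binary features and records the depth at which each feature is first used; features nearer the root are treated as the more important ones, following the text. Several points are assumptions rather than anything the text prescribes: features are binarized (1 meaning an unusually high value), normal users from the clustering stage are included with label 0 so that the tree has two classes to separate, and `max_depth` is a hypothetical name for the depth threshold.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    gain = entropy(labels)
    for value in (0, 1):
        part = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        if part:
            gain -= len(part) / len(labels) * entropy(part)
    return gain


def feature_depths(rows, labels, features, depth=0, out=None):
    """Maps each feature used by the tree to the shallowest depth of its node."""
    out = {} if out is None else out
    if len(set(labels)) <= 1 or not features:
        return out  # pure node or nothing left to split on
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    if information_gain(rows, labels, best) <= 1e-12:
        return out  # no feature separates the classes any further
    out[best] = min(out.get(best, depth), depth)
    rest = [f for f in features if f != best]
    for value in (0, 1):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        feature_depths([rows[i] for i in idx], [labels[i] for i in idx],
                       rest, depth + 1, out)
    return out


def key_features(rows, labels, features, max_depth):
    """Keeps features whose node lies at depth <= max_depth (root is depth 0)."""
    depths = feature_depths(rows, labels, features)
    return sorted(f for f, d in depths.items() if d <= max_depth)
```

A feature that the tree never uses, because it carries no information about the label, does not appear in the depth map at all and is therefore never selected.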
In this embodiment, the supervised learning approach may alternatively employ a naive Bayes algorithm. Based on the determined feature data of the abnormal users, the naive Bayes algorithm is used to calculate an anomaly probability for each feature parameter, where the anomaly probability of a feature parameter is the probability that a user is an abnormal user given that the value of that feature parameter is abnormal. The anomaly probability indicates how important the feature parameter is in identifying abnormal users: the higher the anomaly probability, the more important the parameter. Once the anomaly probabilities have been calculated, the feature parameters whose anomaly probability exceeds a probability threshold are taken as the key feature parameters for constructing the classification model. The naive Bayes algorithm is then used to classify the feature data of the identified abnormal users once more, and only the feature data that is again classified as belonging to an abnormal user is retained. For each such item, the key feature parameters are combined to obtain the key feature data, which is then used to construct the classification model.
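The per-parameter anomaly probability can be illustrated with Bayes' rule applied to one feature at a time. In the Python sketch below, each feature is assumed to be pre-binarized (1 meaning its value is abnormal) and labels of 1 mark abnormal users; the function names and the binarization are illustrative assumptions, and no smoothing is applied.

```python
def anomaly_probabilities(rows, labels, features):
    """Estimates P(user abnormal | feature value abnormal) for each feature.

    Uses Bayes' rule on empirical frequencies:
        P(A | F) = P(F | A) * P(A) / P(F)
    where A = "user is abnormal" and F = "feature value is abnormal".
    """
    n = len(rows)
    n_abnormal = sum(labels)
    prior = n_abnormal / n  # P(A)
    probs = {}
    for f in features:
        evidence = sum(row[f] for row in rows) / n  # P(F)
        if n_abnormal == 0 or evidence == 0:
            probs[f] = 0.0  # probability undefined; treat as uninformative
            continue
        likelihood = sum(
            row[f] for row, lab in zip(rows, labels) if lab
        ) / n_abnormal  # P(F | A)
        probs[f] = likelihood * prior / evidence
    return probs


def key_features_by_probability(probs, threshold):
    """Keeps features whose anomaly probability exceeds the threshold."""
    return sorted(f for f, p in probs.items() if p > threshold)
```

With only a handful of abnormal users the raw frequencies are noisy, so a real implementation would likely add smoothing before comparing against the probability threshold.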
It should be noted that, in this embodiment, the key feature data is the number of operations executed in the key time periods. In other embodiments, the key feature data may also be the number of IPs used in the key time periods or the number of logins to the operating system, or basic information about the user such as age, educational background, and occupation; this embodiment imposes no limitation in this respect.
G2, constructing a decision tree model by using the key feature data, wherein the decision tree model is a classification model for identifying the abnormal user.
Specifically, in the present embodiment, the classification model may be a decision tree model. A decision tree model may be created, and the generated key feature data including the key feature parameters is used as a training sample to train the decision tree model to obtain a trained classification model for identifying the abnormal user.
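Training the final decision tree model on key feature data, and then using it to classify a user, might be sketched as follows. This is a minimal ID3-style tree over binary key features written for illustration only; the tuple-based tree representation, the binarization, and the convention that label 1 means abnormal are all assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Returns a leaf label, or a (feature, subtree_if_0, subtree_if_1) tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(f):
        g = entropy(labels)
        for value in (0, 1):
            part = [lab for row, lab in zip(rows, labels) if row[f] == value]
            if part:
                g -= len(part) / len(labels) * entropy(part)
        return g

    best = max(features, key=gain)
    if gain(best) <= 1e-12:
        return majority  # no remaining feature improves the split
    rest = [f for f in features if f != best]
    branches = []
    for value in (0, 1):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches.append(
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
            if idx else majority
        )
    return (best, branches[0], branches[1])


def predict(tree, row):
    """Walks the tree until a leaf label is reached (1 = abnormal user)."""
    while isinstance(tree, tuple):
        feature, if_zero, if_one = tree
        tree = if_one if row[feature] else if_zero
    return tree
```

Once built, the tree plays the role of the pre-established classification model in step S200: each user's key feature data is passed to `predict`, and users classified as 1 are reported as abnormal.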
As can be seen from the above embodiments, the method for identifying abnormal operation of a user based on log data first collects log data of a plurality of predetermined users and performs statistical analysis on the collected log data to obtain operation characteristic data for each of those users; it then analyzes the acquired operation characteristic data according to a pre-established classification model for identifying abnormal users, so as to determine the abnormal users among the plurality of predetermined users; finally, it sends the identity identification information of each determined abnormal user to a predetermined abnormal user monitoring center so that the abnormal user can be monitored or checked. The abnormal operation of the user can be quickly and accurately identified, and the accuracy of identifying the abnormal user is improved.
Furthermore, the present invention also provides a computer-readable storage medium, on which a program for identifying a user abnormal operation based on log data is stored, and when executed by a processor, the program for identifying a user abnormal operation based on log data implements the following operations:
collecting predetermined log data of a plurality of users, and performing statistical analysis on the collected log data to respectively obtain the predetermined operation characteristic data of the plurality of users;
analyzing the obtained operation characteristic data according to a pre-established classification model for identifying abnormal users so as to determine the abnormal users from the predetermined multiple users;
and sending the determined identity identification information of the abnormal user to a predetermined abnormal user monitoring center so as to monitor the abnormal user or perform verification processing.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the electronic apparatus and the method for identifying abnormal operations of the user based on the log data, and will not be described herein again.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (6)
1. An electronic device, comprising a memory, and a processor connected to the memory, wherein the processor is configured to execute a program stored in the memory for identifying a user abnormal operation based on log data, and when the program for identifying a user abnormal operation based on log data is executed by the processor, the method comprises the following steps:
A1, collecting predetermined log data of a plurality of users, and performing statistical analysis on the collected log data to respectively obtain predetermined operation characteristic data of the plurality of users, wherein the operation characteristic data of the users are a plurality of characteristic parameters for identifying or recording user operation behaviors;
A2, analyzing the acquired operation characteristic data according to a classification model of the pre-established abnormal user identification to determine an abnormal user from the plurality of pre-determined users;
A3, sending the identity identification information of the abnormal user to a predetermined abnormal user monitoring center for monitoring the abnormal user or checking;
in A2, the process of building the pre-built abnormal user identification classification model includes the following steps:
analyzing the acquired operation characteristic data according to an unsupervised machine learning algorithm to determine abnormal users from the predetermined plurality of users;
based on the determined operation characteristic data of the abnormal user, selecting key characteristic parameters for constructing a classification model from a plurality of characteristic parameters of the abnormal user in a supervised learning mode, and generating key characteristic data containing the key characteristic parameters;
constructing a decision tree model by using the key characteristic data, wherein the decision tree model is a classification model for identifying abnormal users;
the step of analyzing the acquired operational characteristic data according to an unsupervised machine learning algorithm to determine abnormal users from the predetermined plurality of users comprises:
clustering the operation characteristic data of a plurality of users, and aggregating the operation characteristic data of the users with high association degree to obtain a plurality of clusters;
respectively judging the distribution of each operation characteristic data in each cluster, and if the operation characteristic data contained in one cluster is less than a first preset quantity, considering the user in the cluster as an abnormal user;
if the operation characteristic data contained in one cluster is greater than or equal to the first preset number, and the number of the operation characteristic data with the distance from the predefined central data greater than the predefined distance threshold is greater than or equal to the second preset number, the user in the cluster is considered as an abnormal user;
or if the operation characteristic data contained in one cluster is greater than or equal to the first preset number, and the number of the operation characteristic data with the distance from the predefined central data greater than the predefined distance threshold is less than the second preset number, determining that the user corresponding to the operation characteristic data with the distance from the central data greater than the predefined distance threshold in the cluster is an abnormal user.
2. The electronic device of claim 1, wherein the supervised learning approach is a decision tree algorithm or a naive bayes algorithm.
3. The electronic device according to claim 1 or 2, wherein the operation characteristic data comprises a user name, a login IP, a time, an operation event, parameter data information of an operation user.
4. A method for identifying abnormal operation of a user based on log data, the method comprising the steps of:
S1, collecting predetermined log data of a plurality of users, and performing statistical analysis on the collected log data to respectively obtain the predetermined operation characteristic data of the plurality of users, wherein the operation characteristic data of the users are a plurality of characteristic parameters for identifying or recording user operation behaviors;
S2, analyzing the acquired operation characteristic data according to a pre-established classification model for identifying abnormal users to determine the abnormal users from the predetermined multiple users;
S3, sending the identity identification information of the abnormal user to a predetermined abnormal user monitoring center for monitoring the abnormal user or checking;
in S2, the process of building the pre-built abnormal user identification classification model includes the following steps:
analyzing the acquired operation characteristic data according to an unsupervised machine learning algorithm to determine abnormal users from the predetermined plurality of users;
based on the determined operation characteristic data of the abnormal user, selecting key characteristic parameters for constructing a classification model from a plurality of characteristic parameters of the abnormal user in a supervised learning mode, and generating key characteristic data containing the key characteristic parameters;
constructing a decision tree model by using the key characteristic data, wherein the decision tree model is a classification model for identifying abnormal users;
the step of analyzing the acquired operational characteristic data according to an unsupervised machine learning algorithm to determine abnormal users from the predetermined plurality of users comprises:
clustering the operation characteristic data of a plurality of users, and aggregating the operation characteristic data of the users with high association degree to obtain a plurality of clusters;
respectively judging the distribution of each operation characteristic data in each cluster, and if the operation characteristic data contained in one cluster is less than a first preset quantity, considering the user in the cluster as an abnormal user;
if the operation characteristic data contained in one cluster is greater than or equal to the first preset number, and the number of the operation characteristic data with the distance from the predefined central data greater than the predefined distance threshold is greater than or equal to the second preset number, the user in the cluster is considered as an abnormal user;
or if the operation characteristic data included in one cluster is greater than or equal to the first preset number, and the number of the operation characteristic data with the distance from the predefined central data greater than the predefined distance threshold is less than the second preset number, the user corresponding to the operation characteristic data with the distance from the central data greater than the predefined distance threshold in the cluster is considered as an abnormal user.
5. The method of claim 4, wherein the supervised learning approach is a decision tree algorithm or a naive Bayes algorithm.
6. A computer-readable storage medium storing a program for identifying a user abnormal operation based on log data, the program being executable by at least one processor to cause the at least one processor to perform the steps of the method for identifying a user abnormal operation based on log data as claimed in claim 4 or 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910065654.5A CN109918279B (en) | 2019-01-24 | 2019-01-24 | Electronic device, method for identifying abnormal operation of user based on log data and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910065654.5A CN109918279B (en) | 2019-01-24 | 2019-01-24 | Electronic device, method for identifying abnormal operation of user based on log data and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109918279A CN109918279A (en) | 2019-06-21 |
CN109918279B CN109918279B (en) | 2022-09-27 |
Family
ID=66960644
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910065654.5A Active CN109918279B (en) | 2019-01-24 | 2019-01-24 | Electronic device, method for identifying abnormal operation of user based on log data and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109918279B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427971A (en) * | 2019-07-05 | 2019-11-08 | 五八有限公司 | Recognition methods, device, server and the storage medium of user and IP |
CN111259985B (en) * | 2020-02-19 | 2023-06-30 | 腾讯云计算(长沙)有限责任公司 | Classification model training method and device based on business safety and storage medium |
SG10202001528TA (en) * | 2020-02-20 | 2020-07-29 | Alipay Labs Singapore Pte Ltd | Methods and systems for identity proofing |
CN111444534A (en) * | 2020-03-12 | 2020-07-24 | 中国建设银行股份有限公司 | Method, device, equipment and computer readable medium for monitoring user operation |
CN113765850B (en) * | 2020-06-03 | 2023-08-15 | 中国移动通信集团重庆有限公司 | Internet of things abnormality detection method and device, computing equipment and computer storage medium |
CN111913860B (en) * | 2020-07-15 | 2024-02-27 | 中国民航信息网络股份有限公司 | Operation behavior analysis method and device |
CN112837061B (en) * | 2021-02-26 | 2024-06-28 | 腾讯科技(深圳)有限公司 | Data processing method and related device |
CN115688024B (en) * | 2022-09-27 | 2023-05-30 | 哈尔滨工程大学 | Network abnormal user prediction method based on user content characteristics and behavior characteristics |
CN115941265B (en) * | 2022-11-01 | 2023-10-03 | 南京鼎山信息科技有限公司 | Big data attack processing method and system applied to cloud service |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106304085A (en) * | 2016-08-15 | 2017-01-04 | 成都九鼎瑞信科技股份有限公司 | Information processing method and device |
CN107135195A (en) * | 2017-02-20 | 2017-09-05 | 平安科技(深圳)有限公司 | The detection method and device of abnormal user account |
CN107809331A (en) * | 2017-10-25 | 2018-03-16 | 北京京东尚科信息技术有限公司 | The method and apparatus for identifying abnormal flow |
CN108108743A (en) * | 2016-11-24 | 2018-06-01 | 百度在线网络技术(北京)有限公司 | Abnormal user recognition methods and the device for identifying abnormal user |
US10095774B1 (en) * | 2017-05-12 | 2018-10-09 | International Business Machines Corporation | Cluster evaluation in unsupervised learning of continuous data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107292528A (en) * | 2017-06-30 | 2017-10-24 | 阿里巴巴集团控股有限公司 | Vehicle insurance Risk Forecast Method, device and server |
Also Published As
Publication number | Publication date |
---|---|
CN109918279A (en) | 2019-06-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109918279B (en) | Electronic device, method for identifying abnormal operation of user based on log data and storage medium | |
CN110321371B (en) | Log data anomaly detection method, device, terminal and medium | |
US7801703B2 (en) | Self-learning integrity management system and related methods | |
CN108182515B (en) | Intelligent rule engine rule output method, equipment and computer readable storage medium | |
CN111475804A (en) | Alarm prediction method and system | |
CN112052111B (en) | Processing method, device and equipment for server abnormity early warning and storage medium | |
US10101244B2 (en) | Self-learning simulation environments | |
CN109543891B (en) | Method and apparatus for establishing capacity prediction model, and computer-readable storage medium | |
CN111694718A (en) | Method and device for identifying abnormal behavior of intranet user, computer equipment and readable storage medium | |
US20100153330A1 (en) | Proactive Information Technology Infrastructure Management | |
CN113159615A (en) | Intelligent information security risk measuring system and method for industrial control system | |
CN107341095B (en) | Method and device for intelligently analyzing log data | |
CN112751711B (en) | Alarm information processing method and device, storage medium and electronic equipment | |
CN115081997B (en) | Equipment spare part inventory diagnostic system | |
CN113505044B (en) | Database warning method, device, equipment and storage medium | |
US20160259869A1 (en) | Self-learning simulation environments | |
CN111062642A (en) | Method and device for identifying industrial risk degree of object and electronic equipment | |
CN111522859A (en) | Alarm analysis method and device, computer equipment and storage medium | |
CN113282920A (en) | Log abnormity detection method and device, computer equipment and storage medium | |
CN112612679A (en) | System running state monitoring method and device, computer equipment and storage medium | |
CN112801145A (en) | Safety monitoring method and device, computer equipment and storage medium | |
CN109902486A (en) | Electronic device, abnormal user processing strategie Intelligent Decision-making Method and storage medium | |
CN111555899A (en) | Alarm rule configuration method, equipment state monitoring method, device and storage medium | |
CN116841829A (en) | Mobile terminal application program performance monitoring method | |
CN115189961A (en) | Fault identification method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |