CN114398966A - Early warning method for user portrait based on fortress machine - Google Patents

Early warning method for user portrait based on fortress machine

Info

Publication number
CN114398966A
Authority
CN
China
Prior art keywords
user
information
abnormal
algorithm
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111668530.XA
Other languages
Chinese (zh)
Inventor
王小涛
王颜康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiuan Century Technology Co ltd
Original Assignee
Beijing Jiuan Century Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiuan Century Technology Co ltd filed Critical Beijing Jiuan Century Technology Co ltd
Priority to CN202111668530.XA priority Critical patent/CN114398966A/en
Publication of CN114398966A publication Critical patent/CN114398966A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/08Network architectures or network communication protocols for network security for authentication of entities
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/10Network architectures or network communication protocols for network security for controlling access to devices or network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data security and processing, and discloses a user-portrait early warning method based on a bastion machine, comprising the following steps: acquiring a session log of the bastion machine, extracting user information from the session log, and classifying it into a six-tuple model; establishing a zero trust architecture model and a threat library, classifying the information of each user in the six-tuple model according to a machine learning algorithm and the zero trust architecture model, and then performing anomaly detection to identify abnormal users; when an abnormal user is identified, the bastion machine issues an early warning about the abnormal user and performs secondary verification; once the secondary verification confirms the anomaly, the dangerous operations of the abnormal user are stored in the threat library, and a security scoring module applies a security score mapping to the abnormal user. The invention uses the bastion machine to perform automatic security early warning for users, predicts the behaviors a user may engage in after logging in, finds potential security hazards in those behaviors, and warns security auditors in time.

Description

Early warning method for user portrait based on fortress machine
Technical Field
The invention relates to the technical field of data security and processing, and in particular to a user-portrait early warning method based on a bastion machine. The method performs automatic security early warning for system users from bastion machine logs, predicts the behaviors a user may engage in after logging in, finds potential security hazards in those behaviors, and reminds and warns security auditors in time.
Background
With the progress of science and technology, the state attaches increasing importance to information security, and the bastion machine plays a valuable supervisory role in communication operation and maintenance. Through the bastion machine, a user can supervise every operation performed by ordinary operation and maintenance managers, which reduces security problems such as information leakage caused by disorderly password management.
However, bastion machines on the market are generally deployed in a company intranet, and their monitoring of a logged-in user goes no further than checking that the password matches the account, at which point the user is judged to be safe. The behavior logs generated during use are mostly audited manually. Real-time blocking and early warning are possible only for extremely high-risk operations, while low-risk operations and operations that may lead to potential security hazards can only be judged by people, so auditors facing the behavior logs of a large number of users find it hard to tell them apart and spend a great deal of time doing so. A user's behavior log contains many kinds of information: besides the traditional login information, it also records the operations the user performs during use. These operations differ from user to user, so a single fixed processing method cannot accurately assess the outcome of a user's operations. In addition, most existing bastion machines detect abnormal users only through simple signals such as an unusual login place or the input of a high-risk command. Account-and-password authentication provides security in most situations, but this approach readily leaves some potential security hazards.
Therefore, to solve these problems of existing bastion machines, the invention provides a technical method that eliminates potential security hazards at an early stage by constructing a dynamic security detection mechanism. By introducing a zero trust architecture, it predicts the behaviors a user may engage in after logging in, finds potential security hazards in those behaviors, improves the early warning mechanism of conventional bastion machines, and reminds and warns security auditors in time.
Disclosure of Invention
The invention aims to provide a user-portrait early warning method based on a bastion machine which, by introducing a zero trust architecture, predicts the behaviors a user may engage in after logging in, finds potential security hazards in those behaviors, improves the early warning mechanism of conventional bastion machines, and reminds and warns security auditors in time.
The invention is realized by the following technical scheme: a user-portrait early warning method based on a bastion machine, comprising the following steps:
S1, acquiring a session log of the bastion machine, extracting user information from the session log, and classifying the user information into a six-tuple model;
S2, establishing a zero trust architecture model and a threat library, classifying the information of each user in the six-tuple model according to a machine learning algorithm and the zero trust architecture model, and then performing anomaly detection to identify abnormal users, where the users include both single users and group users;
S3, when an abnormal user is identified, the bastion machine issues an early warning about the abnormal user and performs secondary verification; once the secondary verification confirms the anomaly, the dangerous operations of the abnormal user are stored in the threat library as a basis for the next judgment of abnormal users, and the security scoring module applies a security score mapping to the abnormal user.
The main purpose of this technical scheme is to build user portraits of employees from the log information of the bastion machine and thereby achieve real-time monitoring and security early warning. First, a "zero trust" architecture is introduced, that is, never trust and always verify. With an ordinary bastion machine, once an employee passes login authentication the system assumes by default that the employee is in a safe state, and the bastion machine applies a blocking mechanism only when a high-risk instruction is issued. The invention instead continuously verifies users who have logged in successfully: it monitors background data, keeps supervising users through their operations, describes each user's safety factor with several different machine learning algorithms, analyzes the different kinds of user data separately, and issues security early warnings.
In order to better implement the present invention, step S1 further includes:
acquiring the session log of the bastion machine through a general interface, extracting user information from the session log, and classifying the user information into the six-tuple model;
classifying the user information in the six-tuple model into time information, place information, person/ID information, scope information, action information and result information;
and performing combined analysis of users across different elements of the six-tuple model.
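As an illustration only, the six-tuple classification of step S1 might be sketched as follows. The pipe-delimited log format, the `SessionEvent` class and the helper names are assumptions made for this example; the source does not specify a log schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SessionEvent:
    """One bastion-machine log entry as a six-tuple (hypothetical schema)."""
    time: datetime   # when the service was used
    place: str       # source IP address
    person: str      # user name / ID
    scope: str       # service that was accessed
    action: str      # instruction or mouse operation
    result: str      # outcome of the operation


def parse_line(line: str) -> SessionEvent:
    """Parse an assumed pipe-delimited log line into a six-tuple."""
    ts, ip, user, service, command, outcome = (f.strip() for f in line.split("|"))
    return SessionEvent(datetime.fromisoformat(ts), ip, user, service, command, outcome)


def events_by_place_and_hour(events):
    """Combined analysis across two elements: count events per (place, hour)."""
    counts = {}
    for e in events:
        key = (e.place, e.time.hour)
        counts[key] = counts.get(key, 0) + 1
    return counts


events = [
    parse_line("2021-12-30T09:15:00 | 10.0.0.5 | alice | db-prod | SELECT 1 | ok"),
    parse_line("2021-12-30T09:40:00 | 10.0.0.5 | alice | db-prod | SHOW TABLES | ok"),
    parse_line("2021-12-30T23:05:00 | 203.0.113.9 | alice | db-prod | DROP TABLE t | blocked"),
]
print(events_by_place_and_hour(events))
```

Grouping by place and hour is just one possible pairing; any two elements of the six-tuple could be combined in the same way.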
In order to better implement the present invention, the method for analyzing the single user and the group user in step S2 further includes:
analyzing the behaviors of single users and group users along the time series and the place domain from multiple dimensions, establishing a group baseline and an individual baseline from the behavior characterization and correlation analysis data, comparing individual behavior with group behavior by mean, variance and similarity, and identifying abnormal behaviors.
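A minimal sketch of the baseline comparison, assuming a single numeric feature (for example, login hour) and a deviation measured in standard deviations. The threshold of 3 and the rule that a value must deviate from both baselines are assumptions of this example, not requirements stated in the source.

```python
from statistics import mean, pstdev


def baseline(values):
    """Return (mean, population standard deviation) of the observed values."""
    return mean(values), pstdev(values)


def deviation(value, base):
    """Distance between value and the baseline mean, in standard deviations."""
    mu, sigma = base
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


def is_abnormal(today, own_history, group_history, threshold=3.0):
    """Flag a value that is far from both the individual and the group baseline."""
    own = deviation(today, baseline(own_history))
    group = deviation(today, baseline(group_history))
    return own > threshold and group > threshold


own_hours = [9, 9, 10, 9, 10, 9]            # this user's past login hours
group_hours = [8, 9, 9, 10, 10, 9, 8, 10]   # login hours across the group
print(is_abnormal(23, own_hours, group_hours))
print(is_abnormal(10, own_hours, group_hours))
```

Requiring both deviations keeps a user with a legitimately unusual schedule from being flagged solely because they differ from the group.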
In order to better implement the present invention, step S2 further includes:
carrying out data classification processing according to a machine learning algorithm;
setting an initial judgment standard in the bastion machine for first-time users, the initial judgment standard covering abnormal login conditions and cases where user access or operations exceed the user's permissions; classifying the user's information according to the six-tuple model, judging the first-time user's information against the initial judgment standard, and regarding a user who does not meet the standard as a user with a low safety factor;
and, for a user who has used the bastion machine for at least one month, performing anomaly detection according to the session log information of the bastion machine, the six-tuple model and a machine learning algorithm, denoting the user's information features in the six-tuple model as X, where the machine learning algorithm includes an SVM algorithm, an isolation forest algorithm and a clustering algorithm.
In this technical scheme, the various kinds of user information are processed by several machine learning algorithms: the SVM and isolation forest algorithms are used in simpler cases, and a clustering algorithm is used in complex cases such as instruction operations. For the comprehensive judgment of multiple kinds of information, users are grouped with a k-means clustering algorithm, and specific algorithms such as word2vec and neural networks are used; when word2vec is used, a neural network algorithm can be used for the processing.
In order to better implement the present invention, further, the method using the SVM algorithm includes:
for the time information and place information of a user, performing pre-analysis with a One-Class SVM algorithm: letting X be an n-dimensional array of user features, that is, the user features (x1, …, xn) of normal behavior, with the distance from each x to a center o smaller than r, and solving for the minimum sphere satisfying this condition from the given time information and place information of the user;
according to

$$\min_{r,\,o}\; V(r) + C\sum_{i}\xi_i$$

and

$$\|x_i - o\| \le r + \xi_i,$$

judging whether a user is abnormal, wherein V(r) represents the volume of a sphere with radius r, ξ_i is a slack variable introduced by the optimization problem, and C is a penalty parameter;
r and the center o are obtained from the given C and the user features x1, …, xn of normal behavior; whether a point is abnormal is judged by whether it lies inside this sphere, and if it is an abnormal point, the time information and place information of the given user are judged to be abnormal.
In this technical scheme, information about the user's login time and login place is usually judged using only an SVM or an isolation forest algorithm, even when the time and the place are correlated.
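The sphere test can be illustrated with a deliberately simplified stand-in. The sketch below does not solve the penalized optimization problem: it assumes the center o is the mean of the normal points and takes r as the distance that encloses a chosen fraction of them. The `coverage` parameter and the feature choice (login hour, session length in hours) are assumptions of this example.

```python
import math


def fit_sphere(points, coverage=0.95):
    """Fit a center o and radius r to feature vectors of normal behavior.

    Simplification: o is the mean of the points, and r is the distance
    that encloses the requested fraction of them.
    """
    dims = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    dists = sorted(math.dist(p, center) for p in points)
    index = min(len(dists) - 1, max(0, math.ceil(coverage * len(dists)) - 1))
    return center, dists[index]


def is_outlier(point, center, radius):
    """A point outside the sphere is treated as an abnormal point."""
    return math.dist(point, center) > radius


# Hypothetical features: (login hour, session length in hours)
normal = [(9.0, 8.0), (9.5, 8.5), (8.5, 7.5), (9.0, 9.0), (10.0, 8.0)]
center, radius = fit_sphere(normal)
print(is_outlier((23.0, 1.0), center, radius))
print(is_outlier((9.0, 8.0), center, radius))
```

In practice the features would need comparable scales before distances are meaningful, and a real One-Class SVM or SVDD solver would replace `fit_sphere`.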
To better implement the present invention, further, the method using the isolation forest algorithm comprises:
step a, analyzing the combination of the user's time information and place information with the isolation forest algorithm;
step b, randomly selecting ψ sample points from the user's feature data as a sample subset and placing them in the root node of a tree;
step c, randomly choosing a dimension and randomly generating a cut point p within the current node's data; when the chosen dimension is the user login time, the cut time point is generated between the earliest and the latest time in the current node's data;
step d, using the cut point to generate a boundary that divides the current node into two parts by login time, placing data whose login time is earlier than p in the left child node of the current node and data greater than or equal to p in the right child node;
step e, recursively applying step c and step d in the child nodes, continually constructing new child nodes until a child node contains only one record and cannot be cut further, or the child node reaches the height limit;
step f, repeating steps b to e until t isolation trees (iTrees) have been generated;
step g, traversing each isolation tree for each data point x to obtain the average height E(h(x)) of x in the forest, from which the anomaly score of the data is calculated by the following formulas:

$$s(x, \psi) = 2^{-\frac{E(h(x))}{c(\psi)}}$$

$$c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}$$

where H(i) is the harmonic number, which can be estimated by ln(i) plus the Euler constant, x is the specific coordinate of the data point X, c(ψ) is the average path length for the given number of samples ψ, and E(h(x)) is the average of h(x) over the isolation trees;
step h, judging whether the user is abnormal according to the resulting anomaly score.
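Steps b to g can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the defaults t=100 and ψ=64, the height limit ceil(log2 ψ), the two hypothetical features (login hour, session length in hours) and the synthetic data are all choices made for this example, and the scoring follows the standard isolation forest formula s = 2^(-E(h(x))/c(ψ)).

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length for n points, with H(i) estimated as ln(i) + Euler constant."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(data, height, limit, rng):
    """Steps c to e: cut on a random dimension until isolated or height-limited."""
    if height >= limit or len(data) <= 1:
        return {"size": len(data)}
    dim = rng.randrange(len(data[0]))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:  # simplification: stop instead of trying another dimension
        return {"size": len(data)}
    p = rng.uniform(lo, hi)
    left = [row for row in data if row[dim] < p]
    right = [row for row in data if row[dim] >= p]
    return {"dim": dim, "cut": p,
            "left": build_tree(left, height + 1, limit, rng),
            "right": build_tree(right, height + 1, limit, rng)}


def path_length(x, node, height=0):
    """Depth at which x lands, plus c(size) for the part of the leaf left unsplit."""
    if "size" in node:
        return height + c(node["size"])
    branch = "left" if x[node["dim"]] < node["cut"] else "right"
    return path_length(x, node[branch], height + 1)


def fit_forest(data, t=100, psi=64, seed=0):
    """Steps b and f: build t trees, each on a random subsample of psi points."""
    rng = random.Random(seed)
    psi = min(psi, len(data))  # psi must be at least 2 for scoring to be defined
    limit = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, limit, rng) for _ in range(t)]
    return trees, psi


def anomaly_score(x, forest):
    """Step g: scores closer to 1 suggest an anomaly."""
    trees, psi = forest
    mean_height = sum(path_length(x, tree) for tree in trees) / len(trees)
    return 2.0 ** (-mean_height / c(psi))


data_rng = random.Random(1)
normal_logins = [(data_rng.gauss(9.0, 0.5), data_rng.gauss(8.0, 0.5)) for _ in range(200)]
forest = fit_forest(normal_logins)
typical = anomaly_score((9.0, 8.0), forest)
unusual = anomaly_score((23.0, 1.0), forest)
print(f"typical={typical:.2f} unusual={unusual:.2f}")
```

The late-night, short-session point is isolated in fewer cuts than the typical login, so it receives the higher score; the threshold that turns a score into an alert is left to step h.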
In this technical scheme, information about the user's login time and login place is usually judged using only an SVM or an isolation forest algorithm, even when the time and the place are correlated.
To better implement the present invention, further, the method using a clustering algorithm includes:
first, performing a primary analysis of the instruction operation data in the user's action information and the accessed service data in the user's scope information with the K-means algorithm;
second, performing a secondary analysis by computing outliers over the various kinds of user information in the six-tuple model with the DBSCAN clustering algorithm;
while using the K-means and DBSCAN clustering algorithms, introducing the word2vec algorithm to map the semantics of user instruction operations into a multi-dimensional vector space for simplified analysis;
and, for combinations of different elements of the six-tuple model, combining the word2vec algorithm with a neural network algorithm during the primary and secondary analysis to perform combined analysis of users across different elements.
In this technical scheme, user instruction operations are generally analyzed in their various aspects with the k-means and DBSCAN clustering algorithms; the word2vec algorithm can also be introduced to map semantics into a multi-dimensional vector space for simplified analysis. In more complicated cases, for example when instruction operations, accessed services, time, IP and other information are all gathered for a joint judgment, the data can be analyzed by neural network learning in addition to being simplified and classified by the clustering algorithms.
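The outlier step of the secondary analysis might look like the sketch below. It substitutes a simple command-frequency vector for word2vec embeddings and implements only the noise-detection part of DBSCAN; the vocabulary, the sample sessions and the values of `eps` and `min_pts` are assumptions made for illustration.

```python
import math
from collections import Counter

VOCAB = ["ls", "cd", "cat", "vim", "rm", "scp"]  # hypothetical command vocabulary


def command_vector(commands):
    """Relative frequency of each vocabulary command in one user's session."""
    counts = Counter(commands)
    total = sum(counts[c] for c in VOCAB) or 1
    return [counts[c] / total for c in VOCAB]


def dbscan_noise(vectors, eps=0.35, min_pts=3):
    """Return the indices DBSCAN would label as noise.

    A point is noise if it is not a core point (fewer than min_pts points
    within eps, itself included) and has no core point within eps.
    """
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if math.dist(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    return [i for i in range(n) if i not in core and not core.intersection(neighbors[i])]


sessions = {
    "alice": ["ls", "cd", "cat", "ls", "vim"],
    "bob": ["ls", "cd", "ls", "cat", "vim", "cd"],
    "carol": ["cd", "ls", "cat", "vim", "ls"],
    "dave": ["ls", "cd", "cat", "cat", "vim"],
    "mallory": ["rm", "rm", "scp", "rm", "scp", "rm"],
}
names = list(sessions)
noise = dbscan_noise([command_vector(s) for s in sessions.values()])
print([names[i] for i in noise])
```

Users whose command mix sits far from every dense group are reported as outliers; with embeddings in place of frequencies the same distance-based test would apply.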
In order to better implement the present invention, the security scoring module in step S3 further includes:
establishing a security scoring module in the bastion machine according to a zero trust architecture model and a machine learning algorithm;
the security scoring module scores each user's security every day to obtain the user's dynamic behavior information, which comprises all of the user's behavior information from the past to the present;
the user is scored every day; a security score below the security score threshold triggers an alarm from the bastion machine, and the score is finalized after manual confirmation;
and the user's daily security score is carried over to the next day and mapped according to a security mapping function, users below the security score threshold being determined to have potential security hazards.
In this technical scheme, the user is scored every day; when the security score falls below a certain threshold a system alarm is triggered, and the score is finalized after manual confirmation. Daily security scores carry over to the next day: a dangerous score is generally mapped back toward normal but remains lower than other users' scores, so continued abnormal operation makes the user more likely to trigger an alarm.
In order to better implement the present invention, further, the method for performing security score mapping according to the security mapping function includes:
setting a full security score F, a good score G and a passing score H, where a user above the good score G is quickly mapped to the full score F the next day, a user below the good score G is mapped toward G slowly, all such users can be mapped to the vicinity of G in one day, and a user below the passing score H is mapped to the vicinity of (G + H)/2, on the default assumption that the problem has been resolved;
the mapping process can be adjusted on demand to take a certain amount of time, that is, a user below G and above H may need several days to cross the good score line, depending on how many days the security requirements specify that a previous error should continue to affect the system's alertness.
In this technical scheme, the score of a user with no anomaly is mapped to the full score the next day; a user who reached the good score on the previous day is mapped to the vicinity of the full score within a certain time, whereas a user below the good score does not return to the full score quickly. Users below the good score are determined to have potential security hazards, meaning the system pays them special attention, and for a period of time any operation with a potential security hazard by such a user triggers a system early warning.
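One possible reading of the security mapping function is sketched below. The numeric values F=100, G=80, H=60, the linear recovery step and the `days_to_good` parameter are hypothetical; the source fixes only the qualitative behavior (above G returns to F, between H and G moves toward G, below H moves to about (G + H)/2).

```python
def next_day_score(score, full=100.0, good=80.0, passing=60.0, days_to_good=1):
    """Map today's closing score to tomorrow's starting score.

    All numeric defaults are illustrative assumptions.
    """
    if score >= good:
        return full                      # at or above G: back to the full score F
    if score < passing:
        return (good + passing) / 2      # below H: to the vicinity of (G + H) / 2
    # Between H and G: climb toward G in fixed steps, so that crossing the
    # good score line takes at most `days_to_good` days.
    step = (good - passing) / days_to_good
    return min(good, score + step)


print(next_day_score(90))
print(next_day_score(50))
print(next_day_score(65))
print(next_day_score(65, days_to_good=4))
```

Raising `days_to_good` keeps a user who erred below the good score line for longer, which corresponds to the system staying alert to that user for more days.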
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) the invention forms a user portrait for each bastion machine user from historical log information and, in combination with a historical threat library, effectively analyzes the user's abnormal operations, thereby achieving early warning of security threats;
(2) the invention uses several machine learning algorithms to automate security early warning and reduce the workload of operation and maintenance auditors.
Drawings
The invention is further described in connection with the following figures and examples, all of which are intended to be open ended and within the scope of the invention.
Fig. 1 is a flow chart of a security early warning for user activities through a user log based on a bastion machine in the early warning method for a user portrait based on the bastion machine provided by the invention.
Fig. 2 is a schematic structural diagram of processing different algorithms for different data when processing user log data based on the bastion machine in the method for early warning a user portrait based on the bastion machine provided by the invention.
Fig. 3 is a schematic structural diagram of a security scoring module obtained based on an algorithm and a time-based security scoring attenuation mode in the method for early warning a user portrait based on a bastion machine provided by the invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and therefore should not be considered as a limitation to the scope of protection. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1:
in this embodiment, as shown in fig. 1, session log data is first obtained through a general interface, together with the basic information and permissions of each account. Second, the user's information is extracted from the bastion machine logs and classified for analysis. The six-tuple model mainly comprises time, place, person/ID, scope, action and result; specifically, the time the service was used, the IP address, the user name, which services were accessed, which operations were performed (including instruction operations and mouse operations), and what results were produced. Third, the system issues an early warning for users with abnormal behavior and notifies an administrator. The administrator can confirm with the user whether the operation was a mistake, the account was stolen, or the system misjudged. Once verified, the dangerous operation enters the threat library together with a description of the problem, providing a reference for subsequent security early warning. Meanwhile, for a user with an abnormal condition, the safety factor (anomaly score) is reduced to a certain degree in the security scoring module, which raises the system's attention to that user's abnormal operations so that threatening operations by the user can be warned of effectively and in time. In addition, for a user with a problem, the bastion machine system analyzes the history library to determine where the user's error occurred and whether the user has had similar problems before. Finally, after the threat library has accumulated over a long period, the abnormal conditions in it are analyzed comprehensively, so that subsequent dangerous operations can be predicted from a user's earlier abnormal operations, achieving advance crisis warning and prevention.
The threat library in this embodiment stores the abnormal behavior information of users for later judgment of user behavior. The rule in step S3 for judging whether a user's behavior is safe is dynamic: the user's past behavior is obtained and compared with that of other users and with the behavior information in the threat library to judge whether the user is abnormal. Once confirmed, abnormal user behavior enters the threat library as a basis for the next judgment of abnormal behavior.
This embodiment also describes the handling of user anomalies. When a user anomaly triggers an alarm, the operation and maintenance personnel manually confirm with the user whether the user made an operating mistake or the user's account has a problem. For an operating mistake, besides assigning responsibility for the operation, the account is also watched closely. For an account problem, the threat is stored in the library and the six-tuple model associated with each abnormal log is analyzed. Besides the time and IP address, the analysis needs to combine multiple angles such as instruction operations and accessed services. The operations preceding the anomaly are analyzed in detail to determine which differ from the user's daily operations and which differ from the group's common operations. Instruction-based operations are analyzed mainly with a clustering algorithm, which examines the user's overall operations, such as the frequency of use of each instruction and the timing of instruction use, and then attempts to find anomalies by comparison with the user's historical operations. Certain specific operations are recorded with emphasis, which in turn feeds back into anomaly detection for instruction operations. For example, when a user performs early-stage operations similar to a record in the threat library, the behavior can be warned of immediately and the subsequent threatening operation prevented. For a user with a problem, the threat library is consulted to find out whether a similar problem exists, and either a solution is proposed or a new entry is added to the threat library.
Example 2:
in this embodiment, each element of the six-tuple may contain several data indicators. For example, time may include the user's login and logout times, online duration, and so on; the IP address includes the login place, and login times at different places can be analyzed in combination with the time; scope includes the most-accessed service, the least-accessed service with a non-zero access count, unauthorized access attempts, and so on; action is richer still, covering input instructions and mouse instructions, commonly used instructions, whether high-risk instructions are present, and so on. In addition, combined analysis can be performed across different elements.
Other parts of this embodiment are the same as embodiment 1, and thus are not described again.
Example 3:
in this embodiment, each user is monitored for abnormalities, wherein individual behaviors are analyzed in a time series and a location domain from multiple dimensions, not only a single user but also a group user behavior is analyzed, a group baseline and an individual baseline are established based on data of behavior characterization and correlation analysis, and the individual and group behaviors are compared by means of an average value, a variance, a similarity and the like, so as to identify abnormal behaviors.
The rest of this embodiment is the same as embodiment 1 or 2, and therefore, the description thereof is omitted.
Example 4:
this embodiment is further optimized on the basis of any one of embodiments 1 to 3. As shown in fig. 2, for a first-time user the system has some initial criteria, such as login anomalies and user access or operations exceeding the user's permissions, all of which the system treats as indicating a low safety factor. For long-term users, the system generally detects anomalies by combining the log information with the six-tuple model, using algorithms including SVM, isolation forest and clustering. Anomaly analysis is performed along the user's own timeline, that is, on whether the current operation resembles the user's previous operations; at the same time, users are compared with one another horizontally, which requires behavior analysis of group users.
Other parts of this embodiment are the same as any of embodiments 1 to 3, and thus are not described again.
Example 5:
In this embodiment, as shown in fig. 2, the user data for time and IP address is usually pre-analyzed with an SVM algorithm, generally the One-Class SVM algorithm. Let X be an n-dimensional array of user features, that is, the set of user features (x1, …, xn) of normal behavior, such that the distance from each xi to the center o is smaller than r; the minimum sphere satisfying this condition is obtained from the given actual user data.
This is the following optimization problem:
$$\min_{r,\,o,\,\xi}\; V(r) + C\sum_{i=1}^{n}\xi_i$$
$$\|x_i - o\| \le r + \xi_i$$
where V(r) represents the volume of a sphere with radius r, ξi is a slack variable introduced by the optimization problem, and C is a penalty parameter.
Given C, the radius r and the center o can be obtained from the optimization problem and the normal user data x; a new user point can then be judged to be an outlier or not by determining whether it lies inside this sphere.
The SVM algorithm is mainly used for simple anomaly judgments, such as whether the login duration is usual and whether the IP address belongs to a common location. It is mainly trained in advance to obtain a scale (the sphere), which is then used to monitor for abnormal points.
The algorithm is generally better suited to monitoring than to scoring outliers, but it is convenient in that it quickly gives good anomaly detection from the data currently available. It is generally used for user detection when the data volume is small.
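The final membership test, whether a new point lies inside the sphere with center o and radius r, can be sketched as below. This Python sketch does not solve the optimization problem: as a stated simplification it takes the centroid of the normal data as the center and a quantile of the distances as the radius. The function names, the quantile of 0.95 and the sample data are assumptions.

```python
import math


def fit_sphere(normal_points, quantile=0.95):
    """Center o = mean of the normal data; radius r = a quantile of the distances.

    This is a heuristic stand-in, not a solution of the optimization problem.
    """
    dims = len(normal_points[0])
    center = tuple(sum(p[d] for p in normal_points) / len(normal_points)
                   for d in range(dims))
    distances = sorted(math.dist(p, center) for p in normal_points)
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return center, distances[index]


def is_outlier(point, center, radius):
    """A new point is flagged when it falls outside the sphere."""
    return math.dist(point, center) > radius


# Made-up features: (login hour, online hours) for normal sessions.
normal = [(9.0, 8.0), (9.5, 8.5), (8.5, 8.0), (9.0, 9.0), (10.0, 7.5), (9.0, 8.5)]
o, r = fit_sphere(normal)

night_login = (3.0, 1.0)           # 03:00 login, one hour online
flagged = is_outlier(night_login, o, r)
```

Because the test only answers inside or outside, it matches the remark above that this approach monitors for abnormal points rather than grading them.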
This embodiment gives an example of an abnormality determination method based on abnormalities in a user's time and address. Preset values exist for a user's login and logout times and IP address, namely the working hours and the company IP addresses. A user who does not comply with these rules receives an anomaly score. Such a score does not directly cause the system to raise an alarm about the user, but it increases the monitoring of that user. SVM algorithms are typically used here. For example, the login time, online duration, login IP and the like can form a user feature X; applying the One-Class SVM algorithm to the data of many users finds the abnormal data, and roughly speaking the records that do not fall within working hours are the ones expected to be marked. For a single long-term user, abnormal login times and login IPs can be detected with an isolated forest algorithm, which yields a corresponding anomaly score. The anomaly detection for time and IP address is performed against the individual's historical usage records and against the usage records of all members of the department, respectively. A single user's daily login time is easily compared with the working hours of a working day, but some maintenance personnel may have extra working hours, so this case cannot be handled by transverse comparison among users alone. If the number of maintenance personnel is large, users can first be grouped by a clustering algorithm and then analyzed by the isolated forest algorithm.
If several users log in from the same IP, or the same user logs in from several geographically distant IPs within a short time, the anomaly score for this condition directly triggers an alarm to notify the operation and maintenance auditors. The anomaly scores above decay over time, as mentioned in the safety scoring standard; however, the superposition of several anomalies may still push the score past the threshold, in which case the operation and maintenance auditors must be notified.
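The interplay of decay and superposition can be illustrated with a short Python sketch. The text does not specify the decay law, so an exponential decay with a half-life is assumed here; the half-life, the threshold and the scores are made-up values.

```python
import math


def decayed_total(events, now_day, half_life_days=7.0):
    """Sum of anomaly scores, each halved every `half_life_days` (assumed decay law)."""
    total = 0.0
    for day, score in events:
        age = now_day - day
        total += score * math.pow(0.5, age / half_life_days)
    return total


ALERT_THRESHOLD = 50.0   # assumed value

# (day on which the anomaly was recorded, anomaly score); made-up numbers
events = [(0, 40.0), (7, 40.0)]

total_on_day_7 = decayed_total(events, now_day=7)     # 40 * 0.5 + 40 = 60
total_on_day_21 = decayed_total(events, now_day=21)   # 40 * 0.125 + 40 * 0.25 = 15
notify_auditor = total_on_day_7 > ALERT_THRESHOLD
```

Neither anomaly alone reaches the threshold, but the second one arrives before the first has decayed far enough, so the combined score triggers a notification; two weeks later the total has faded well below the threshold.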
The other parts of this embodiment are the same as those of embodiment 4, and thus are not described again.
Example 6:
In this embodiment, as shown in fig. 2, an isolated forest algorithm is also used for the more complex cases of time and IP address. This algorithm is mainly used to perform anomaly detection on existing data, that is, to search for outliers that are easy to isolate. It can quickly find whether outliers exist in the user data, and is especially suited to transverse comparison between users. For a data set formed by the feature values of different users, a binary tree is built by repeatedly selecting random boundaries, for example a time boundary such as whether the login time is before 12 o'clock, and cutting in this way continues until a node holds only one record or the height of the tree reaches the set limit. With this algorithm the height in the tree at which the user data X is located can be determined approximately, and the degree of abnormality can be judged from that height. Using this algorithm, the user's time and IP related data can be analyzed more easily, and the outliers in it found and scored. As with the One-Class SVM algorithm described in embodiment 5, the present invention uses the isolated forest algorithm to examine the user time and IP address separately, or combines them into a multivariate vector, for example a triple of login time, login duration and login IP, and then detects outliers in the multivariate vector with the isolated forest.
The detailed procedure using the isolated forest algorithm is as follows:
(1) Randomly select Ψ sample points from the user's feature data as a sample subset and place them in the root node of the tree.
(2) Randomly choose a dimension, such as login time, and randomly generate a cutting point p within the current node data (the cutting point lies between the earliest time and the latest time in the current node data).
(3) The cutting point defines a boundary that divides the current node into two parts according to login time: data with a login time earlier than p is placed in the left child node of the current node, and data with a login time greater than or equal to p is placed in the right child node.
(4) Repeat steps (2) and (3) recursively in the child nodes, constructing new child nodes until a child node holds only one data point (and cannot be cut further) or the child node reaches the defined height.
(5) Repeat steps (1) to (4) until t isolated trees (iTrees) have been generated.
(6) Each data point X is passed through every isolated tree to obtain its average height h(X) in the forest, from which the anomaly score is calculated by the following formula:
$$s(X, \Psi) = 2^{-\frac{E(h(X))}{c(\Psi)}}$$
where
$$c(\Psi) = 2H(\Psi - 1) - \frac{2(\Psi - 1)}{\Psi}$$
H(i) is the harmonic number, which can be estimated by ln(i) + 0.5772156649 (Euler's constant). X is the specific coordinate of the data point, h(X) is the height of X in each tree, E(h(X)) is the average (expectation) of h(X) over the trees, and c(Ψ) is the average path length for a given number of samples Ψ, used to normalize the path length h(X) of sample X. A smaller average height h(X) indicates more anomalous data, which under this formula corresponds to a score closer to 1.
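Steps (1) to (6) can be sketched compactly in Python using only the standard library. This is an illustrative sketch, not the patented implementation: the score is taken to be the usual isolation forest form 2^(-E(h(X))/c(Ψ)), the special cases c(1) = 0 and c(2) = 1 follow common practice rather than the text, and the data, the seed, Ψ = 16 and t = 100 are made-up values.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length c(n) used to normalize tree heights."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # exact value; the ln-based estimate is poor for tiny n
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Steps (2)-(4): cut on a random dimension at a random point, recursively."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen dimension cannot be cut
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree([p for p in points if p[dim] < cut], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= cut], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Height of x in one tree; unresolved leaves add the c(size) correction."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["cut"] else node["right"]
    return path_length(x, child, depth + 1)


def build_forest(data, psi, t, rng):
    """Steps (1) and (5): t trees, each grown on psi randomly sampled points."""
    max_depth = math.ceil(math.log2(psi))
    return [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(t)]


def anomaly_score(x, forest, psi):
    """Step (6): s = 2 ** (-E(h(x)) / c(psi))."""
    mean_height = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_height / c(psi))


# Made-up features: (login hour, online hours); the last record is a 03:00 login.
normal = [(9.0 + 0.3 * i, 7.0 + 0.5 * (i % 5)) for i in range(30)]
odd = (3.0, 0.5)
data = normal + [odd]

rng = random.Random(0)
forest = build_forest(data, psi=16, t=100, rng=rng)
odd_score = anomaly_score(odd, forest, psi=16)
usual_score = anomaly_score(normal[17], forest, psi=16)
```

The 03:00 record is cut off after very few splits, so its average height is small and its score is higher than that of a record in the middle of the working day.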
The other parts of this embodiment are the same as those of embodiment 4, and thus are not described again.
Example 7:
In this embodiment, as shown in fig. 2, more complicated situations, such as a user's instruction operations, service access, or a data set combining several user features, are handled with a clustering algorithm such as K-means. Different grouping numbers K are tried and the user data X is clustered for each (for example, computing the cluster centers for every K from 1 to 20), and an Elbow curve is then drawn, that is, the graph of the distance as a function of the grouping number K. The position on the Elbow curve where an obvious break occurs, namely where the absolute value of the rate of change of the slope is largest, is the optimal grouping number. This yields K0 user groups representing the various categories. These groups are then analyzed separately, and abnormal users can be detected in several ways: the clustering algorithm can be applied again to look for outliers, or another specific method can be chosen. When different user features are analyzed transversely and the data is too cluttered, the users can first be grouped with the K-means algorithm.
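Reading the optimal grouping number off the Elbow curve can be sketched as follows. The Python snippet assumes the clustering has already been run for each K and that the total within-cluster distance for each K is available; the function name `elbow_k` and the curve values are made up for illustration.

```python
def elbow_k(distances):
    """Pick K where the slope of the Elbow curve changes most.

    distances[0] belongs to K = 1, distances[1] to K = 2, and so on.
    """
    best_k, best_change = None, -1.0
    for idx in range(1, len(distances) - 1):
        slope_before = distances[idx] - distances[idx - 1]
        slope_after = distances[idx + 1] - distances[idx]
        change = abs(slope_after - slope_before)
        if change > best_change:
            best_k, best_change = idx + 1, change
    return best_k


# Made-up total within-cluster distances for K = 1..6.
curve = [1000.0, 700.0, 400.0, 380.0, 365.0, 355.0]
k0 = elbow_k(curve)   # the curve flattens sharply after K = 3
```

The change of slope is a discrete second difference, so the first and last K on the curve cannot be selected by this rule.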
When a dedicated clustering algorithm is used for the further analysis, the DBSCAN clustering algorithm, a density-based spatial clustering algorithm, can be adopted. Intuitively, the system randomly selects one sample point and draws a circle around it, with the radius of the circle and the minimum number of sample points the circle must contain both specified in advance. If there are enough sample points within the radius, the center of the circle is moved to one of the sample points inside it, and the circle continues on to other nearby sample points, recruiting new points much as a multi-level marketing scheme recruits downline members. The process stops when the rolling circle encloses fewer sample points than the pre-specified value. The starting points are called core points (for example A), the points at which the circle stops are called boundary points (for example B and C), and the points the circle never reaches are called outliers. In this way irregularly shaped clusters can be formed from the user's various data, which suits the general case better. This method can be used to compute outlier anomalies.
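The rolling-circle procedure can be written down as a short pure-Python DBSCAN. This is a sketch for illustration: `eps` is the circle radius, `min_pts` is the minimum number of points a circle must contain (the point itself included), a label of -1 marks an outlier, and the sample points are made up.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or -1 for an outlier."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a boundary point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # boundary point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:   # j is itself a core point: keep rolling
                queue.extend(nearby)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The two dense squares become clusters 0 and 1, and the isolated point at (5, 30) is never reached by any circle, so it is labeled -1.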
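The scoring criterion is as follows:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
where a(i) is the average distance from sample point i (a vector) to all other sample points j (vectors) within its own cluster:
$$a(i) = \frac{1}{|C_i| - 1}\sum_{j \in C_i,\, j \ne i} d(i, j)$$
The distance between sample point i and another cluster is defined as the average distance from i to every point of that cluster, and the minimum of these values over the other clusters is b(i):
$$b(i) = \min_{C_k \ne C_i} \frac{1}{|C_k|}\sum_{j \in C_k} d(i, j)$$
Here C_i denotes the cluster containing i, C_k any other cluster, and d(i, j) the distance between sample points i and j.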
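The two quantities described above, the average distance to the other points of the same cluster and the smallest average distance to any other cluster, are the ingredients of the silhouette coefficient. The Python sketch below assumes that the scoring criterion is this coefficient, (b - a) / max(a, b); the function name and the sample data are made up, and at least two clusters are assumed to exist.

```python
import math


def silhouette(i, points, labels):
    """Score of sample i: (b - a) / max(a, b), assuming at least two clusters."""
    own = [j for j in range(len(points)) if labels[j] == labels[i] and j != i]
    if not own:
        return 0.0  # convention for a cluster with a single member
    # a: average distance to the other members of i's own cluster
    a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
    # b: smallest average distance from i to the members of any other cluster
    b = min(
        sum(math.dist(points[i], points[j]) for j in members) / len(members)
        for members in (
            [j for j in range(len(points)) if labels[j] == other]
            for other in set(labels) if other != labels[i]
        )
    )
    largest = max(a, b)
    return (b - a) / largest if largest > 0 else 0.0


# Made-up one-dimensional data; the last point sits between the two clusters.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (5.0,)]
labels = [0, 0, 1, 1, 0]

well_placed = silhouette(0, points, labels)   # a = 3, b = 10.5
stray = silhouette(4, points, labels)         # a = 4.5, b = 5.5
```

A point that fits its cluster well scores close to 1, while a point lying between clusters scores close to 0, which is one way such a score could flag candidates for closer inspection.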
As mentioned above, this kind of clustering algorithm is mainly used to analyze the user's instruction operations. In general, a user's instruction and mouse operations are complex, have many dimensions, and produce a large volume of data; the users are first roughly classified by K-means, and the complex cases are then analyzed with the more detailed DBSCAN clustering algorithm. The data can be processed in several ways. For the analysis of instruction operations, the word2vec algorithm from semantic analysis can be used to map each instruction to a multidimensional vector, to which a judgment of the instruction's degree of risk can be added; alternatively, the richness of use of the different instructions can be analyzed directly. In this case DBSCAN reflects the degree of abnormality better through the density of the instructions. Furthermore, instruction operations can be analyzed in combination with other data, such as the time and IP features described earlier or the features of the services a user accesses: the various user features are combined into a multidimensional vector, which is simplified and reduced in dimension with mathematical tools, after which outliers are found by clustering analysis. The resulting data set can be analyzed along several dimensions, namely a single user over time, users within a group, between groups, or all of these combined. For still more complex cases, the use of neural networks can be explored to carry out further semantic analysis of instruction operations and to apply machine learning to it.
For combinations of different tuples in the complex six-tuple model, during the primary and secondary analysis with the K-means and DBSCAN clustering algorithms, the word2vec algorithm is introduced and combined with a neural network algorithm to perform combined analysis of users across different tuples.
This embodiment gives an example of an anomaly detection scoring method for the instruction operation data in the user action information and the access service data in the user scope information. Abnormal situations involving service access and instruction operations are often not clear-cut, so scoring them requires a combination of algorithms. For example, a user who normally uses database A suddenly accesses database B; this is most likely due to work needs, so the situation has to be analyzed together with the operations performed. Each accessed application has a set of common instructions and a set of dangerous instructions, which can be obtained from the daily logs. An abnormal operation performed by an operator on a service that the operator rarely accesses greatly increases the corresponding anomaly score. In general, anomaly detection for service access and instruction operations is based on detection of the instruction operations, with service access used as an indicator that increases the weight of an anomaly. When a user inputs instructions, there is usually a library of instruction sets that forms a corresponding high-dimensional vector space; the instructions are mapped to high-dimensional vectors, to which a parameter for the threat level of the instruction can be added. When the instructions are analyzed algorithmically, the word vectors are used directly. The anomaly scores of the outliers among them can be obtained with a clustering algorithm. The anomaly scores formed in this way also decay over time.
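Mapping a session's instructions into a vector with an added threat parameter can be illustrated as follows. Instead of word2vec, this Python sketch swaps in the simpler alternative the description also mentions, counting how often each instruction is used; the instruction library `VOCABULARY`, the weights in `THREAT` and the sample sessions are hypothetical.

```python
from collections import Counter

# Hypothetical instruction library and threat weights; a real deployment
# would derive both from the daily logs of each application.
VOCABULARY = ["ls", "cd", "cat", "select", "drop", "rm"]
THREAT = {"drop": 1.0, "rm": 0.8}


def command_vector(commands):
    """Usage count per known instruction, followed by a summed threat parameter."""
    counts = Counter(commands)
    vector = [float(counts[name]) for name in VOCABULARY]
    vector.append(sum(THREAT.get(name, 0.0) * n for name, n in counts.items()))
    return vector


usual_session = command_vector(["ls", "cd", "ls", "cat"])
risky_session = command_vector(["ls", "drop", "rm", "rm"])
```

Vectors of this kind, one per session, are the sort of input a clustering algorithm could then group, with sessions dominated by dangerous instructions ending up far from the usual ones.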
For anomaly detection based on instruction operations, users may first be grouped by K-means. For users of the same type, all input instructions can be analyzed and counted by number of uses, time, location, degree of danger and word vector across different days. A specific analysis is then carried out with the DBSCAN clustering algorithm. Analyzing the user features comprehensively yields a more complete user portrait, in which abnormal conditions can then be found.
The other parts of this embodiment are the same as those of embodiment 4, and thus are not described again.
Example 8:
This embodiment is further optimized on the basis of any one of embodiments 1 to 7. In this embodiment, the security scoring module mainly obtains a score from the analysis of the user's degree of abnormality under the six-tuple model, for example whether the time falls within the usual working hours, whether the IP address is at a usual location, whether several users log in from the same IP address or the same user logs in from several IP addresses, whether the service accessed is a commonly used one, whether an input command is high-risk, and the like. The specific evaluation standard for the safety score is set by the user, and the system can offer several sets of standards of differing strictness for the user to choose from.
The security scoring module mainly computes on the anomaly scores obtained by running anomaly detection on the user with the machine learning algorithms and the zero trust model. The system sets a security threshold; if the security score falls below the threshold, the system notifies an administrator. In addition, the security score of one day is mapped to the security score of the next day by a security accumulation function.
Other parts of this embodiment are the same as any of embodiments 1 to 7, and thus are not described again.
Example 9:
In this embodiment, as shown in fig. 3, the specific form of the safety mapping function is adjustable, provided the following conditions are approximately satisfied:
A full score F, a good score G and a passing score H are set for the safety score. A user scoring above the good score G is mapped quickly to the full score F the next day; a user scoring below the good score G is mapped slowly toward G, with all such users by default reaching the vicinity of G within one day; and a user scoring below the passing score H is mapped by default, provided the problem has been resolved, to the vicinity of (G + H)/2. The mapping process can be adjusted on demand over a certain period: a user below G and above H may need several days to cross the good score line, depending on how many days the security settings require an earlier error to keep affecting the system's alertness.
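One possible shape for such a mapping function is sketched below in Python. The text leaves the form adjustable, so this sketch assumes fixed daily steps: a large step toward F above the good line, a small step toward G between H and G, and a jump to (G + H)/2 from below H once the problem is resolved. The values of F, G, H and both step sizes are assumptions.

```python
def next_day_score(score, F=100.0, G=80.0, H=60.0,
                   fast_step=20.0, slow_step=5.0, resolved=True):
    """Map today's safety score to tomorrow's (step sizes are assumptions)."""
    if score >= G:
        return min(F, score + fast_step)      # quick recovery toward the full score
    if score >= H:
        return min(G, score + slow_step)      # slow recovery toward the good score
    # Below the passing score: jump to (G + H) / 2 once the issue is resolved.
    return (G + H) / 2.0 if resolved else score


def days_to_good(score, G=80.0, max_days=365):
    """Number of daily mappings until the score reaches the good line G."""
    days = 0
    while score < G and days < max_days:
        score = next_day_score(score, G=G)
        days += 1
    return days


above_good = next_day_score(90.0)      # capped at the full score F
between = next_day_score(70.0)         # one slow step toward G
below_pass = next_day_score(40.0)      # mapped to (G + H) / 2
recovery_days = days_to_good(65.0)     # 65 -> 70 -> 75 -> 80
```

Shrinking `slow_step` lengthens the number of days an earlier error keeps a user below the good line, which is the adjustment the paragraph above describes.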
The rest of this embodiment is the same as embodiment 8, and therefore, the description thereof is omitted.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (9)

1. An early warning method for a user portrait based on a fortress machine, characterized by comprising the following steps:
S1, acquiring a session log of the bastion machine, obtaining user information from the session log of the bastion machine, and classifying the user information into a six-tuple model;
S2, establishing a zero trust architecture model and a threat library, classifying the information of each user in the six-tuple model according to a machine learning algorithm and the zero trust architecture model, and then performing anomaly detection to identify abnormal users; the users comprise single users and group users;
S3, when an abnormal user is determined to have appeared, the bastion machine issues an early warning for the abnormal user and performs secondary verification on the abnormal user; once the secondary verification confirms the result, the dangerous operations of the abnormal user are stored in the threat library as a basis for identifying abnormal users in the future; and a security scoring module maps the security score of the abnormal user.
2. The early warning method for a user portrait based on a fortress machine according to claim 1, characterized in that step S1 comprises:
acquiring the session log of the bastion machine through a general interface, obtaining user information from the session log of the bastion machine, and classifying the user information into the six-tuple model;
the user information in the six-tuple model is classified into time information, place information, person/ID information, scope information, action information and result information;
and performing combined analysis of users across different tuples in the six-tuple model.
3. The early warning method for a user portrait based on a fortress machine according to claim 1, characterized in that the method of analyzing single users and group users in step S2 comprises:
analyzing the behaviors of single users and group users from multiple dimensions along the time series and the place domain, and establishing a group baseline and an individual baseline from the data obtained by characterization and correlation analysis of those behaviors;
and comparing individual behaviors with group behaviors according to the average value, the variance and the similarity, and identifying abnormal behaviors.
4. The early warning method for a user portrait based on a fortress machine according to claim 1, characterized in that step S2 further comprises:
performing data classification processing according to a machine learning algorithm;
for a first-time user, setting an initial judgment standard in the bastion machine, the initial judgment standard comprising login abnormalities and cases in which user access and operations exceed the user's rights, classifying the user's information according to the six-tuple model, judging the first-time user's information against the initial judgment standard, and treating a user who does not meet the judgment standard as a user with a low safety factor;
and for a user who has used the bastion machine for at least one month, performing anomaly detection on the user according to the session log information of the bastion machine, the six-tuple model and a machine learning algorithm, with the user's various information features in the six-tuple model denoted X, wherein the machine learning algorithm comprises an SVM algorithm, an isolated forest algorithm and a clustering algorithm.
5. The early warning method for a user portrait based on a fortress machine according to claim 4, characterized in that the method using the SVM algorithm comprises the following steps:
for the time information and place information of a user, pre-analyzing the time information and place information with a One-Class SVM algorithm, letting X be an n-dimensional array of user features, namely the user features (x1, …, xn) of normal behavior, wherein the distance from each xi to a center o is smaller than r, and obtaining the minimum sphere satisfying this condition from the given time information and place information of the user;
judging whether a user is abnormal according to
$$\min_{r,\,o,\,\xi}\; V(r) + C\sum_{i=1}^{n}\xi_i$$
and ||xi - o|| ≤ r + ξi, wherein V represents the volume of a sphere with radius r, ξ is a slack variable introduced by the optimization problem, and C is a penalty parameter; r and the center o are obtained from the given C and the user features x1, …, xn of normal behavior, a point is judged to be abnormal or not by determining whether it lies inside this sphere, and if it is an abnormal point, the time information and place information of the given user are judged to be abnormal.
6. The early warning method for a user portrait based on a fortress machine, characterized in that the method using the isolated forest algorithm comprises the following steps:
step a, analyzing the combination of the user's time information and place information with the isolated forest algorithm;
step b, randomly selecting Ψ sample points from the user's feature data as a sample subset and placing them in the root node of a tree;
step c, randomly choosing a dimension and, for example when the chosen dimension is the user login time, randomly generating a cutting time point p within the current node data, the cutting point lying between the earliest time and the latest time in the current node data;
step d, generating a boundary from the cutting point and dividing the current node into two parts according to login time, placing data with a login time earlier than p in the left child node of the current node and data with a login time greater than or equal to p in the right child node of the current node;
step e, recursively repeating step c and step d in the child nodes, constructing new child nodes until a child node holds only one piece of data and cannot be cut further, or the child node reaches the defined height;
step f, repeating steps b to e until t isolated trees (iTrees) are generated;
step g, passing each data point X through every isolated tree to obtain the average height h(X) of the data point in the forest, and calculating the anomaly score by the following formulas:
$$s(X, \Psi) = 2^{-\frac{E(h(X))}{c(\Psi)}}$$
$$c(\Psi) = 2H(\Psi - 1) - \frac{2(\Psi - 1)}{\Psi}$$
where H(i) is the harmonic number, which can be estimated by ln(i) plus Euler's constant, X is the specific coordinate of the data point, c is the average path length for a given number of samples, and E(h(X)) is the average of h(X);
step h, judging whether the user is abnormal according to the obtained anomaly score.
7. The early warning method for a user portrait based on a fortress machine, characterized in that the method using the clustering algorithm comprises the following steps:
performing a primary analysis of the instruction operation data in the user action information and the access service data in the user scope information with the K-means algorithm among the clustering algorithms;
then performing a secondary analysis by computing outliers over the various user information in the six-tuple model with the DBSCAN clustering algorithm among the clustering algorithms;
in the course of using the K-means algorithm and the DBSCAN clustering algorithm, introducing the word2vec algorithm to map the semantics of user instruction operations into a multidimensional vector space for simplified analysis;
and, for combinations of different tuples in the six-tuple model, during the primary and secondary analysis with the K-means algorithm and the DBSCAN clustering algorithm, combining the word2vec algorithm with a neural network algorithm to perform combined analysis of users across different tuples.
8. The early warning method for a user portrait based on a fortress machine according to claim 1, characterized in that the security scoring module in step S3 comprises:
establishing a security scoring module in the bastion machine according to the zero trust architecture model and the machine learning algorithm;
the security scoring module scores the user's security every day to obtain the user's dynamic behavior information, which comprises all behavior information of the user up to the present;
the user's security is scored every day; a security score below the security score threshold triggers an alarm of the bastion machine, and the security score is finalized after manual confirmation;
and the user's daily security score is carried over to the next day and mapped according to a security mapping function, a user below the security score threshold being determined to present a potential safety hazard.
9. The early warning method for a user portrait based on a fortress machine, characterized in that the method of mapping the security score according to the security mapping function comprises the following steps:
setting a full score F, a good score G and a passing score H for the safety score, wherein a user scoring above the good score G is mapped quickly to the full score F the next day, a user scoring below the good score G is mapped slowly toward G, with all such users by default reaching the vicinity of G within one day, and a user scoring below the passing score H is mapped by default, provided the problem has been resolved, to the vicinity of (G + H)/2;
the mapping process can be adjusted on demand over a certain period, that is, a user below G and above H may need several days to cross the good score line, depending on how many days the security settings require an earlier error to keep affecting the system's alertness.
CN202111668530.XA 2021-12-31 2021-12-31 Early warning method for user portrait based on fortress machine Pending CN114398966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111668530.XA CN114398966A (en) 2021-12-31 2021-12-31 Early warning method for user portrait based on fortress machine

Publications (1)

Publication Number Publication Date
CN114398966A true CN114398966A (en) 2022-04-26

Family

ID=81229195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111668530.XA Pending CN114398966A (en) 2021-12-31 2021-12-31 Early warning method for user portrait based on fortress machine

Country Status (1)

Country Link
CN (1) CN114398966A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116011894A (en) * 2023-03-28 2023-04-25 河北长发铝业股份有限公司 Aluminum alloy rod production data management system
CN116996330A (en) * 2023-09-27 2023-11-03 深圳市互盟科技股份有限公司 Data center access control management system based on network security
CN116996330B (en) * 2023-09-27 2023-12-01 深圳市互盟科技股份有限公司 Data center access control management system based on network security
CN117521042A (en) * 2024-01-05 2024-02-06 创旗技术有限公司 High-risk authorized user identification method based on ensemble learning
CN117521042B (en) * 2024-01-05 2024-05-14 创旗技术有限公司 High-risk authorized user identification method based on ensemble learning
CN117591542A (en) * 2024-01-18 2024-02-23 准检河北检测技术服务有限公司 Intelligent detection method for database software data security
CN117591542B (en) * 2024-01-18 2024-03-22 准检河北检测技术服务有限公司 Intelligent detection method for database software data security

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination