CN111865941B - Abnormal behavior identification method and device - Google Patents

Publication number
CN111865941B
Authority
CN
China
Prior art keywords
behavior
current user
current
score
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010630842.0A
Other languages
Chinese (zh)
Other versions
CN111865941A (en
Inventor
陈少涵
连鹏程
柳赛普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Skyguard Network Security Technology Co ltd
Chengdu Sky Guard Network Security Technology Co ltd
Original Assignee
Chengdu Sky Guard Network Security Technology Co ltd
Beijing Skyguard Network Security Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sky Guard Network Security Technology Co ltd, Beijing Skyguard Network Security Technology Co ltd
Priority to CN202010630842.0A
Publication of CN111865941A
Application granted
Publication of CN111865941B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Abstract

The invention discloses an abnormal behavior identification method and device, and relates to the technical field of computers. One embodiment of the method comprises, for an identification dimension: determining a behavior deviation vector of a current user according to data of a plurality of behavior characteristics of the current user in a current time period and a historical time period; determining a behavior deviation direction of the current user according to the behavior deviation vector of the current user; determining a first score of the current user in the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users, where the first score represents the degree to which the behavior deviation direction of the current user deviates from the behavior deviation directions of the other users; and determining whether the current behavior of the current user is abnormal according to the first score of the current user in the identification dimension. This embodiment can improve identification accuracy and reduce the false alarm rate.

Description

Abnormal behavior identification method and device
Technical Field
The invention relates to the technical field of network security, in particular to an abnormal behavior identification method and device.
Background
Internal threats are malicious threats to an enterprise's core data assets posed by internal employees of the enterprise or by external personnel impersonating internal employees. An internal threat may leak the enterprise's core data and damage the enterprise's operational security. Since an internal threat manifests itself in host behavior, user behavior and the like, identifying abnormal behavior makes it possible to determine whether an internal threat exists.
In the prior art, abnormal behaviors are generally identified based on rule matching, that is, collected current behavior data is matched with a preset rule, and if the matching is successful, the behavior is determined to be abnormal.
However, the prior art only performs recognition according to the current behavior data, resulting in low recognition accuracy.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for identifying an abnormal behavior, which can improve identification accuracy.
In a first aspect, an embodiment of the present invention provides an abnormal behavior identification method, including:
for an identification dimension: determining a behavior deviation vector of a current user according to data of a plurality of behavior characteristics of the current user in a current time period and a historical time period;
determining the behavior deviation direction of the current user according to the behavior deviation vector of the current user;
determining a first score of the current user under the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users; the first score is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users;
and determining whether the current behavior of the current user is abnormal or not according to the first score of the current user under the identification dimension.
Optionally,
the determining a behavior deviation vector of the current user according to data of a plurality of behavior characteristics of the current user in the current time period and the historical time period comprises the following steps:
for each behavioral characteristic of the current user: calculating the behavior deviation of the current user corresponding to the behavior characteristics according to the data of the behavior characteristics in the current time period and the historical time period; the behavior deviation of the current user corresponding to each behavior feature is an element of the behavior deviation vector.
Optionally,
the number of the historical time periods is greater than 1;
the calculating the behavior deviation of the current user corresponding to the behavior characteristics according to the data of the behavior characteristics in the current time period and the historical time period comprises the following steps:
normalizing the data of the behavior characteristics in each historical time period;
normalizing the data of the behavior characteristics in the current time period;
and calculating the behavior deviation of the current user corresponding to the behavior characteristics according to the normalized data of the behavior characteristics in each historical time period and current time period.
Optionally,
the method further comprises:
determining a second score of the current user under the identification dimension according to the behavior deviation vector of the current user; wherein the second score is used for characterizing the deviation degree of the current behavior of the current user relative to the historical behavior of the current user;
determining whether the current behavior of the current user is abnormal according to the first score of the current user in the identification dimension includes:
and determining whether the current behavior of the current user is abnormal or not according to the first score and the second score of the current user in the identification dimension.
Optionally,
the method further comprises:
determining a third score of the current user under the identification dimension according to data of the other users and a plurality of behavior characteristics of the current user in the current time period; wherein the third score is used to characterize a degree of deviation of the current behavior of the current user relative to the current behaviors of the number of other users;
determining whether the current behavior of the current user is abnormal according to the first score and the second score of the current user in the identification dimension, wherein the determining comprises:
and determining whether the current behavior of the current user is abnormal or not according to the first score, the second score and the third score of the current user in the identification dimension.
Optionally,
determining a third score of the current user in the identification dimension according to the data of the other users and the data of the plurality of behavior characteristics of the current user in the current time period, including:
for each behavioral characteristic in the identified dimension: calculating the statistical value of the data of the behavior characteristics of a plurality of other users in the current time period;
and determining a third score of the current user in the identification dimension according to the statistical value corresponding to each behavior feature and the data of a plurality of behavior features of the current user in the current time period.
Optionally,
determining a third score of the current user in the identification dimension according to the statistical value corresponding to each behavior feature and data of a plurality of behavior features of the current user in the current time period, including:
calculating the similarity between the current behavior of the current user and the current behaviors of the other users according to the statistical value corresponding to each behavior feature and the data of the behavior features of the current user in the current time period;
and determining a third score of the current user under the identification dimension according to the similarity between the current behavior of the current user and the current behaviors of the other users.
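The third-score steps above can be sketched as follows. This is only an illustration under stated assumptions: the text fixes neither the statistic nor the similarity measure, so the per-feature median, cosine similarity, and the mapping `score = 1 - similarity` are all choices made for this example, and the function name `third_score` is hypothetical.

```python
import math
from statistics import median
from typing import List


def third_score(current: List[float], others: List[List[float]]) -> float:
    """Score how far the current user's behavior is from the other users' behavior.

    Assumptions (not taken from the source): the statistic is the per-feature
    median, the similarity is cosine similarity, and score = 1 - similarity.
    """
    # Statistical value of each behavior feature over the other users.
    reference = [median(column) for column in zip(*others)]
    dot = sum(a * b for a, b in zip(current, reference))
    norm = math.sqrt(sum(a * a for a in current)) * math.sqrt(
        sum(b * b for b in reference)
    )
    similarity = dot / norm if norm else 0.0
    return 1.0 - similarity


# Rows are other users; columns are two behavior features in the current period.
others = [[100.0, 10.0], [120.0, 12.0], [110.0, 11.0]]
typical = third_score([110.0, 11.0], others)   # behaves like the others
unusual = third_score([10.0, 110.0], others)   # very different feature mix
```

With these inputs the typical user scores close to zero and the unusual user scores markedly higher.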
Optionally,
the method further comprises:
determining a third score of the current user in the identification dimension according to data of the other users and a plurality of behavior characteristics of the current user in the identification dimension in the current time period; wherein the third score is used to characterize a degree of deviation of the current behavior of the current user relative to the current behaviors of the number of other users;
determining whether the current behavior of the current user is abnormal according to the first score of the current user in the identification dimension includes:
and determining whether the current behavior of the current user is abnormal or not according to the first score and the third score of the current user in the identification dimension.
Optionally,
determining a first score of the current user in the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users, including:
determining the shard value of the current user in each feature dimension according to the behavior deviation direction of the current user; the shard value of the current user in a feature dimension is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users in that feature dimension;
determining the shard values of the other users in each feature dimension according to the behavior deviation directions of the other users;
determining a shard combination corresponding to the current user according to the shard values of the current user and the other users in each feature dimension;
and determining a first score of the current user in the identification dimension according to the number of users corresponding to the shard combination.
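A minimal sketch of this shard-based first score is given below. The text does not say how a shard value is derived or how the user count becomes a score, so the three-way bucketing by sign (`shard_value`) and the rule "fewer users in the same combination gives a higher score" are assumptions made purely for illustration; all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def shard_value(component: float, eps: float = 1e-9) -> int:
    """Bucket one component of a deviation direction (assumed rule: by sign)."""
    if component > eps:
        return 1
    if component < -eps:
        return -1
    return 0


def first_scores(directions: Dict[str, Tuple[float, ...]]) -> Dict[str, float]:
    """Score each user by how rare their shard combination is among all users."""
    combos = {
        user: tuple(shard_value(c) for c in direction)
        for user, direction in directions.items()
    }
    counts = Counter(combos.values())
    total = len(combos)
    # Assumed scoring rule: the fewer users share the combination, the higher the score.
    return {user: 1.0 - counts[combo] / total for user, combo in combos.items()}


# Deviation directions over two feature dimensions (hypothetical data).
directions = {
    "alice": (0.6, 0.8),
    "bob": (0.8, 0.6),
    "carol": (0.7, 0.7),
    "dave": (-0.6, 0.8),
}
scores = first_scores(directions)
```

Here three users fall into the same shard combination, so the one user whose first feature moves in the opposite direction receives the highest score.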
Optionally,
the number of the identification dimensions is greater than 1;
the determining whether the current behavior of the current user is abnormal according to the first score, the second score and the third score of the current user in the identification dimension includes:
for each of the identified dimensions: determining a risk score of the current user in the identification dimension according to the first score, the second score and the third score of the current user in the identification dimension;
and determining whether the current behavior of the current user is abnormal or not according to the risk score corresponding to each identification dimension.
Optionally,
determining whether the current behavior of the current user is abnormal according to the risk score corresponding to each identification dimension includes:
determining a dimension group to which each identification dimension belongs;
for each of the dimension groups: determining the risk score of the dimension group according to the risk score corresponding to each identification dimension in the dimension group;
and determining whether the current behavior of the current user is abnormal or not according to the risk score of each dimension group.
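The grouping steps above can be sketched as follows, under several assumptions that the text leaves open: the risk score of a dimension is taken as the plain sum of its three scores, the risk score of a group as the maximum over its dimensions, and the behavior is flagged when any group exceeds a threshold. The group names and the threshold value are hypothetical.

```python
from typing import Dict, List, Tuple


def dimension_risk(scores: Dict[str, Tuple[float, float, float]]) -> Dict[str, float]:
    """Risk per identification dimension (assumed: sum of first, second, third score)."""
    return {dim: sum(triple) for dim, triple in scores.items()}


def group_risk(risks: Dict[str, float], groups: Dict[str, List[str]]) -> Dict[str, float]:
    """Risk per dimension group (assumed: maximum risk among its dimensions)."""
    return {group: max(risks[d] for d in dims) for group, dims in groups.items()}


def is_abnormal(group_risks: Dict[str, float], threshold: float) -> bool:
    """Flag the behavior when any group's risk exceeds the threshold (assumed rule)."""
    return any(risk > threshold for risk in group_risks.values())


scores = {
    "TCP": (0.1, 0.2, 0.1),
    "UDP": (0.2, 0.1, 0.1),
    "HTTP": (0.9, 0.8, 0.7),
    "SMTP": (0.1, 0.1, 0.1),
}
groups = {"transport": ["TCP", "UDP"], "application": ["HTTP", "SMTP"]}

risks_by_group = group_risk(dimension_risk(scores), groups)
abnormal = is_abnormal(risks_by_group, threshold=1.5)
```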
Optionally,
the data of the plurality of behavior characteristics includes any one or more of: the amount of data sent out, the number of connection requests sent out, the number of connection targets, and the communication duration.
Optionally,
the identification dimension includes any one or more of: TCP (Transmission Control Protocol), UDP (User Datagram Protocol), ICMP (Internet Control Message Protocol), SSL (Secure Sockets Layer), SMTP (Simple Mail Transfer Protocol), and HTTP (Hypertext Transfer Protocol).
In a second aspect, an embodiment of the present invention provides an abnormal behavior identification apparatus, including:
a deviation determining module, configured to, for an identification dimension, determine a behavior deviation vector of a current user according to data of a plurality of behavior characteristics of the current user in a current time period and a historical time period;
a direction determining module, configured to determine a behavior deviation direction of the current user according to the behavior deviation vector of the current user;
a score determining module, configured to determine a first score of the current user in the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users; the first score is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users;
and an abnormality determining module, configured to determine whether the current behavior of the current user is abnormal according to the first score of the current user in the identification dimension.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method of any one of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: based on the first score, whether the current behavior is abnormal or not can be determined by the difference degree of the behavior change trends of different users; based on the second score, whether the current behavior is abnormal or not can be determined from the perspective of the change trend of the current user behavior; based on the third score, whether the current behavior is abnormal or not can be determined according to the degree of difference of the behaviors of the current user and other users in the current time period. Thus, in combination with the first score and the other scores, it is possible to identify whether the current behavior is at risk from different perspectives. Compared with the prior art, the method has higher identification accuracy and can reduce the false alarm rate.
Further effects of the above optional implementations will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a flow chart of a method for identifying abnormal behavior according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for identifying abnormal behavior according to another embodiment of the present invention;
FIG. 3 is a flow chart of a method for identifying abnormal behavior according to yet another embodiment of the present invention;
FIG. 4 is a schematic diagram of an abnormal behavior identification apparatus according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 6 is a schematic block diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Internal threats may come from several sources:
(1) Malicious internal users, who threaten the enterprise from the inside by abusing their own permissions;
(2) Unwitting users, who expose the enterprise to risk through unintentional mistakes;
(3) External intruders, who steal enterprise data by breaking into an authorized user's computer or obtaining an authorized user's key.
Since host behavior or user behavior may reflect whether there is an internal threat, whether an internal threat exists may be determined through abnormal behavior identification. The prior art generally identifies abnormal behaviors based on rule matching; because only the current behavior data is used as the basis for judgment, the identification perspective is narrow, and the identification accuracy and reliability are therefore low.
In addition, the prior art can only detect common network attacks, for example DDoS (Distributed Denial of Service) attacks, and has great difficulty identifying internal threats that exploit unknown vulnerabilities or standard protocols. Moreover, in a large-scale network system, behavior recognition based on host logs is costly.
In view of the foregoing, an embodiment of the present invention provides an abnormal behavior identification method, as shown in fig. 1, including:
Step 101: for an identification dimension, determining a behavior deviation vector of the current user according to the data of a plurality of behavior characteristics of the current user in the current time period and the historical time period.
There may be one or more identification dimensions. The identification dimensions include any one or more of TCP, UDP, ICMP, SSL, SMTP and HTTP. An identification dimension may also be the GET method of HTTP, the POST method of HTTP, and the like. For convenience of description, the embodiment of the present invention is illustrated using only one identification dimension.
There may be a plurality of historical time periods. For example, if the current time period is today, there are 2 historical time periods, and each time period is 1 day long, then the historical time periods are yesterday and the day before yesterday.
The behavior deviation vector is composed of behavior deviations, each of which may be an absolute behavior deviation or a relative behavior deviation; the details are described in the following embodiments.
Step 102: determining the behavior deviation direction of the current user according to the behavior deviation vector of the current user.
The direction indicated by the behavior deviation vector is a behavior deviation direction. The behavior deviation direction of the current user can reflect the deviation trend of the current behavior of the current user relative to the historical behavior of the current user.
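One way to represent the direction of a deviation vector is to scale it to unit length. The sketch below assumes this unit-vector representation, which the text does not prescribe; the function name `deviation_direction` and the handling of an all-zero vector are illustrative choices.

```python
import math
from typing import List


def deviation_direction(deviation: List[float]) -> List[float]:
    """Return the unit vector pointing the same way as the deviation vector.

    Assumption: a zero vector (no deviation at all) has no direction and is
    returned unchanged as all zeros.
    """
    norm = math.sqrt(sum(d * d for d in deviation))
    if norm == 0.0:
        return [0.0] * len(deviation)
    return [d / norm for d in deviation]


# Feature 1 rose by 3 units and feature 2 fell by 4 units relative to history.
direction = deviation_direction([3.0, -4.0])
```

Two users whose deviations differ in size but not in proportion, such as `[3.0, -4.0]` and `[6.0, -8.0]`, get the same direction, which is what allows trends to be compared across users.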
Step 103: determining a first score of the current user under the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users; the first score is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users.
Step 104: determining whether the current behavior of the current user is abnormal according to the first score of the current user in the identification dimension.
In the embodiment of the present invention, the first score may be compared with a preset risk threshold, and if the first score is greater than the risk threshold, it is determined that the current behavior of the current user is abnormal.
Compared with the prior art, the embodiment of the invention combines the data of the behavior characteristics in the current time period and the historical time period, which reduces the influence of incidental behaviors on the recognition result, reflects the condition of the current user more accurately, and yields a more accurate and reliable recognition result. In addition, the first score is calculated based on the behavior deviation direction and represents the degree to which the behavior deviation direction of the current user deviates from the behavior deviation directions of other users. The difference between the behavior change trend of the current user and that of other users is therefore taken into account, which makes the recognition result more accurate and reliable.
In one embodiment of the present invention, the method may further comprise: acquiring host behavior data; and determining user behavior data according to the host behavior data and the binding relationship between the host and the user. By the embodiment of the invention, the user behavior can be obtained from the host behavior, and the behavior is further identified on the user level. Considering that different users may use the same host, the abnormal behavior of one user does not represent the abnormal behavior of other users using the host, and therefore, in order to more accurately identify the abnormal behavior, the embodiment of the present invention uses the user behavior data instead of the host behavior data. Wherein the user behavior data comprises: data of several behavioral characteristics of the user.
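The conversion from host behavior data to user behavior data might look like the sketch below. The record layout, the idea of keying the binding by a (host, session) pair so that several users can share one host, and all names are assumptions made for this example; the text only states that a binding relationship between host and user is used.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (host, session id, behavior feature, value); all field names are hypothetical.
HostRecord = Tuple[str, str, str, float]


def to_user_behavior(
    records: List[HostRecord], binding: Dict[Tuple[str, str], str]
) -> Dict[str, Dict[str, float]]:
    """Aggregate host-level records into per-user behavior feature totals."""
    user_data: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for host, session, feature, value in records:
        user = binding.get((host, session))
        if user is None:
            continue  # no known binding for this host session; skip the record
        user_data[user][feature] += value
    return {user: dict(features) for user, features in user_data.items()}


records = [
    ("pc-1", "s1", "bytes_out", 100.0),
    ("pc-1", "s2", "bytes_out", 50.0),
    ("pc-2", "s1", "bytes_out", 30.0),
]
# The same host pc-1 is used by two different users in different sessions.
binding = {("pc-1", "s1"): "alice", ("pc-1", "s2"): "bob", ("pc-2", "s1"): "alice"}

user_behavior = to_user_behavior(records, binding)
```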
In one embodiment of the invention, the current behavior feature data comprises: a plurality of behavior feature data of a current time period;
historical behavioral characteristic data, including: a plurality of behavior characteristic data of each historical time period in a plurality of historical time periods;
determining a behavior deviation vector of the current user according to data of a plurality of behavior characteristics of the current user in the current time period and the historical time period, wherein the determining comprises the following steps:
for each behavior feature of the current user: calculating the behavior deviation of the current user corresponding to the behavior characteristics according to the data of the behavior characteristics in the current time period and the historical time period; the behavior deviation of the current user corresponding to each behavior feature is an element of the behavior deviation vector.
There may be one or more behavior characteristics, and a corresponding behavior deviation may be calculated for each of them. The data of the plurality of behavior characteristics includes any one or more of: the amount of data sent out, the number of connection requests sent out, the number of connection targets, and the communication duration.
In the embodiment of the present invention, the behavior deviation of the current user corresponding to the behavior feature may be calculated according to the statistical value (for example, a median, an average, or the like) of the data of the behavior feature of each historical time period and the data of the behavior feature of the current time period.
The behavior deviation may be an absolute behavior deviation: absolute behavior deviation = data of the behavior feature in the current time period - statistical value (e.g., a median or an average) of the data of the behavior feature in each historical time period.
The behavior deviation may also be a relative behavior deviation: relative behavior deviation = (data of the behavior feature in the current time period - statistical value of the data of the behavior feature in each historical time period) / statistical value of the data of the behavior feature in each historical time period.
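The two deviation definitions can be illustrated with a short sketch. It uses the median as the statistical value, which is one of the options the text names; the feature names, the zero-baseline guard, and the function name `deviation_vector` are assumptions added for the example.

```python
from statistics import median
from typing import Dict, List


def deviation_vector(
    current: Dict[str, float],
    history: Dict[str, List[float]],
    relative: bool = False,
) -> List[float]:
    """Build the behavior deviation vector, one element per behavior feature."""
    vector = []
    for feature in sorted(current):  # fixed feature order keeps vectors comparable
        baseline = median(history[feature])  # statistical value over historical periods
        diff = current[feature] - baseline   # absolute behavior deviation
        if relative:
            # Assumed guard: treat the relative deviation as 0 when the baseline is 0.
            diff = diff / baseline if baseline != 0 else 0.0
        vector.append(diff)
    return vector


current = {"bytes_out": 150.0, "connections": 8.0}
history = {"bytes_out": [100.0, 120.0, 80.0], "connections": [10.0, 10.0, 6.0]}

absolute = deviation_vector(current, history)
relative = deviation_vector(current, history, relative=True)
```

For this data the historical medians are 100 and 10, so the absolute deviations are 50 and -2 and the relative deviations are 0.5 and -0.2.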
In one embodiment of the invention, the number of historical time periods is greater than 1;
calculating the behavior deviation of the current user corresponding to the behavior characteristics according to the data of the behavior characteristics in the current time period and the historical time period, wherein the calculation comprises the following steps:
normalizing the data of the behavior characteristics in each historical time period;
normalizing the data of the behavior characteristics in the current time period;
and calculating the behavior deviation of the current user corresponding to the behavior characteristics according to the data of the normalized behavior characteristics in each historical time period and the current time period.
In a practical application scenario, the data of the user behavior characteristics may follow not a normal distribution but a skewed distribution, such as a power law distribution. That is, the data of the user behavior characteristics may contain abnormal data, and such data may distort the calculated median or average.
Therefore, in order to reduce the influence of the uneven data distribution, the embodiment of the present invention normalizes the data of the behavior characteristics in the historical time periods and the current time period.
In an embodiment of the present invention, direct linear normalization would cause the data to lose comparability, and direct curve normalization would largely destroy the original relationships in the data. The embodiment therefore takes logarithms to reduce the interference of the maximum value with the normalization result.
In an actual application scenario, the data of the behavior characteristics in the historical time period and the current time period of each user may be normalized uniformly according to the type of the behavior characteristics, for example, the amount of data sent to the outside by each user in each time period is normalized, and the number of connection targets of each user in each time period is normalized. It should be noted that, the respective time periods mentioned in the embodiments of the present invention refer to the respective historical time periods and the current time period.
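Specifically, the embodiment of the present invention performs normalization using formula (1), formula (2), and formula (3).
(Formulas (1), (2) and (3) are provided as images in the original publication and are not reproduced in this text: Figure BDA0002568599050000101, Figure BDA0002568599050000102, Figure BDA0002568599050000103.)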
Wherein x_i characterizes the data of the i-th behavior feature; X characterizes the data set of the behavior feature of each user in each time period; D characterizes the intermediate result set; Q_k(D) is the k-th percentile of D, specifically, Q_75(D) is the value at the 75% position after all values of D are sorted in ascending order, Q_50(D) is the value at the 50% position, and Q_25(D) is the value at the 25% position; LQWM(D) is the adjusted bilateral quartile weighted median; and x_i' characterizes the normalized data of the i-th behavior feature.
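The effect of taking logarithms before normalizing can be seen in the following sketch. It does not reproduce formulas (1) to (3), which are unavailable in this text; plain min-max scaling and `log1p` are stand-ins chosen only to show how a single very large value dominates a linear normalization and how a logarithm reduces that effect.

```python
import math
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Linear normalization to [0, 1] (stand-in, not the patent's formula)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def log_min_max(values: List[float]) -> List[float]:
    """Take log(1 + x) first, then normalize linearly (assumed illustration)."""
    return min_max([math.log1p(v) for v in values])


# Three ordinary values and one extreme value, as in a power law distribution.
data = [10.0, 20.0, 40.0, 100000.0]
linear = min_max(data)
logged = log_min_max(data)
```

Under linear normalization the three ordinary values are squeezed almost to zero and become indistinguishable, while after the logarithm they remain clearly separated.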
In an embodiment of the present invention, the data of the behavior characteristics may contain abnormal data. To reduce the influence of such data on the identification result, before normalizing the data of the behavior characteristics in each historical time period, the method may further include: generating a box plot according to the data of the behavior characteristics in each historical time period; determining, according to the box plot, the data of non-abnormal behavior characteristics among the data of the behavior characteristics in each historical time period; and normalizing the data of each non-abnormal behavior characteristic.
According to the embodiment of the invention, the data of abnormal behavior characteristics is screened out by means of the box plot, and only the data of non-abnormal behavior characteristics is normalized, so that the accuracy and the reliability of the identification result can be improved.
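Box plot screening is commonly implemented with the interquartile range (IQR). The sketch below assumes the conventional whisker length of 1.5 times the IQR, which the text does not specify; the function name `non_abnormal` is hypothetical.

```python
from statistics import quantiles
from typing import List


def non_abnormal(values: List[float], whisker: float = 1.5) -> List[float]:
    """Keep only the values inside the box plot whiskers.

    Assumption: whiskers extend 1.5 * IQR beyond the first and third quartiles,
    the usual box plot convention.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if low <= v <= high]


# Daily connection counts over six historical periods; 500 is an obvious outlier.
history = [10.0, 12.0, 11.0, 13.0, 12.0, 500.0]
kept = non_abnormal(history)
```

For this data the quartiles are 11.25 and 12.75, so the whiskers span 9.0 to 15.0 and only the value 500 is screened out.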
In one embodiment of the invention, the method further comprises:
determining a second score of the current user under the identification dimension according to the behavior deviation vector of the current user; the second score is used for representing the deviation degree of the current behavior of the current user relative to the historical behavior of the current user;
determining whether the current behavior of the current user is abnormal or not according to the first score of the current user under the identification dimension, wherein the determining comprises the following steps:
and determining whether the current behavior of the current user is abnormal or not according to the first score and the second score of the current user in the identification dimension.
In the embodiment of the present invention, the sum of the first score and the second score may be compared with a preset risk threshold, and if the sum is greater than the risk threshold, it is determined that the current behavior is abnormal. Of course, the weights of the first score and the second score may also be determined respectively, and whether the current behavior of the current user is abnormal or not may be determined according to the first score and the weight thereof, and the second score and the weight thereof.
Because the second score can represent the deviation degree of the current behavior of the current user relative to the historical behavior of the current user, the embodiment of the invention can consider the change of the current behavior of the user relative to the historical behavior of the user, and further improve the accuracy and the reliability of abnormal behavior identification. In fig. 2 of the present disclosure, an example of this embodiment will be described in detail.
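The two decision rules mentioned above (plain sum versus weighted combination) can be sketched in one function. The function name, the default weights and the example threshold are hypothetical; the text only says that a sum or a weighted combination is compared with a preset risk threshold.

```python
def is_abnormal(first_score, second_score, risk_threshold, w1=1.0, w2=1.0):
    """Return True when the combined score exceeds the risk threshold.

    With the default weights (1.0, 1.0) this is the plain sum; other
    weights give the weighted variant. Weight values are assumptions.
    """
    combined = w1 * first_score + w2 * second_score
    return combined > risk_threshold


print(is_abnormal(0.6, 0.5, risk_threshold=1.0))                   # plain sum
print(is_abnormal(0.6, 0.5, risk_threshold=1.0, w1=0.5, w2=0.5))   # weighted
```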
In one embodiment of the invention, the method further comprises:
determining a third score of the current user in the identification dimension according to data of other users and a plurality of behavior characteristics of the current user in the current time period; the third score is used for representing the deviation degree of the current behavior of the current user relative to the current behaviors of a plurality of other users;
determining whether the current behavior of the current user is abnormal according to the first score and the second score of the current user in the identification dimension, wherein the determining comprises the following steps:
and determining whether the current behavior of the current user is abnormal or not according to the first score, the second score and the third score of the current user in the identification dimension.
Considering that the behavior feature data of different users in the same time period tend to be similar, the embodiment of the present invention calculates a third score in addition to the first score and the second score, and uses the third score to characterize the difference between the current behavior of the current user and the current behaviors of the other users.
According to the embodiment of the invention, whether the behavior is abnormal or not is identified from three different angles through the three scores, and the identification result obtained through the embodiment of the invention has higher accuracy. In fig. 3 of the present disclosure, an example of this embodiment will be described in detail.
In one embodiment of the invention, the method further comprises: determining a third score of the current user in the identification dimension according to data of other users in the identification dimension and a plurality of behavior characteristics of the current user in the current time period; the third score is used for representing the deviation degree of the current behavior of the current user relative to the current behaviors of a plurality of other users;
determining whether the current behavior of the current user is abnormal according to the first score of the current user under the identification dimension, wherein the determining comprises the following steps:
and determining whether the current behavior of the current user is abnormal or not according to the first score and the third score of the current user in the identification dimension.
The embodiment of the invention can simultaneously consider the difference between the current behavior of the current user and the current behaviors of other users and the difference between the behavior change trend of the current user and the behavior change trends of other users, and measure whether the current behavior of the current user is abnormal or not from different angles. In fig. 3 of the present disclosure, an example of this embodiment will be described in detail.
In an embodiment of the present invention, determining a third score of the current user in the identification dimension according to data of a plurality of behavior characteristics of other users and the current user in the current time period includes:
for each behavior feature in the identification dimension: calculating a statistical value of data of the behavior characteristics of a plurality of other users in the current time period;
and determining the third score of the current user in the identification dimension according to the statistical value corresponding to each behavior feature and the data of the plurality of behavior features of the current user in the current time period.
Similar to the above embodiment, before calculating the statistical value of the data of the behavior feature of the several other users in the current time period, the data of the behavior feature may be normalized and filtered through a box plot.
In an embodiment of the present invention, a statistical value (for example, a median or an average value, etc.) of data of behavior characteristics of a plurality of other users in a current time period may also be calculated, and a third score of the current user in the identification dimension may be determined according to the statistical value corresponding to each behavior characteristic data and the data of the plurality of behavior characteristics of the current user in the current time period.
In an embodiment of the present invention, determining a third score of the current user in the identification dimension according to the statistical value corresponding to each behavior feature and data of a plurality of behavior features of the current user in the current time period includes:
calculating the similarity between the current behavior of the current user and the current behaviors of a plurality of other users according to the statistical value corresponding to each behavior characteristic and the data of a plurality of behavior characteristics of the current user in the current time period;
and determining a third score of the current user under the identification dimension according to the similarity of the current behavior of the current user and the current behaviors of a plurality of other users.
The embodiment of the present invention uses the Euclidean distance to represent the similarity between the current behavior of the current user and the current behaviors of the several other users, as shown in equation (4). In an actual application scenario, the Hamming distance may also be used.
$S = \sqrt{\sum_{j=1}^{d} (x_j - c_j)^2}$ (4)
where $x_j$ denotes the data of the j-th behavior feature of the current user in the current time period, $c_j$ denotes the average value (or another statistical value such as the median) corresponding to the j-th behavior feature, $d$ denotes the number of behavior feature categories, and $S$ denotes the similarity between the current behavior of the current user and the current behaviors of the several other users.
In the embodiment of the present invention, the third score may be equal to $S$, or may be obtained by applying a mathematical transformation to $S$. Computing $S$ shows how close the current behavior of the current user is to the current behaviors of the other users. Because $S$ is a distance, a small $S$ means the current user behaves much like the other users and an abnormality is less likely; a large $S$ means the difference is large and an abnormality of the current behavior is more likely.
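As an illustration of the Euclidean-distance computation described above, the sketch below takes the per-feature mean of the other users as $c_j$ and returns the distance as the third score (the case where the third score equals $S$). The function name and the sample data are hypothetical.

```python
import math
import statistics


def third_score(current, others, stat=statistics.mean):
    """Euclidean distance between the current user's feature vector and
    the per-feature statistic (mean by default) of the other users.

    current: list of d feature values for the current user
    others:  list of d-element lists, one per other user
    stat:    statistics.mean or statistics.median, as the text allows
    """
    centers = [stat(column) for column in zip(*others)]  # c_j for each j
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(current, centers)))


others = [[1, 2, 2], [3, 4, 6]]   # per-feature means are 2, 3, 4
current = [5, 7, 4]
print(third_score(current, others))  # → 5.0
```

A larger value indicates that the current user is further from the other users in the current time period.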
In the embodiment of the present invention, the behavior deviation vector may indicate the behavior deviation direction. To facilitate the subsequent determination of the slice combination corresponding to the current user, the behavior deviation vector may also be converted into a unit vector.
In the embodiment of the present invention, the behavior deviation direction is represented by the unit vector $M'$, which can be calculated by equation (5).
$M' = \frac{M}{\lVert M \rVert}$ (5)
where $M = (M_1, M_2, \ldots, M_d)$ denotes the behavior deviation vector; $M' = (M_1', M_2', \ldots, M_d')$ denotes the unit vector; $\lVert M \rVert$ denotes the modulus (length) of the behavior deviation vector; and $M_j$, $j = 1, 2, \ldots, d$, denotes the behavior deviation of the current user corresponding to the j-th behavior feature.
In the embodiment of the present invention, the behavior deviation direction is represented in Cartesian coordinates; in other scenarios, it may also be represented in spherical or hyperspherical coordinates.
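A minimal sketch of converting a behavior deviation vector into its unit vector (the behavior deviation direction). The function name is hypothetical, and raising an error for a zero vector is an assumption; the text does not say how that case is handled.

```python
import math


def deviation_direction(m):
    """Return the unit vector M / ||M|| for a behavior deviation vector."""
    norm = math.sqrt(sum(v * v for v in m))
    if norm == 0:
        # Assumption: a zero deviation has no direction; the source is silent here.
        raise ValueError("zero deviation vector has no direction")
    return [v / norm for v in m]


print(deviation_direction([3, 4]))  # → [0.6, 0.8]
```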
In one embodiment of the present invention, determining a first score of a current user in an identification dimension according to a behavior deviation direction of the current user and behavior deviation directions of other users includes:
determining the slice value of the current user in each feature dimension according to the behavior deviation direction of the current user; the slice value of the current user in a feature dimension is used for representing the degree of deviation of the behavior deviation direction of the current user relative to the behavior deviation directions of other users in that feature dimension;
determining the slice values of the other users in each feature dimension according to the behavior deviation directions of the other users;
determining the slice combination corresponding to the current user according to the slice values of the current user and the other users in each feature dimension;
and determining the first score of the current user in the identification dimension according to the number of users corresponding to the slice combination.
If the behavior deviation direction is represented in Cartesian coordinates, the Cartesian coordinates are converted into hyperspherical coordinates by equation (6).
Figure BDA0002568599050000141
where $\rho, \phi_1, \phi_2, \ldots, \phi_{d-1}$ are the different feature dimensions of the behavior deviation direction, and one feature dimension corresponds to one behavior feature. The behavior deviation directions of the other users are converted in the same manner.
Since $\rho = 1$ for every user, this feature dimension need not participate in the slicing process; that is, slice values are determined only for the feature dimensions $\phi_1, \phi_2, \ldots, \phi_{d-1}$.
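The coordinate conversion can be illustrated with the sketch below. It uses the standard hyperspherical convention ($x_1 = \rho\cos\phi_1$, $x_2 = \rho\sin\phi_1\cos\phi_2$, and so on, with the last angle taken by `atan2`). This convention is an assumption: equation (6) is not reproduced in the text, so the patent may define the angles differently.

```python
import math


def to_hyperspherical(x):
    """Convert a d-dimensional Cartesian vector (d >= 2) to (rho, angles).

    Returns rho and a list of d-1 angles phi_1 .. phi_{d-1}. The first
    d-2 angles lie in [0, pi]; the last lies in (-pi, pi].
    """
    d = len(x)
    rho = math.sqrt(sum(v * v for v in x))
    angles = []
    for k in range(d - 2):
        # Length of the remaining components x[k+1:], paired with x[k].
        tail = math.sqrt(sum(v * v for v in x[k + 1:]))
        angles.append(math.atan2(tail, x[k]))
    angles.append(math.atan2(x[-1], x[-2]))
    return rho, angles


rho, angles = to_hyperspherical([0.0, 0.0, 1.0])
print(rho, angles)  # rho is 1.0 for a unit vector, so only the angles are sliced
```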
The slice width is determined according to the hyperspherical coordinates of the other users in the same dimension.
Specifically, the slice width is calculated using equation (7):
$h_k = 3.5\,\sigma_k\,n^{-1/3}$ (7)
where $h_k$ denotes the slice width of the k-th feature dimension, $\sigma_k$ denotes the standard deviation of the hyperspherical coordinates of the several other users in the k-th feature dimension, and $n$ denotes the number of other users. It is understood that "3.5" in equation (7) is merely an example parameter value, and the disclosure is not limited thereto.
The slice value of the current user in each feature dimension is determined according to the current user's hyperspherical coordinate in that dimension and the slice width; the slice values of all feature dimensions of the current user form the slice combination corresponding to the current user.
In the embodiment of the invention, the hyperspherical coordinate of the current user in each dimension is divided by the corresponding slice width to obtain the slice value of the current user in that feature dimension.
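Equation (7) and the division step can be sketched as follows. Several details are assumptions not stated in the text: the population standard deviation is used for $\sigma_k$, the quotient is floored to obtain an integer slice value, and a zero width maps every coordinate to slice 0. The function names are hypothetical.

```python
import math
import statistics


def slice_width(coords, factor=3.5):
    """h_k = factor * sigma_k * n^(-1/3) for one feature dimension.

    coords: hyperspherical coordinates of the other users in this dimension.
    Assumption: sigma_k is the population standard deviation.
    """
    n = len(coords)
    return factor * statistics.pstdev(coords) * n ** (-1 / 3)


def slice_value(coord, width):
    """Slice index of a coordinate: coordinate divided by the slice width.

    Assumption: the quotient is floored; a zero width (all other users
    identical in this dimension) yields slice 0.
    """
    if width == 0:
        return 0
    return math.floor(coord / width)


other_coords = [0.0, 2.0] * 4          # 8 other users, standard deviation 1.0
width = slice_width(other_coords)      # approximately 3.5 * 1.0 * 8^(-1/3) = 1.75
print(width, slice_value(2.0, width))
```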
The embodiment of the present invention calculates the first score by equation (8).
$F_1 = 1 - \frac{p}{t}$ (8)
where $F_1$ denotes the first score, $p$ denotes the number of users corresponding to the target slice combination, and $t$ denotes the total number of users, that is, the current user plus the other users; the target slice combination is the slice combination to which the current user belongs.
As can be seen from equation (8), the more users correspond to the target slice combination, the closer the behavior change trend of the current user is to that of the other users, the lower the first score, and the smaller the possibility of abnormality.
If there are three behavior features, the behavior deviation direction can be represented in spherical coordinates; the subsequent processing is similar to that for hyperspherical coordinates and is not repeated here.
Because the first score is determined by the slice value of each feature dimension, it reflects the influence of changes in different dimensions, so that abnormalities in each dimension are taken into account.
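The first score, $F_1 = 1 - p/t$, can be sketched by counting how many users share the current user's combination of slice values. The sketch assumes that $p$ counts the current user as well (consistent with $t$ including the current user); the function name and sample data are hypothetical.

```python
from collections import Counter


def first_score(current_combo, other_combos):
    """F1 = 1 - p / t.

    current_combo: tuple of slice values for the current user
    other_combos:  iterable of tuples, one per other user
    p: users (including the current user, by assumption) in the same combination
    t: total number of users
    """
    combos = [tuple(current_combo)] + [tuple(c) for c in other_combos]
    counts = Counter(combos)
    p = counts[tuple(current_combo)]
    t = len(combos)
    return 1 - p / t


# Three of the four users fall into combination (1, 2).
print(first_score((1, 2), [(1, 2), (1, 2), (0, 3)]))  # → 0.25
```

A user whose combination is shared by nobody else receives the highest score, $1 - 1/t$.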
In an embodiment of the present invention, determining whether there is an abnormality in the current behavior of the current user according to the first score, the second score and the third score of the current user in the identification dimension includes:
for each identified dimension: determining a risk score of the current user in the identification dimension according to the first score, the second score and the third score of the current user in the identification dimension;
and determining whether the current behavior of the current user is abnormal or not according to the risk score corresponding to each identification dimension.
The risk score may be a sum of the first score, the second score, and the third score.
To simplify the calculation, before calculating the risk score, the second score and the third score may be normalized such that the values of the second score and the third score are in the interval of 0 to 1.
Specifically, the normalization processing may be performed using equation (9).
Figure BDA0002568599050000161
Wherein g is used for characterizing the second score or the third score, and g' is used for characterizing the normalized second score or the normalized third score.
Considering that the direct participation of the first score in the summation may lead to narrowing of the gap between the different risk scores, and considering the association relationship between the first score and the second score and the third score, the first score may be used as a parameter for adjusting the second score and the third score.
Specifically, the risk score is calculated by equation (10).
$F = (F_2 + F_3) \times F_1$ (10)
where $F$ denotes the risk score, $F_1$ denotes the first score, $F_2$ denotes the second score, and $F_3$ denotes the third score.
In a practical application scenario, the risk score may also be adjusted by a factor such that the value of the risk score is in a suitable value domain. For example, the adjusted risk score is obtained by multiplying the risk score obtained in equation (10) by a coefficient.
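Equation (10) together with the optional scaling coefficient can be written as a one-line function. The sketch assumes the second and third scores have already been normalized to the interval 0 to 1; the normalization of equation (9) is not reproduced in the text and is therefore not implemented here. The function name and the coefficient value are hypothetical.

```python
def risk_score(f1, f2, f3, coefficient=1.0):
    """F = coefficient * (F2 + F3) * F1.

    f2 and f3 are assumed to be already normalized to [0, 1];
    coefficient rescales the result into a convenient value range.
    """
    return coefficient * (f2 + f3) * f1


print(risk_score(0.5, 0.25, 0.75))                   # → 0.5
print(risk_score(0.5, 0.25, 0.75, coefficient=100))  # → 50.0
```

Using the first score as a multiplier rather than a summand means that a user whose deviation direction is common among peers (first score near 0) has the whole risk score pulled toward 0.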
In the embodiment of the present invention, the sum of the risk scores corresponding to each identification dimension may be compared with a preset risk threshold, and if the sum is greater than the risk threshold, it is determined that the current behavior of the current user is abnormal.
In an embodiment of the present invention, determining whether there is an abnormality in the current behavior of the current user according to the risk score corresponding to each identification dimension includes:
determining a dimension group to which each identification dimension belongs;
for each dimension group: determining the risk score of the dimension group according to the risk score corresponding to each identification dimension in the dimension group;
and determining whether the current behavior of the current user is abnormal or not according to the risk score of each dimension group.
The dimension groups may be determined according to the characteristics of the data corresponding to the dimensions; for example, the dimension groups may include a network dimension group, a protocol dimension group, and the like. The network dimension group includes any one or more of TCP, UDP, ICMP, SMTP and SSL; the protocol dimension group includes cookie domain and HTTP.
In the embodiment of the invention, the risk grade of the current user can be determined according to the risk score of each dimension group and a preset grading strategy, and whether the current behavior is abnormal or not is determined according to the risk grade of the current user.
The grading strategy comprises the following steps:
if the risk scores of all the dimension groups are larger than the first threshold value, or the dimension groups with the risk scores larger than the second threshold value exist, the current user is in a fifth risk level;
if the risk scores of the two dimension groups are both larger than the first threshold value, or the dimension groups with the risk scores larger than the third threshold value exist, the current user is in a fourth risk level;
if the dimension group with the risk score larger than the fourth threshold exists, the current user is in a third level;
if the dimension group with the risk score larger than the fifth threshold exists, the current user is in a second level;
if the current user is not at the fifth level, the fourth level, the third level, and the second level, the current user is at the first level.
In the embodiment of the invention, the conditions are checked in order from the fifth level down to the first level, and the current user is assigned the highest level whose condition is satisfied.
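The grading strategy above can be sketched as a cascade of checks from the fifth level downward. The threshold values in the example are hypothetical, since the text gives none, and "two dimension groups" is interpreted as "at least two groups", which is an assumption.

```python
def risk_level(group_scores, t1, t2, t3, t4, t5):
    """Map per-group risk scores to a risk level from 1 (lowest) to 5.

    group_scores: dict of dimension-group name -> risk score (non-empty)
    t1..t5: the first to fifth thresholds of the grading strategy
    """
    scores = list(group_scores.values())
    if not scores:
        raise ValueError("at least one dimension group is required")
    if all(s > t1 for s in scores) or any(s > t2 for s in scores):
        return 5
    # Assumption: "two dimension groups" is read as "at least two groups".
    if sum(s > t1 for s in scores) >= 2 or any(s > t3 for s in scores):
        return 4
    if any(s > t4 for s in scores):
        return 3
    if any(s > t5 for s in scores):
        return 2
    return 1


# Hypothetical thresholds, for illustration only.
thresholds = dict(t1=60, t2=90, t3=80, t4=50, t5=30)
print(risk_level({"network": 85, "protocol": 10}, **thresholds))  # → 4
```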
According to the embodiment of the invention, through dividing the dimension groups, the behavior data of dimensions such as networks, protocols and the like can be more finely distinguished, so that the influence of different dimension groups on the identification result can be reflected, and the accuracy and reliability of the abnormity identification can be improved.
In the embodiment of the present invention, the higher the level, the higher the possibility of an abnormality, with the fifth level being the most likely to be abnormal. In an actual application scenario, whether each risk level triggers an anomaly alarm can be configured. For example, if the fifth and fourth levels are set as abnormal levels, then when the current user is determined to be at the fifth or fourth level, the current behavior is determined to be abnormal and a corresponding anomaly alarm is triggered.
As shown in fig. 2, the anomaly identification method is described below by taking TCP as the identification dimension and the outbound data volume as the behavior feature data. It is understood that the identification dimension (TCP) and the behavior feature data (outbound data volume) in this example are only examples; in practice, this embodiment is also applicable to other identification dimensions (for example, any one or more of TCP, UDP, ICMP, SSL, SMTP, and HTTP) and to various behavior feature data in those identification dimensions (for example, any one or more of the outbound data volume, the number of outbound connection requests, the number of connection targets, and the communication duration). The method comprises the following steps:
Step 201: For the outbound data sending behavior of the current user, normalize the outbound data volume of each historical time period.
The embodiment of the invention determines whether the user is abnormal based on the volume of data the user sends out over TCP.
Step 202: Normalize the outbound data volume of the current time period.
Step 203: Calculate the behavior deviation of the current user corresponding to outbound data sending according to the normalized outbound data volume of the current time period and of each historical time period; this behavior deviation is one element of the behavior deviation vector.
Step 204: Determine the behavior deviation direction of the current user according to the behavior deviation vector of the current user.
Step 205: Determine a first score of the current user according to the behavior deviation direction of the current user and the behavior deviation directions of other users.
The first score is used for representing the degree of deviation of the behavior deviation direction of the current user relative to the behavior deviation directions of other users.
Step 206: Determine a second score of the current user according to the behavior deviation vector of the current user.
The second score is used to characterize how far the current behavior of the current user deviates from the user's historical behavior.
Step 207: Determine whether the current behavior of the current user is abnormal according to the first score and the second score of the current user.
The embodiment of the invention can simultaneously consider the difference between the behavior change trend of the current user and the behavior change trends of other users and the deviation degree of the current behavior of the current user relative to the historical behavior of the current user, measure whether the current behavior of the current user is abnormal or not from different angles, and improve the accuracy and the reliability of abnormal behavior identification.
As shown in fig. 3, the embodiment of the present invention provides an anomaly identification method. It is understood that the identification dimensions (UDP and HTTP) and the behavior feature data (the number of outbound connection requests and the number of connection targets) in this example are merely examples; in practice, this embodiment is also applicable to other identification dimensions (for example, any one or more of TCP, UDP, ICMP, SSL, SMTP and HTTP) and to various behavior feature data in those identification dimensions (for example, any one or more of the outbound data volume, the number of outbound connection requests, the number of connection targets and the communication duration). The method comprises the following steps:
Step 301: For the identification dimension and for each behavior feature of the current user, normalize the data of the behavior feature in each historical time period.
The identification dimensions are UDP and HTTP, and the calculation is performed for each identification dimension. The behavior features include the number of outbound connection requests and the number of connection targets. The embodiment of the invention determines whether the behavior of the user is abnormal from the number of outbound connection requests and the number of connection targets of the user under UDP, and from the same two quantities under HTTP.
Step 302: Normalize the data of the behavior feature in the current time period.
Step 303: Calculate the behavior deviation of the current user corresponding to the behavior feature according to the normalized data of the behavior feature in each historical time period and in the current time period; the behavior deviation of the current user corresponding to each behavior feature is an element of the behavior deviation vector.
Step 304: Determine the behavior deviation direction of the current user according to the behavior deviation vector of the current user.
Step 305: Determine a first score of the current user in the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users.
The first score is used for representing the degree of deviation of the behavior deviation direction of the current user relative to the behavior deviation directions of other users.
Step 306: Determine a third score of the current user in the identification dimension according to the data of the plurality of behavior features of the other users and of the current user in the identification dimension in the current time period.
The third score is used to characterize the degree of deviation of the current behavior of the current user relative to the current behaviors of the several other users.
Step 307: Determine whether the current behavior of the current user is abnormal according to the first score and the third score of the current user in the identification dimension.
The embodiment of the invention can simultaneously consider the difference between the current behavior of the current user and the current behaviors of other users and the difference between the behavior change trend of the current user and the behavior change trends of other users, thereby improving the accuracy of abnormal recognition and reducing the false alarm rate.
The embodiment of the present invention describes an abnormal behavior identification method in detail by taking the outbound data volume, the number of outbound connection requests, and the number of connection targets as the behavior feature data. The method includes:
S1: Acquire the current behavior data and historical behavior data of the host in the identification dimension.
The identification dimensions include TCP, UDP, ICMP, SSL, SMTP, the HTTP GET method and the HTTP POST method.
S2: Determine the current behavior data and historical behavior data of the user in the identification dimension according to the binding relationship between the IP address of the host and the user. The current behavior data of the user includes the data of the 3 behavior features in the current time period; the historical behavior data includes the data of the 3 behavior features in several historical time periods.
The users mentioned in the embodiment of the invention include the current user and other users.
The current time period is the current day, and the historical time periods are the days of the week preceding the current day; that is, there are 7 historical time periods, each lasting 1 day.
S3: Normalize the data of each behavior feature.
The data of each behavior feature is normalized using equations (1), (2) and (3). For convenience of description, the behavior feature data referred to below is the normalized behavior feature data.
S4: For each behavior feature in the identification dimension, calculate the average value of the data of the behavior feature of the several other users in the current time period.
S5: Calculate the similarity between the current behavior of the current user and the current behaviors of the several other users according to the average value corresponding to each behavior feature and the data of the plurality of behavior features of the current user in the current time period.
S6: Determine a third score of the current user in the identification dimension according to the similarity between the current behavior of the current user and the current behaviors of the several other users.
The calculation is performed separately on the data of the three behavior features, namely the outbound data volume, the number of outbound connection requests and the number of connection targets.
Taking UDP as the identification dimension and the outbound data volume as the behavior feature data as an example, the embodiment of the present invention focuses on the volume of data sent out over UDP.
$F_3 = S$.
S7: For each behavior feature in the identification dimension, calculate the behavior deviation of the current user corresponding to the behavior feature according to the data of the behavior feature in each historical time period and in the current time period.
S8: Determine a second score of the current user in the identification dimension according to the behavior deviation of the current user corresponding to the behavior features.
$F_2 = \lVert M \rVert$.
S9: Determine the behavior deviation direction of the current user according to the behavior deviation of the current user corresponding to the behavior features; the behavior deviation of the current user corresponding to each behavior feature is an element of the behavior deviation vector.
The behavior deviation direction is a unit vector in a Cartesian coordinate system, where $d = 3$.
S10: Determine the slice value of the current user in each feature dimension according to the behavior deviation direction of the current user; the behavior deviation direction of the current user is represented by a plurality of hyperspherical coordinates, each of which corresponds to one feature dimension.
When $d = 3$, equation (6) becomes:
Figure BDA0002568599050000211
S11: Determine the slice combination corresponding to the current user according to the slice values of the current user in each feature dimension.
S12: Determine a first score of the current user in the identification dimension according to the number of users corresponding to the slice combination.
$F_1 = 1 - p/t$.
S13: Determine the risk score of the current user in the identification dimension according to the first score, the second score and the third score of the current user in the identification dimension.
S14: Determine the dimension group to which each identification dimension belongs.
The network dimension group includes TCP, UDP, ICMP, SSL, and SMTP. The protocol dimension group includes the HTTP GET method and the HTTP POST method.
S15: For each dimension group, determine the risk score of the dimension group according to the risk score corresponding to each identification dimension in the dimension group.
In the embodiment of the present invention, the average of the risk scores of the dimensions in the dimension group may be used as the risk score of the dimension group; alternatively, the maximum risk score in the dimension group may be used.
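Steps S14 and S15 can be sketched as a lookup table plus an aggregation function. The group membership follows the text; the dictionary layout, the function name and the sample scores are hypothetical, and the sketch assumes every listed dimension has a risk score.

```python
import statistics

# Dimension groups as listed in S14.
DIMENSION_GROUPS = {
    "network": ["TCP", "UDP", "ICMP", "SSL", "SMTP"],
    "protocol": ["HTTP GET", "HTTP POST"],
}


def group_scores(dimension_scores, aggregate=statistics.mean):
    """Aggregate per-dimension risk scores into per-group risk scores.

    aggregate: statistics.mean (average) or max, the two options the text names.
    Assumption: dimension_scores contains a score for every listed dimension.
    """
    return {
        group: aggregate([dimension_scores[dim] for dim in dims])
        for group, dims in DIMENSION_GROUPS.items()
    }


# Hypothetical per-dimension risk scores.
scores = {"TCP": 1, "UDP": 2, "ICMP": 3, "SSL": 4, "SMTP": 5,
          "HTTP GET": 2, "HTTP POST": 4}
print(group_scores(scores))                 # average per group
print(group_scores(scores, aggregate=max))  # maximum per group
```

Taking the maximum makes a single high-risk dimension dominate its group, whereas the average dilutes it across the group's dimensions.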
S16: Determine the risk level of the current user according to the risk score of each dimension group and a preset grading strategy.
S17: Determine whether the current behavior of the current user is abnormal according to the risk level of the current user.
The embodiment of the invention comprehensively considers the difference between the current behavior of the current user and the historical behavior thereof, the difference between the current behavior of the current user and the current behavior of other users, and the difference between the behavior change trend of the current user and the behavior change trend of other users.
As shown in fig. 4, an embodiment of the present invention provides an abnormal behavior recognition apparatus, including:
a deviation determination module 401 configured to, for an identified dimension: determining a behavior deviation vector of the current user according to data of a plurality of behavior characteristics of the current user in the current time period and the historical time period;
a direction determining module 402 configured to determine a behavior deviation direction of the current user according to the behavior deviation vector of the current user;
a score determining module 403, configured to determine a first score of the current user in the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of the other users; the first score is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users;
an anomaly determination module 404 configured to determine whether there is an anomaly in the current behavior of the current user according to the first score of the current user in the identification dimension.
In one embodiment of the invention, the deviation determination module 401 is configured to, for each behavior feature of the current user: calculate the behavior deviation of the current user corresponding to the behavior feature according to the data of the behavior feature in the current time period and the historical time period; the behavior deviation of the current user corresponding to each behavior feature is an element of the behavior deviation vector.
In one embodiment of the invention, the number of historical time periods is greater than 1; the deviation determination module 401 is configured to normalize the data of the behavior feature in each historical time period; normalize the data of the behavior feature in the current time period; and calculate the behavior deviation of the current user corresponding to the behavior feature according to the normalized data of the behavior feature in each historical time period and the current time period.
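A minimal sketch of how one element of the behavior deviation vector might be computed. Following claim 1, it filters the historical data with a box plot and normalizes by taking a logarithm. The 1.5 × IQR whisker rule, the use of `log1p` (to tolerate zero counts), and the definition of the deviation as the current log value minus the mean historical log value are assumptions, since this section does not specify them. All function names are hypothetical.

```python
import math
from statistics import mean, quantiles


def non_outliers(values):
    """Keep values inside the box-plot whiskers (assumed 1.5 * IQR rule)."""
    if len(values) < 4:
        return list(values)  # too few points to estimate quartiles
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]


def behavior_deviation(history, current):
    """Deviation of one behavior feature.

    Assumption: the deviation is the log-scaled current value minus the
    mean of the log-scaled, non-outlier historical values.
    """
    kept = non_outliers(history)
    baseline = mean(math.log1p(v) for v in kept)
    return math.log1p(current) - baseline


def deviation_vector(history_by_feature, current_by_feature):
    """One deviation per behavior feature, in sorted feature-name order."""
    return [
        behavior_deviation(history_by_feature[name], current_by_feature[name])
        for name in sorted(current_by_feature)
    ]


# Hypothetical data: connection requests per historical period, one spike.
history = [10, 12, 11, 9, 10, 11, 10, 12, 1000]
print(non_outliers(history))
print(behavior_deviation(history, 1000))
```

Removing the historical spike before averaging keeps one unusual period from inflating the baseline and hiding a later anomaly.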
In an embodiment of the present invention, the score determining module 403 is configured to determine a second score of the current user in the identification dimension according to the behavior deviation vector of the current user; the second score is used for representing the deviation degree of the current behavior of the current user relative to the historical behavior of the current user;
and the anomaly determination module 404 is configured to determine whether the current behavior of the current user is anomalous according to the first score and the second score of the current user in the identification dimension.
In an embodiment of the present invention, the score determining module 403 is configured to determine a third score of the current user in the identification dimension according to data of a plurality of behavior characteristics of the other users and of the current user in the current time period; the third score is used for representing the deviation degree of the current behavior of the current user relative to the current behaviors of a plurality of other users;
and the anomaly determination module 404 is configured to determine whether the current behavior of the current user is anomalous according to the first score, the second score and the third score of the current user in the identification dimension.
In one embodiment of the invention, the score determining module 403 is configured to, for each behavior feature in the identification dimension: calculate the statistical value of the data of the behavior feature of a plurality of other users in the current time period; and determine a third score of the current user in the identification dimension according to the statistical value corresponding to each behavior feature and the data of a plurality of behavior features of the current user in the current time period.
In an embodiment of the present invention, the score determining module 403 is configured to calculate the similarity between the current behavior of the current user and the current behaviors of a plurality of other users according to the statistical value corresponding to each behavior feature and the data of a plurality of behavior features of the current user in the current time period; and determine a third score of the current user in the identification dimension according to the similarity between the current behavior of the current user and the current behaviors of the plurality of other users.
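The third-score computation can be sketched as below. This section names neither the statistic nor the similarity measure, so the sketch assumes the per-feature mean of the other users as the statistical value, cosine similarity as the similarity, and one minus the similarity as the third score. The function names and the sample feature values are hypothetical.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def third_score(current, others):
    """Score how far the current user's behavior is from that of peers.

    current: feature values of the current user in the current time period.
    others:  one list of feature values per other user, same feature order.
    Assumptions: the statistic is the per-feature mean, and the score is
    1 - cosine similarity.
    """
    if not others:
        raise ValueError("at least one other user is required")
    peer_stats = [mean(column) for column in zip(*others)]
    return 1 - cosine_similarity(current, peer_stats)


# Hypothetical features: data sent, connection requests, targets, duration.
peers = [[100, 10, 5, 60], [120, 12, 6, 50]]
print(third_score([110, 11, 5.5, 55], peers))  # close to the peer average
print(third_score([0, 0, 1000, 0], peers))     # very different profile
```

Because cosine similarity compares the shape of the feature vector rather than its magnitude, this variant flags an unusual mix of behaviors rather than a uniformly higher volume; a distance-based measure would be needed to catch the latter.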
In an embodiment of the present invention, the score determining module 403 is configured to determine a third score of the current user in the identification dimension according to data of a plurality of behavior characteristics of the current user and other users in the identification dimension in the current time period; the third score is used for representing the deviation degree of the current behavior of the current user relative to the current behaviors of a plurality of other users;
and the anomaly determination module 404 is configured to determine whether the current behavior of the current user is anomalous according to the first score and the third score of the current user in the identification dimension.
In an embodiment of the present invention, the score determining module 403 is configured to determine a fragment value of the current user in each feature dimension according to the behavior deviation direction of the current user, where the fragment value of the current user in a feature dimension is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users in that feature dimension; determine the fragment values of the other users in each feature dimension according to the behavior deviation directions of the other users; determine a fragment combination corresponding to the current user according to the fragment values of the current user and the other users in each feature dimension; and determine a first score of the current user in the identification dimension according to the number of users corresponding to the fragment combination.
In one embodiment of the invention, the number of identification dimensions is greater than 1;
an anomaly determination module 404 configured to, for each identification dimension: determine a risk score of the current user in the identification dimension according to the first score, the second score and the third score of the current user in the identification dimension; and determine whether the current behavior of the current user is abnormal according to the risk score corresponding to each identification dimension.
In one embodiment of the invention, the anomaly determination module 404 is configured to determine a dimension group to which each identification dimension belongs; for each dimension group: determine the risk score of the dimension group according to the risk score corresponding to each identification dimension in the dimension group; and determine whether the current behavior of the current user is abnormal according to the risk scores of the dimension groups.
In one embodiment of the invention, the data for a number of behavioral characteristics includes: any one or more of the amount of data sent out, the number of connection requests sent out, the number of connection targets and the communication time length.
In one embodiment of the invention, identifying dimensions includes: any one or more of TCP, UDP, ICMP, SSL, SMTP and HTTP.
An embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method of any of the embodiments described above.
FIG. 5 illustrates an exemplary system architecture 500 to which an abnormal behavior recognition method or an abnormal behavior recognition apparatus according to an embodiment of the present invention may be applied.
As shown in FIG. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 provides a medium for communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, office applications, etc. (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (for example only) that supports shopping websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and otherwise process received data, such as a product information query request, and feed back a processing result (for example only, target push information or product information) to the terminal device.
It should be noted that the abnormal behavior recognition method provided by the embodiment of the present invention is generally executed by the server 505, and accordingly, the abnormal behavior recognition apparatus is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in FIG. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present invention.
As shown in FIG. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed in the storage section 608 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609 and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a sending module, an obtaining module, a determining module, and a first processing module. The names of these modules do not form a limitation on the modules themselves in some cases, and for example, the sending module may also be described as a "module sending a picture acquisition request to a connected server".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following:
for the identification dimension: determining a behavior deviation vector of a current user according to data of a plurality of behavior characteristics of the current user in a current time period and a historical time period;
determining the behavior deviation direction of the current user according to the behavior deviation vector of the current user;
determining a first score of the current user under the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users; the first score is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users;
and determining whether the current behavior of the current user is abnormal or not according to the first score of the current user under the identification dimension.
According to the technical scheme of the embodiment of the invention, based on the first score, whether the current behavior is abnormal or not can be determined according to the difference degree of the behavior change trends of different users; based on the second score, whether the current behavior is abnormal or not can be determined from the perspective of the change trend of the current user behavior; based on the third score, whether the current behavior is abnormal or not can be determined according to the degree of difference of the behaviors of the current user and other users in the current time period. Thus, in combination with the first score and the other scores, it is possible to identify whether the current behavior is at risk from different perspectives. Compared with the prior art, the method has higher identification accuracy and can reduce the false alarm rate.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. An abnormal behavior recognition method, comprising:
acquiring host behavior data; determining user behavior data according to the host behavior data and the binding relationship between the host and the user; the user behavior data includes: data of a number of behavioral characteristics of the user;
for the identification dimension: determining a behavior deviation vector of a current user according to data of a plurality of behavior characteristics of the current user in a current time period and a historical time period, comprising: for each behavioral characteristic of the current user: generating a box plot according to the data of the behavior characteristics of each historical time period, wherein the number of the historical time periods is more than 1; determining a plurality of data of non-abnormal behavior characteristics in the data of the behavior characteristics of each historical time period according to the box plot; normalizing the data of the non-abnormal behavior characteristics in each historical time period and the data of the non-abnormal behavior characteristics in the current time period by taking a logarithm; calculating the behavior deviation of the current user corresponding to the behavior characteristics according to the data of the normalized behavior characteristics in each historical time period and the current time period; the behavior deviation of the current user corresponding to each behavior feature is one element of the behavior deviation vector;
determining the behavior deviation direction of the current user according to the behavior deviation vector of the current user;
determining a first score of the current user under the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users; determining a second score of the current user under the identification dimension according to the behavior deviation vector of the current user; determining a third score of the current user under the identification dimension according to data of the other users and a plurality of behavior characteristics of the current user in the current time period; the first score is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users; the second score is used for representing the deviation degree of the current behavior of the current user relative to the historical behavior of the current user; the third score is used for representing the deviation degree of the current behavior of the current user relative to the current behaviors of the other users;
and determining whether the current behavior of the current user is abnormal or not according to the first score, the second score and the third score of the current user in the identification dimension.
2. The method of claim 1,
determining a third score of the current user in the identification dimension according to the data of the other users and the data of the plurality of behavior characteristics of the current user in the current time period, including:
for each behavior feature in the identification dimension: calculating the statistical value of the data of the behavior characteristics of a plurality of other users in the current time period;
and determining a third score of the current user under the identification dimension according to the statistical value corresponding to each behavior characteristic and data of a plurality of behavior characteristics of the current user in the current time period.
3. The method of claim 2,
determining a third score of the current user in the identification dimension according to the statistical value corresponding to each behavior feature and data of a plurality of behavior features of the current user in the current time period, including:
calculating the similarity between the current behavior of the current user and the current behaviors of the other users according to the statistical value corresponding to each behavior feature and the data of the behavior features of the current user in the current time period;
and determining a third score of the current user under the identification dimension according to the similarity between the current behavior of the current user and the current behaviors of the other users.
4. The method of claim 1, further comprising:
determining a third score of the current user in the identification dimension according to data of the other users and a plurality of behavior characteristics of the current user in the identification dimension in the current time period; wherein the third score is used to characterize a degree of deviation of the current behavior of the current user relative to the current behaviors of the number of other users;
determining whether the current behavior of the current user is abnormal according to the first score of the current user in the identification dimension includes:
and determining whether the current behavior of the current user is abnormal or not according to the first score and the third score of the current user in the identification dimension.
5. The method of claim 1,
determining a first score of the current user in the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users, including:
determining the fragment values of the current user in each feature dimension according to the behavior deviation direction of the current user; the fragment value of the current user in a feature dimension is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users in the feature dimension;
determining the fragment values of the other users in each feature dimension according to the behavior deviation directions of the other users;
determining a fragment combination corresponding to the current user according to the fragment values of the current user and the other users in each feature dimension;
and determining a first score of the current user under the identification dimension according to the number of users corresponding to the fragment combination.
6. The method of claim 1,
the number of the identification dimensions is greater than 1;
determining whether the current behavior of the current user is abnormal according to the first score, the second score and the third score of the current user in the identification dimension includes:
for each of the identified dimensions: determining a risk score of the current user in the identification dimension according to the first score, the second score and the third score of the current user in the identification dimension;
and determining whether the current behavior of the current user is abnormal or not according to the risk score corresponding to each identification dimension.
7. The method of claim 6,
determining whether the current behavior of the current user is abnormal according to the risk score corresponding to each identification dimension includes:
determining a dimension group to which each identification dimension belongs;
for each of the dimension groups: determining the risk score of the dimension group according to the risk score corresponding to each identification dimension in the dimension group;
and determining whether the current behavior of the current user is abnormal or not according to the risk score of each dimension group.
8. The method of claim 1,
data of the plurality of behavioral characteristics, including: any one or more of the amount of data sent out, the number of connection requests sent out, the number of connection targets and the communication time length.
9. The method of claim 1,
the identifying dimensions include: any one or more of a transmission control protocol TCP, a user datagram protocol UDP, an Internet control message protocol ICMP, a secure socket protocol SSL, a simple mail transfer protocol SMTP and a hypertext transfer protocol HTTP.
10. An abnormal behavior recognition apparatus, comprising:
the deviation determining module is configured to acquire host behavior data; determine user behavior data according to the host behavior data and the binding relationship between the host and the user, the user behavior data including data of a number of behavioral characteristics of the user; and, for an identification dimension: determine a behavior deviation vector of a current user according to data of a plurality of behavior characteristics of the current user in a current time period and a historical time period, comprising: for each behavioral characteristic of the current user: generating a box plot according to the data of the behavior characteristics of each historical time period, wherein the number of the historical time periods is more than 1; determining a plurality of data of non-abnormal behavior characteristics in the data of the behavior characteristics of each historical time period according to the box plot; normalizing the data of the non-abnormal behavior characteristics in each historical time period and the data of the non-abnormal behavior characteristics in the current time period by taking a logarithm; calculating the behavior deviation of the current user corresponding to the behavior characteristics according to the data of the normalized behavior characteristics in each historical time period and the current time period; the behavior deviation of the current user corresponding to each behavior feature is one element of the behavior deviation vector;
the direction determining module is configured to determine a behavior deviation direction of the current user according to the behavior deviation vector of the current user;
the score determining module is configured to determine a first score of the current user in the identification dimension according to the behavior deviation direction of the current user and the behavior deviation directions of other users; determining a second score of the current user under the identification dimension according to the behavior deviation vector of the current user; determining a third score of the current user under the identification dimension according to data of the other users and a plurality of behavior characteristics of the current user in the current time period; the first score is used for representing the deviation degree of the behavior deviation direction of the current user relative to the behavior deviation directions of other users; the second score is used for representing the deviation degree of the current behavior of the current user relative to the historical behavior of the current user; the third score is used for representing the deviation degree of the current behavior of the current user relative to the current behaviors of the other users;
and the abnormity determining module is configured to determine whether the current behavior of the current user is abnormal according to the first score, the second score and the third score of the current user in the identification dimension.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-9.
CN202010630842.0A 2020-07-03 2020-07-03 Abnormal behavior identification method and device Active CN111865941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010630842.0A CN111865941B (en) 2020-07-03 2020-07-03 Abnormal behavior identification method and device

Publications (2)

Publication Number Publication Date
CN111865941A CN111865941A (en) 2020-10-30
CN111865941B true CN111865941B (en) 2022-12-27

Family

ID=73151959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010630842.0A Active CN111865941B (en) 2020-07-03 2020-07-03 Abnormal behavior identification method and device

Country Status (1)

Country Link
CN (1) CN111865941B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11706241B1 (en) * 2020-04-08 2023-07-18 Wells Fargo Bank, N.A. Security model utilizing multi-channel data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6671811B1 (en) * 1999-10-25 2003-12-30 Visa Internation Service Association Features generation for use in computer network intrusion detection
US9154516B1 (en) * 2013-09-27 2015-10-06 Emc Corporation Detecting risky network communications based on evaluation using normal and abnormal behavior profiles
CN110706026A (en) * 2019-09-25 2020-01-17 精硕科技(北京)股份有限公司 Abnormal user identification method, identification device and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10015182B1 (en) * 2016-06-30 2018-07-03 Symantec Corporation Systems and methods for protecting computing resources
CN109120629B (en) * 2018-08-31 2021-07-30 新华三信息安全技术有限公司 Abnormal user identification method and device
CN110138763B (en) * 2019-05-09 2020-12-11 中国科学院信息工程研究所 Internal threat detection system and method based on dynamic web browsing behavior
CN110351307B (en) * 2019-08-14 2022-01-28 杭州安恒信息技术股份有限公司 Abnormal user detection method and system based on ensemble learning

Also Published As

Publication number Publication date
CN111865941A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
US11750659B2 (en) Cybersecurity profiling and rating using active and passive external reconnaissance
US20200389495A1 (en) Secure policy-controlled processing and auditing on regulated data sets
US20220164731A1 (en) Systems and methods for monitoring information security effectiveness
US20200412767A1 (en) Hybrid system for the protection and secure data transportation of convergent operational technology and informational technology networks
US10791137B2 (en) Risk assessment and remediation
US11425148B2 (en) Identifying malicious network devices
US20180033009A1 (en) Method and system for facilitating the identification and prevention of potentially fraudulent activity in a financial system
US9154516B1 (en) Detecting risky network communications based on evaluation using normal and abnormal behavior profiles
US20180033089A1 (en) Method and system for identifying and addressing potential account takeover activity in a financial system
US11886598B2 (en) System and method for scalable cyber-risk assessment of computer systems
US9674212B2 (en) Social network data removal
US20150229666A1 (en) Social network profile data removal
US20180033006A1 (en) Method and system for identifying and addressing potential fictitious business entity-based fraud
US20180248879A1 (en) Method and apparatus for setting access privilege, server and storage medium
US20220014561A1 (en) System and methods for automated internet-scale web application vulnerability scanning and enhanced security profiling
US11394722B2 (en) Social media rule engine
US11582251B2 (en) Identifying patterns in computing attacks through an automated traffic variance finder
CN109685536B (en) Method and apparatus for outputting information
US11165801B2 (en) Social threat correlation
US10158657B1 (en) Rating IP addresses based on interactions between users and an online service
US20210360017A1 (en) System and method of dynamic cyber risk assessment
CN113162923B (en) User reliability evaluation method and device based on user behaviors and storage medium
US11165804B2 (en) Distinguishing bot traffic from human traffic
CN111865941B (en) Abnormal behavior identification method and device
WO2023192175A1 (en) Device-agnostic access control techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210319

Address after: 100176 8660, 6 / F, building 3, No.3, Yongchang North Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: BEIJING SKYGUARD NETWORK SECURITY TECHNOLOGY Co.,Ltd.

Applicant after: Chengdu sky guard Network Security Technology Co.,Ltd.

Address before: 100176 8660, 6 / F, building 3, No.3, Yongchang North Road, Beijing Economic and Technological Development Zone, Beijing

Applicant before: BEIJING SKYGUARD NETWORK SECURITY TECHNOLOGY Co.,Ltd.

GR01 Patent grant