CN114553497B - Internal threat detection method based on feature fusion - Google Patents

Info

Publication number
CN114553497B
Authority
CN
China
Prior art keywords
user
user behavior
node
behavior
internal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210105573.5A
Other languages
Chinese (zh)
Other versions
CN114553497A (en
Inventor
卢志刚
肖海涛
刘玉岭
张辰
刘松
姜波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202210105573.5A priority Critical patent/CN114553497B/en
Publication of CN114553497A publication Critical patent/CN114553497A/en
Application granted granted Critical
Publication of CN114553497B publication Critical patent/CN114553497B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an internal threat detection method based on feature fusion and relates to the field of cyberspace security.

Description

Internal threat detection method based on feature fusion
Technical Field
The invention relates to the field of cyberspace security. It fuses the statistical features and structural features of users, detects internal threats with a deep-learning-based anomaly detection method, and promptly discovers potentially threatening users in an intranet.
Background
While external network threats grow more rampant by the day, the impact of internal threats cannot be ignored either. An internal threat generally refers to a user within an organization who has access rights to the internal network, systems, or data and who abuses those rights or violates the organization's security policy, harming the confidentiality, integrity, and availability of internal information. According to the 2020 security report from Cybersecurity Insiders, about two thirds of organizations (68%) indicated that insider threats had become more frequent over the past year, and 70% of organizations had suffered at least one incident of malicious insider behavior in the past year. Threats originating from the internal network are increasingly prominent and have become a problem that urgently needs solving; detecting internal threats accurately and promptly is crucial to the stable operation and healthy development of an organization.
Internal threats are often associated with malicious employees who deliberately steal data, sabotage systems, and so on, causing losses to enterprises and organizations. In practice, employee negligence and partner errors also lead to many security vulnerabilities and accidental data leaks, causing considerable losses. Apart from objective factors such as system vulnerabilities and improper permission assignment, people are the main cause of such losses. Internal threats are usually carried out by privileged users with legitimate permissions. Unlike external threats, which exploit system vulnerabilities to perform unauthorized operations, internal users have legitimate identities and are familiar with the internal architecture, so their malicious behavior is hard to discover and poses a serious threat to the organization's security.
Current research on internal threat detection techniques can be divided, by the method used, into rule-based, traditional machine-learning-based, and deep-learning-based internal threat detection.
Rule-based internal threat detection generally achieves high accuracy and a low false alarm rate on known threat behaviors, but it struggles to detect unknown abnormal behaviors that are not in the knowledge base, cannot adapt to new internal attack behaviors, and is ill-suited to today's complex network environments.
Internal threat detection based on traditional machine learning generally preprocesses the data, extracts and selects features, trains a model on the training data with a chosen traditional machine learning algorithm, runs the trained model on test data, and compares the predictions against the true values.
Internal threat detection based on deep learning mostly takes user behavior sequences as input, builds a model of a user's normal behavior, and detects anomalies from changes in the user's behavior sequence to judge whether the user is abnormal. Such methods can learn only one-sided information, either user behavior sequences or user behavior statistics, and lack any consideration of the association information between user attributes and users.
In summary, existing internal threat detection methods generally suffer from low detection accuracy and a high false alarm rate because they use feature types incompletely and ignore the association information among users, so detection results are often unsatisfactory.
Disclosure of Invention
To solve the above problems, the invention provides an internal threat detection method based on feature fusion, which extracts statistical features and structural features of user behavior from multi-source logs and combines them with a deep-learning-based anomaly detection method to detect internal threats and improve the security of the organization's internal network.
In order to achieve the purpose, the invention adopts the specific technical scheme that:
an internal threat detection method based on feature fusion comprises the following steps:
collecting multi-source user behavior logs in an internal network, analyzing by taking users as units, and forming an independent multi-source user behavior log record for each user;
counting behavior information of users from a multi-source user behavior log record corresponding to each user, and extracting statistical characteristics of user behaviors;
constructing a user login behavior association graph by using login logs in multi-source user behavior log records of a user, randomly walking neighbor nodes of each user node to generate a plurality of random walking sequences with fixed lengths, wherein each sequence is formed by arranging the walking nodes in a front-back sequence, and the structural characteristics of the user behavior are extracted aiming at each node in the sequence;
fusing the extracted statistical characteristics and structural characteristics of the user behaviors to form a characteristic matrix;
processing training data containing internal threat labels through the steps to obtain a feature matrix, inputting the feature matrix into a Capsule neural network for training to obtain an internal threat detection model based on feature fusion; when the threat in the internal network is formally detected, the multi-source user behavior log in the internal network is obtained, the characteristic matrix is obtained through the processing of the steps, and the characteristic matrix is input into the internal threat detection model to detect the threat in the internal network.
Further, the multi-source user behavior log includes a mobile device usage record, a file operation log, an email log, and a web browsing record in addition to a login log.
Further, the behavior information of the user comprises frequency-based user behavior statistical features and content-based user behavior statistical features; the frequency-based features are average counts of different types of user behavior, and the content-based features are derived from the content generated by user behavior operations.
Further, the frequency-based user behavior statistical features comprise one or more of: the number of user logins and logoffs, the number of user logins and logoffs outside working hours, the number of computers the user logs in to, the number of device connections, the number of distinct file transfers, the total number of files transferred, the number of files transferred outside working hours, the number of executable files transferred, the number of computers involved in file transfers, the number of e-mails sent, the number of e-mails sent inside the organization, the number of e-mails sent outside the organization, the average e-mail size, the number of e-mail attachments, the number of e-mail recipients, the number of e-mails sent outside working hours, the number of computers used to receive e-mail, the number of web page views, and the number of web page views outside working hours;
the content-based user behavior statistical features comprise one or more of: the number of e-mails associated with emotional tendency, the number of web page views associated with decryption, the number of web page views associated with job recruitment, the number of web page views associated with hackers, the number of web page views associated with cloud storage, the number of web page views associated with social networking, and the number of web page views associated with emotional tendency.
Further, the user login behavior association graph is a homogeneous graph composed of different users and the association relations between them, denoted G = (V, E), where V is the set of nodes in the graph and each node represents one user, and E is the set of edges in the graph and each edge represents an association relation between the two corresponding users.
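To make the graph definition concrete, the following is a minimal sketch of building G = (V, E) from login records. It assumes a simplified, hypothetical log format of `(user, device)` pairs and uses the rule given in the detailed description that two users are associated when they have logged in to the same device; the function name `build_login_graph` is illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def build_login_graph(login_records):
    """Return (V, E) for an undirected, homogeneous user graph.

    login_records: iterable of (user, device) pairs (hypothetical format).
    Two users share an edge if they logged in to at least one common device.
    """
    login_records = list(login_records)
    users_by_device = defaultdict(set)
    for user, device in login_records:
        users_by_device[device].add(user)

    nodes = {user for user, _ in login_records}
    edges = set()
    for users in users_by_device.values():
        # Every pair of users seen on the same device becomes one edge.
        for a, b in combinations(sorted(users), 2):
            edges.add(frozenset((a, b)))
    return nodes, edges


records = [
    ("alice", "PC-1"), ("bob", "PC-1"),
    ("bob", "PC-2"), ("carol", "PC-2"),
    ("dave", "PC-3"),
]
V, E = build_login_graph(records)
```

Edges are stored as `frozenset` pairs so that (alice, bob) and (bob, alice) count as the same undirected edge; a user who shares no device with anyone, such as `dave` here, remains an isolated node.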
Further, the structural features of user behavior are extracted for each node in the sequences as follows: a plurality of fixed-length random walk sequences are generated to form a node sequence set S; for each node u, a node embedding vector is learned with the Skip-gram model by maximizing an objective function f over the set S, and the node embedding vector is the structural feature of the user behavior.
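The walk-generation step can be sketched as follows. This is an illustration under assumptions: the text does not specify the transition rule or the number of walks per node, so the sketch uses uniform transitions to a random neighbor and a hypothetical `walks_per_node` parameter.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate fixed-length uniform random walks starting from every node.

    adjacency: dict mapping each node to a list of its neighbors.
    A walk is cut short only if it reaches a node without neighbors.
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


adjacency = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"]}
S = random_walks(adjacency, walk_length=5, walks_per_node=2)
```

The resulting set `S` plays the role of a corpus of "sentences" for a Skip-gram model. One common way to train such a model would be gensim's `Word2Vec` with `sg=1`, but the source does not name a library, so that choice is an assumption.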
Further, the method for fusing the statistical characteristics and the structural characteristics of the user behaviors comprises the following steps: and splicing the statistical characteristics and the structural characteristics of the user behaviors, normalizing the characteristic values to be in a range of 0-1 by using Min-Max normalization, and converting into a two-dimensional characteristic matrix.
Further, during training of the Capsule neural network, the reconstruction error loss and the margin loss are summed to obtain the final loss function, which is used to adjust the parameters of the Capsule neural network and obtain the internal threat detection model; the reconstruction error loss is the Euclidean distance between the feature matrix and the output of the Sigmoid layer in the reconstruction, where a decoder structure reconstructs the input from the DigitCaps (digit capsule) layer of the Capsule neural network: a masking method keeps the activation vector of the correct digit capsule, and that activation vector is then used for reconstruction to obtain the reconstruction error loss.
Compared with the existing internal threat detection method, the method has the following advantages:
The method mines latent features of user behavior from both the statistical level and the structural level of the multi-source user behavior logs, makes effective use of multiple feature types, and incorporates the association information between users. It thereby detects internal threats more effectively, overcomes the limitation that earlier internal threat detection methods learn only one-sided user behavior information and ignore the associations between users, and improves the accuracy of internal threat detection.
Drawings
Fig. 1 is a general flow chart of an internal threat detection method based on feature fusion according to an embodiment of the present invention.
Fig. 2 is a schematic diagram illustrating the extraction of statistical features of user behaviors according to an embodiment of the present invention.
Fig. 3 is a schematic diagram illustrating extraction of structural features of user behaviors according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features and advantages of the present invention more obvious and understandable by those skilled in the art, the technical cores of the present invention are further described in detail with reference to the accompanying drawings and examples.
The embodiment of the invention provides an internal threat detection method based on feature fusion, which is specifically explained as follows:
Fig. 1 shows the general flow of internal threat detection based on feature fusion. The method comprises five steps. First, the user's login log R_L, mobile device usage record R_D, file operation log R_F, e-mail log R_M, and web browsing record R_W are collected and parsed to form the multi-source user behavior log R. Next, the statistical features F_stat and the structural features F_stru of user behavior are each extracted from the parsed log R. The two types of features are then fused and transformed so that each internal user has a unique feature matrix F_matrix. Finally, the feature matrices are input into a Capsule neural network for training to obtain the feature-fusion-based internal threat detection model M, which is used to detect internal threat users.
As shown in Fig. 2, user behavior is counted from the parsed multi-source user behavior logs in two respects: frequency-based and content-based. The frequency of user operations often reflects typical characteristics of user behavior; the frequency-based statistical features are average counts of different types of user behavior, such as the daily number of user logins and logoffs and the daily number of logins and logoffs outside working hours. The content-based statistical features are extracted from the content generated by behavior operations and are mainly text features arising from user communication, such as the emotional tendency of e-mails and the access frequency of specific web pages. A total of 33 statistical features of user behavior are extracted.
Specifically, the user behavior statistical features are extracted per user. The frequency-based features comprise the number of user logins and logoffs, the number of user logins and logoffs outside working hours, and the number of computers the user logs in to; the number of device connections, the number of device connections outside working hours, and the number of computers the device is connected to; the number of distinct file transfers, the total number of files transferred, the number of files transferred outside working hours, the number of executable files transferred, and the number of computers involved in file transfers; the number of e-mails sent, the number of e-mails sent inside the organization, the number of e-mails sent outside the organization, the average e-mail size, the number of e-mail attachments, the number of e-mail recipients, the number of e-mails sent outside working hours, and the number of computers used to receive e-mail; and the number of web page views and the number of web page views outside working hours. The content-based features comprise the number of e-mails associated with emotional tendency; and the number of web page views associated with decryption, with job recruitment, with hackers, with cloud storage, with social networking, and with emotional tendency.
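As an illustration of the frequency-based features, the sketch below derives three of the login-related counts per user. The event format `(timestamp, user, computer)`, the function name, and the working hours of 08:00 to 18:00 are all assumptions made for the example; the source does not define them.

```python
from datetime import datetime

# Assumed working hours; the source does not state what counts as off-duty time.
WORK_START_HOUR, WORK_END_HOUR = 8, 18


def login_statistics(events):
    """Count login/logoff events, off-hours events and distinct computers per user.

    events: list of (timestamp, user, computer), with timestamps formatted as
    'YYYY-MM-DD HH:MM:SS' (hypothetical format).
    """
    raw = {}
    for timestamp, user, computer in events:
        entry = raw.setdefault(user, {"events": 0, "off_hours": 0, "computers": set()})
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
        entry["events"] += 1
        if not (WORK_START_HOUR <= hour < WORK_END_HOUR):
            entry["off_hours"] += 1
        entry["computers"].add(computer)
    return {
        user: {
            "events": e["events"],
            "off_hours": e["off_hours"],
            "n_computers": len(e["computers"]),
        }
        for user, e in raw.items()
    }


events = [
    ("2010-01-04 07:45:00", "alice", "PC-1"),
    ("2010-01-04 17:30:00", "alice", "PC-1"),
    ("2010-01-04 22:10:00", "alice", "PC-2"),
    ("2010-01-05 09:00:00", "bob", "PC-1"),
]
stats = login_statistics(events)
```

The sketch returns raw totals; dividing each count by the number of observed days would give the daily averages that the text describes.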
As shown in Fig. 3, a user login behavior association graph G = (V, E) is constructed from the users' login logs. The graph is homogeneous: V is the set of nodes and E is the set of edges; V has a single attribute, the user, and E has a single attribute, whether an association exists between users. If user_i and user_j have logged in to the same device, an association exists between user_i and user_j. The user login behavior association graph is built on this basis. Fixed-length random walk sequences are then simulated with a random walk method that walks over neighboring nodes: for any node u ∈ V in G, the walk length is set to l, which produces the node sequence set S = {s_1, s_2, …, s_m}, where s_i is the i-th random walk sequence and m is the total number of random walk sequences. For each node u, a d-dimensional node vector representation is then learned with the Skip-gram model. The Skip-gram model learns the node embedding vectors by maximizing an objective function f over the node sequence set S, as shown in formula (1):
$$f = \sum_{u \in W} \log P(N(u) \mid u) \qquad (1)$$
where W is the vocabulary containing each distinct node, N(u) is the set of neighbor nodes of node u, and the logarithm is base 2; P(N(u) | u) is the transition probability from a given source node u to its neighbor nodes, defined as shown in equation (2):
[Equation (2): formula image in the original publication, not reproduced in this text]
where v and v' are the two vector representations of a node, and v is the final low-dimensional embedding vector of node u; user nodes with similar device login behavior tend to have similar low-dimensional embedding vectors. T denotes the transpose, n_i is the i-th neighbor node of node u, W is the vocabulary containing each distinct node, and N is the set of neighbor nodes of the node.
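Equation (2) is an image in the original publication and does not survive in this text. For orientation only, the standard Skip-gram softmax, which uses the same symbols as the definitions above, reads as follows. Treating the neighbors as conditionally independent given u is an assumption of this sketch and is not confirmed by the text.

```latex
P(N(u) \mid u) = \prod_{n_i \in N(u)} P(n_i \mid u),
\qquad
P(n_i \mid u) = \frac{\exp\left( {v'_{n_i}}^{T} v_u \right)}
                     {\sum_{w \in W} \exp\left( {v'_{w}}^{T} v_u \right)}
```

Here v_u is the embedding of the source node u and v'_w is the second (context) vector of node w.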
During feature fusion, the extracted statistical features F_stat and structural features F_stru of user behavior are first concatenated. Min-Max normalization then scales the feature values into the range 0 to 1, which removes the scale differences between features so that features of different magnitudes do not unduly influence the model. After normalization, the features are converted into a two-dimensional feature matrix F_matrix for input into the subsequent deep neural network model.
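The fusion step can be sketched in plain Python as below. The text does not say how the normalized vector is laid out as a two-dimensional matrix, so the row-major reshape to a caller-supplied `shape` is an assumption, as are the function names and the toy feature values.

```python
def min_max_normalize(rows):
    """Min-Max scale each column of a list of equal-length rows into [0, 1].

    A column whose values are all equal is mapped to 0.0.
    """
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def fuse(f_stat, f_stru, shape):
    """Concatenate per-user features, normalize across users, reshape per user."""
    n_rows, n_cols = shape
    fused = [list(s) + list(e) for s, e in zip(f_stat, f_stru)]
    if any(len(v) != n_rows * n_cols for v in fused):
        raise ValueError("feature length must equal n_rows * n_cols")
    scaled = min_max_normalize(fused)
    # Row-major reshape of each user's vector (an assumed layout).
    return [
        [v[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
        for v in scaled
    ]


f_stat = [[10, 2], [30, 0], [20, 1]]               # toy statistical features
f_stru = [[0.5, -1.0], [0.1, 1.0], [0.3, 0.0]]     # toy embedding dimensions
F_matrix = fuse(f_stat, f_stru, shape=(2, 2))
```

Normalization is applied per feature across all users, so each column's minimum maps to 0 and its maximum to 1 regardless of the feature's original unit.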
In the neural network training stage, the labeled training data is preprocessed as described above to obtain the feature matrix F_matrix, which is input into the Capsule neural network. After passing through the Conv (convolution) layer, the PrimaryCaps (primary capsule) layer, and the DigitCaps (digit capsule) layer in turn, the network outputs whether the user corresponding to the feature matrix is abnormal. Compared with traditional neural networks, the Capsule neural network has stronger learning and representation capabilities and can better adapt to the learned features.
During training, to adjust the parameters of the Capsule neural network, the embodiment of the invention uses a margin loss function. For each capsule k, the loss function L_k is shown in equation (3):
$$L_k = T_k \max(0,\, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0,\, \|v_k\| - m^-)^2 \qquad (3)$$
where v_k is the output vector of the capsule neural network; ||v_k|| is the norm of the output vector and represents the class probability; T_k = 1 when class k is the correct class, and T_k = 0 otherwise. The hyperparameters m^+ and m^- bound the losses for correct and incorrect classification respectively; for example, if the probability of the correct class is greater than m^+, the loss for the correct class is 0. Here m^+ and m^- are set to 0.9 and 0.1 respectively, and λ is a regularization parameter, set to 0.5.
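Equation (3) can be checked numerically with a small sketch. The function name and argument names are illustrative; the default values are the ones stated above.

```python
def margin_loss(v_norm, is_target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss L_k of equation (3) for one capsule.

    v_norm: norm of the capsule output vector, ||v_k||.
    is_target: True when class k is the correct class (T_k = 1).
    """
    t_k = 1.0 if is_target else 0.0
    present = t_k * max(0.0, m_plus - v_norm) ** 2
    absent = lam * (1.0 - t_k) * max(0.0, v_norm - m_minus) ** 2
    return present + absent


confident_correct = margin_loss(0.95, is_target=True)   # norm above m_plus
weak_correct = margin_loss(0.60, is_target=True)        # norm below m_plus
wrong_but_active = margin_loss(0.60, is_target=False)   # norm above m_minus
```

A correct capsule is penalized only while its norm is below m^+, and an incorrect capsule only while its norm is above m^-, with λ down-weighting the second term.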
In addition, the embodiment of the invention sums the reconstruction error loss and the margin loss as the final loss function to obtain a more accurate detection model. For the reconstruction error loss L_R, a decoder structure reconstructs the input from the DigitCaps layer: a masking method keeps only the activation vector of the correct digit capsule, and that activation vector is then used for reconstruction. L_R is the Euclidean distance between the input feature matrix F_matrix and the output of the Sigmoid layer in the reconstruction. The Sigmoid layer is an additional layer used for the reconstruction error calculation; it compares the matrix reconstructed from the output vector with the input matrix. The final loss function L is shown in equation (4):
$$L = \sum_{k=1}^{C_k} L_k + L_R \qquad (4)$$
where C_k is the number of classes (the two classes of malicious and non-malicious users), here C_k = 2; L_k is the margin loss of each capsule k; and L_R is the reconstruction error. The neural network is trained to minimize the loss function L.
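The combined loss can be illustrated as below. The sketch assumes an unweighted sum of the per-capsule margin losses and the reconstruction error, following the description of the two losses being accumulated; some capsule-network implementations scale the reconstruction term down, and the text does not mention such a factor. All function names and the toy inputs are illustrative.

```python
import math


def margin_loss(v_norm, is_target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss for one capsule, as in equation (3)."""
    t_k = 1.0 if is_target else 0.0
    return (t_k * max(0.0, m_plus - v_norm) ** 2
            + lam * (1.0 - t_k) * max(0.0, v_norm - m_minus) ** 2)


def reconstruction_loss(f_matrix, reconstructed):
    """Euclidean distance between the input matrix and the decoder output."""
    return math.sqrt(sum(
        (a - b) ** 2
        for row_in, row_out in zip(f_matrix, reconstructed)
        for a, b in zip(row_in, row_out)
    ))


def total_loss(capsule_norms, target_class, f_matrix, reconstructed):
    """Sum of margin losses over all class capsules plus reconstruction error."""
    margin = sum(
        margin_loss(norm, k == target_class)
        for k, norm in enumerate(capsule_norms)
    )
    return margin + reconstruction_loss(f_matrix, reconstructed)


# Two class capsules: index 0 = non-malicious, index 1 = malicious (assumed order).
loss = total_loss(
    capsule_norms=[0.2, 0.8],
    target_class=1,
    f_matrix=[[0.0, 1.0], [1.0, 0.0]],
    reconstructed=[[0.0, 1.0], [1.0, 0.5]],
)
```

In this toy case the margin part contributes 0.005 + 0.01 and the reconstruction part 0.5.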
After training, the feature-fusion-based internal threat detection model is obtained and put into use for detecting internal threats: at detection time, the obtained feature matrix is input into the model, which outputs the detection result.
The internal threat detection model is evaluated as follows. The data set is the insider threat test data set released by the Computer Emergency Response Team (CERT) of Carnegie Mellon University; it contains the login logs, mobile device usage records, file operation logs, e-mail logs, and web browsing records of 1,000 users over 17 months, among whom 70 are internal threat users.
Model training mainly learns the parameters of the Capsule neural network from the training data set; the parameters at which the loss function reaches its minimum are taken as optimal, yielding the feature-fusion-based internal threat detection model.
In the evaluation of the model, four metrics are used: Accuracy, F-measure, AUC, and Recall. Accuracy directly reflects the overall performance of the model. The F-measure is the harmonic mean of precision and recall. Precision and recall usually pull against each other: in general, the higher the precision, the lower the recall, and vice versa. The F-measure balances the two: if one of precision and recall is high while the other is low, the F-measure is low, and it is high only when both are high, which guards against extreme cases and allows a more accurate evaluation of the model. AUC is the area under the receiver operating characteristic curve and is commonly used to evaluate binary classification models. Recall is the ratio of correctly predicted positive samples to all actual positive samples and reflects the model's ability to detect internal threat users.
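The relationship between these metrics can be shown with a short sketch that computes them from confusion-matrix counts. The counts used below are made up for illustration and are not results from the patent; AUC is omitted because it needs ranked scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives, true negatives,
    where "positive" means an internal threat user.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts only: 10 actual threat users among 100 users.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
lopsided = classification_metrics(tp=1, fp=0, fn=9, tn=90)
```

The second call shows why the F-measure is reported alongside accuracy: with perfect precision but a recall of 0.1, accuracy stays above 0.9 while the F-measure drops to about 0.18.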
In the evaluation experiment, the data set is divided into a training set and a test set at a ratio of 6:… The scores obtained, 0.980, 0.958, 0.933 and 0.867, are all higher than those of internal threat detection methods based on machine learning, such as logistic regression, support vector machines (SVM), and random forests, and on deep learning, such as convolutional neural networks (CNN) and graph convolutional networks (GCN), which demonstrates the effectiveness of the feature-fusion-based method in the field of internal threat detection.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail by using examples, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered in the claims of the present invention.

Claims (8)

1. An internal threat detection method based on feature fusion is characterized by comprising the following steps:
collecting multi-source user behavior logs in an internal network, analyzing by taking a user as a unit, and forming an independent multi-source user behavior log record for each user;
counting behavior information of users from a multi-source user behavior log record corresponding to each user, and extracting statistical characteristics of user behaviors;
constructing a user login behavior association graph by using login logs in a multi-source user behavior log record of a user, randomly walking neighbor nodes of each user node to generate a plurality of random walk sequences with fixed lengths, wherein each sequence is formed by arranging the walk nodes in a front-back sequence, and the structural characteristics of the user behavior are extracted aiming at each node in the sequence;
fusing the extracted statistical characteristics of the user behaviors with the structural characteristics to form a characteristic matrix;
processing the training data containing the internal threat labels through the steps to obtain a feature matrix, inputting the feature matrix into a Capsule neural network for training to obtain an internal threat detection model based on feature fusion; when the threat in the internal network is formally detected, a multi-source user behavior log in the internal network is obtained, a feature matrix is obtained through the steps, and the feature matrix is input into the internal threat detection model to detect the threat in the internal network.
2. The method of claim 1, wherein the multi-source user behavior log comprises mobile device usage records, file operation logs, email logs, and web browsing records in addition to a log of logins.
3. The method of claim 1, wherein the behavior information of the user comprises frequency-based user behavior statistics and content-based user behavior statistics; the frequency-based user behavior statistical characteristics are average counts of different types of user behaviors, and the content-based user behavior statistical characteristics are content generated based on user behavior operation.
4. The method of claim 3, wherein the frequency-based user behavior statistical features include one or more of a number of user logins and logoffs, a number of user logins and logoffs outside working hours, a number of computers that the user logs in to, a number of connections of the device outside working hours, a number of computers that the device is connected to, a number of transmissions of different files, a total number of file transmissions, a number of files transmitted outside working hours, a number of executable file transmissions, a number of computers involved in file transmissions, a number of outgoing e-mail messages, a number of e-mail messages sent to the interior of the organization, a number of e-mail messages sent to the exterior of the organization, an average size of e-mail messages, a number of e-mail recipients, a number of e-mail senders, a number of computers used to receive e-mails, a number of web pages viewed, and a number of web pages viewed outside working hours;
the content-based user behavior statistical characteristics comprise one or more of the number of e-mails associated with emotional tendency, the number of browsed web pages associated with decryption, the number of browsed web pages associated with job recruitment, the number of browsed web pages associated with hacking, the number of browsed web pages associated with cloud storage, the number of browsed web pages associated with social networking, and the number of browsed web pages associated with emotional tendency.
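The frequency-based characteristics of claims 3 and 4 are counts over log records. A minimal sketch for three of the login-related counts follows. The record layout, the sample data, and the 09:00 to 18:00 window (one possible reading of "at a time of work") are all assumptions for illustration; the per-period averaging mentioned in claim 3 is omitted for brevity.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical login records: (user, computer, activity, ISO timestamp).
LOGON_RECORDS = [
    ("alice", "PC-1", "Logon", "2022-01-03T08:55:00"),
    ("alice", "PC-1", "Logoff", "2022-01-03T17:10:00"),
    ("alice", "PC-2", "Logon", "2022-01-03T22:30:00"),
    ("bob", "PC-2", "Logon", "2022-01-03T09:05:00"),
]

# Assumed working-hours window; the claims do not specify one.
WORK_START_HOUR, WORK_END_HOUR = 9, 18


def frequency_features(records):
    """Return per-user counts: all events, events in the window, distinct computers."""
    totals = defaultdict(int)
    in_window = defaultdict(int)
    computers = defaultdict(set)
    for user, computer, _activity, stamp in records:
        hour = datetime.fromisoformat(stamp).hour
        totals[user] += 1
        if WORK_START_HOUR <= hour < WORK_END_HOUR:
            in_window[user] += 1
        computers[user].add(computer)
    return {
        user: {
            "logon_logoff_count": totals[user],
            "logon_logoff_in_window": in_window[user],
            "computer_count": len(computers[user]),
        }
        for user in totals
    }


features = frequency_features(LOGON_RECORDS)
```

The other counts listed in claim 4 (device connections, file transfers, e-mails, web pages) would follow the same pattern over their respective logs.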
5. The method of claim 1, wherein the user login behavior association graph is a homogeneous graph composed of different users and the associations between them, denoted as G = (V, E), where V represents the set of nodes in the graph, each node representing a user, and E represents the set of edges in the graph, each edge representing an association relationship between the two corresponding users; the association relationship includes the two users having logged in to the same device.
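The graph G = (V, E) of claim 5 can be sketched as follows, assuming login events have already been reduced to (user, device) pairs. The function name and the sample events are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_login_graph(login_events):
    """Build G = (V, E): nodes are users, and an undirected edge links
    two users who have logged in to the same device."""
    users_by_device = defaultdict(set)
    for user, device in login_events:
        users_by_device[device].add(user)

    nodes = {user for user, _device in login_events}
    edges = set()
    for users in users_by_device.values():
        # Sorting gives each undirected edge one canonical (a, b) form.
        for a, b in combinations(sorted(users), 2):
            edges.add((a, b))
    return nodes, edges


# Hypothetical login events.
events = [
    ("alice", "PC-1"),
    ("bob", "PC-1"),
    ("bob", "PC-2"),
    ("carol", "PC-2"),
    ("dave", "PC-3"),
]
V, E = build_login_graph(events)
```

Here "dave" stays in V as an isolated node because no other user shares a device with him.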
6. The method of claim 1, wherein the structural features of user behavior are extracted for each node as follows: generating a plurality of fixed-length random walk sequences to form a node sequence set S; and, for each node u, obtaining a node embedding vector through Skip-gram model learning, which maximizes the objective function f over the set S, wherein the node embedding vector is the structural feature of the user behavior.
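The first half of claim 6, building the node sequence set S, can be sketched as below. The walks here are uniform (DeepWalk-style), which is an assumption since the claim does not state a walk strategy; the adjacency data, function names, and window size are hypothetical. The sketch stops at producing the (center, context) pairs a Skip-gram model would train on and does not train embeddings.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks of a fixed length from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adjacency):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # isolated node: the walk cannot continue
                walk.append(rng.choice(sorted(neighbors)))
            walks.append(walk)
    return walks


def skipgram_pairs(walks, window):
    """List the (center, context) node pairs within the given window."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical adjacency lists for a small login association graph.
adjacency = {"alice": {"bob"}, "bob": {"alice", "carol"}, "carol": {"bob"}}
S = random_walks(adjacency, walk_length=5, walks_per_node=2)
pairs = skipgram_pairs(S, window=2)
```

In practice the set S would be passed to a Skip-gram implementation to learn one embedding vector per node; that training step is outside this sketch.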
7. The method of claim 1, wherein the statistical features and structural features of user behavior are fused by: concatenating the statistical features and the structural features of user behavior, normalizing the feature values to the range 0 to 1 using Min-Max normalization, and converting the result into a two-dimensional feature matrix.
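The fusion step of claim 7 can be sketched as below. Several details are assumptions not stated in the claim: normalization is applied per feature column across users, a constant column maps to 0, and the vector is zero-padded before being reshaped into rows of `n_cols` values. All names and sample values are hypothetical.

```python
def fuse_features(statistical, structural, n_cols):
    """Concatenate per-user features, Min-Max normalize each feature
    column to [0, 1], and reshape each user's vector into a 2-D matrix."""
    users = sorted(statistical)
    rows = [list(statistical[u]) + list(structural[u]) for u in users]
    n_features = len(rows[0])
    lows = [min(r[k] for r in rows) for k in range(n_features)]
    highs = [max(r[k] for r in rows) for k in range(n_features)]

    matrices = {}
    for user, row in zip(users, rows):
        scaled = [
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
            for v, lo, hi in zip(row, lows, highs)
        ]
        # Assumed padding so the vector fills a whole number of rows.
        scaled += [0.0] * (-len(scaled) % n_cols)
        matrices[user] = [
            scaled[i:i + n_cols] for i in range(0, len(scaled), n_cols)
        ]
    return matrices


# Hypothetical statistical counts and structural embeddings.
statistical = {"alice": [3, 1], "bob": [1, 1]}
structural = {"alice": [0.5, -0.5], "bob": [-0.5, 0.5]}
fused = fuse_features(statistical, structural, n_cols=2)
```

Each resulting matrix is what the claims call the feature matrix fed to the Capsule neural network.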
8. The method of claim 1, wherein, during training of the Capsule neural network, the reconstruction error loss and the slowness loss are summed to obtain a final loss function, and the final loss function is used to adjust the parameters of the Capsule neural network to obtain the internal threat detection model; the reconstruction error loss is the Euclidean distance between the feature matrix and the output of the Sigmoid layer in the reconstruction stage, wherein the reconstruction stage uses a decoding structure that reconstructs the input from the DigitCaps (digit capsule) layer of the Capsule neural network: the activation vector of the correct digit capsule is retained by masking, and that activation vector is then decoded to obtain the reconstruction error loss.
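The loss of claim 8 can be sketched as a sum of two terms. The reconstruction term follows the claim directly (Euclidean distance). The second term is named "slowness loss" in this translation and is not defined in the claims; the sketch substitutes the standard CapsNet margin loss with its usual constants (0.9, 0.1, 0.5) purely as an assumed stand-in. The `recon_weight` parameter and all sample values are likewise hypothetical.

```python
import math


def reconstruction_loss(feature_matrix, reconstruction):
    """Euclidean distance between the input matrix and the decoder output."""
    return math.sqrt(sum(
        (x - y) ** 2
        for row_x, row_y in zip(feature_matrix, reconstruction)
        for x, y in zip(row_x, row_y)
    ))


def margin_loss(capsule_lengths, target_index,
                m_plus=0.9, m_minus=0.1, lam=0.5):
    """Standard CapsNet margin loss (assumed stand-in for the second term)."""
    total = 0.0
    for k, length in enumerate(capsule_lengths):
        if k == target_index:
            total += max(0.0, m_plus - length) ** 2
        else:
            total += lam * max(0.0, length - m_minus) ** 2
    return total


def total_loss(feature_matrix, reconstruction, capsule_lengths,
               target_index, recon_weight=1.0):
    """Sum of the two terms; recon_weight is an assumed knob, not in the claim."""
    return (margin_loss(capsule_lengths, target_index)
            + recon_weight * reconstruction_loss(feature_matrix, reconstruction))


# Hypothetical values: two output capsules (normal, threat), true class 1.
x = [[1.0, 0.0], [0.0, 1.0]]
x_hat = [[1.0, 0.0], [0.0, 0.0]]
loss = total_loss(x, x_hat, capsule_lengths=[0.2, 0.8], target_index=1)
```

In a real training loop these quantities would be computed on tensors by a deep learning framework so that gradients can flow back to the network parameters.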
CN202210105573.5A 2022-01-28 2022-01-28 Internal threat detection method based on feature fusion Active CN114553497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210105573.5A CN114553497B (en) 2022-01-28 2022-01-28 Internal threat detection method based on feature fusion

Publications (2)

Publication Number Publication Date
CN114553497A (en) 2022-05-27
CN114553497B (en) 2022-11-15

Family

ID=81673865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210105573.5A Active CN114553497B (en) 2022-01-28 2022-01-28 Internal threat detection method based on feature fusion

Country Status (1)

Country Link
CN (1) CN114553497B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599295A (en) * 2016-12-27 2017-04-26 四川中电启明星信息技术有限公司 Multi-track visual analyzing evidence-collecting method for user behaviors and system
CN107426231A (en) * 2017-08-03 2017-12-01 北京奇安信科技有限公司 A kind of method and device for identifying user behavior
CN111163057A (en) * 2019-12-09 2020-05-15 中国科学院信息工程研究所 User identification system and method based on heterogeneous information network embedding algorithm
US10685293B1 (en) * 2017-01-20 2020-06-16 Cybraics, Inc. Methods and systems for analyzing cybersecurity threats
CN111797978A (en) * 2020-07-08 2020-10-20 北京天融信网络安全技术有限公司 Internal threat detection method and device, electronic equipment and storage medium
WO2020227429A1 (en) * 2019-05-06 2020-11-12 Strong Force Iot Portfolio 2016, Llc Platform for facilitating development of intelligence in an industrial internet of things system
CN112184468A (en) * 2020-09-29 2021-01-05 中国电子科技集团公司电子科学研究院 Dynamic social relationship network link prediction method and device based on spatio-temporal relationship
CN113919239A (en) * 2021-12-15 2022-01-11 军事科学院系统工程研究院网络信息研究所 Intelligent internal threat detection method and system based on space-time feature fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210014256A1 (en) * 2019-07-08 2021-01-14 Fmr Llc Automated intelligent detection and mitigation of cyber security threats

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Liya Su; Yepeng Yao; Zhigang Lu; Baoxu Liu. Understanding the Influence of Graph Kernels on Deep Learning Architecture: A Case Study of Flow-Based Network Attack Detection. 《2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE)》. 2019 *
Xueying Han; Rongchao Yin; Zhigang Lu; Bo Jiang; Yuling Liu. STIDM: A Spatial and Temporal Aware Intrusion Detection Model. 《2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)》. 2021 *
Construction of a mobile threat awareness platform based on multi-dimensional data analysis; 计晨晓 et al.; 《中国新通信》; 2016-12-20 (No. 24); full text *
Collaborative network intrusion detection based on data fusion; 张巍 et al.; 《计算机应用》; 2009-01-01 (No. 01); full text *

Also Published As

Publication number Publication date
CN114553497A (en) 2022-05-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant