CN114553497A - Internal threat detection method based on feature fusion - Google Patents


Info

Publication number
CN114553497A
Authority
CN
China
Prior art keywords
user
user behavior
node
internal
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210105573.5A
Other languages
Chinese (zh)
Other versions
CN114553497B (en)
Inventor
卢志刚
肖海涛
刘玉岭
张辰
刘松
姜波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202210105573.5A priority Critical patent/CN114553497B/en
Publication of CN114553497A publication Critical patent/CN114553497A/en
Application granted granted Critical
Publication of CN114553497B publication Critical patent/CN114553497B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 — Network architectures or network communication protocols for network security
    • H04L63/14 — Detecting or protecting against malicious traffic
    • H04L63/1408 — Detecting malicious traffic by monitoring network traffic
    • H04L63/1416 — Event detection, e.g. attack signature detection
    • H04L63/1425 — Traffic logging, e.g. anomaly detection
    • H04L63/1441 — Countermeasures against malicious traffic
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods


Abstract

The invention provides an internal threat detection method based on feature fusion, relating to the field of cyberspace security.

Description

Internal threat detection method based on feature fusion
Technical Field
The invention relates to the field of cyberspace security. It fuses statistical and structural features of user behavior, detects internal threats with a deep-learning-based anomaly detection method, and discovers potential threat users in an intranet in time.
Background
While external network threats grow ever more rampant, the impact of internal threats cannot be ignored. An internal threat generally refers to a user who has access rights to an organization's internal network, systems or data, abuses those rights, violates the organization's security policy, and negatively affects the confidentiality, integrity and availability of internal information. According to the 2020 security report from Cybersecurity Insiders, about two thirds of organizations (68%) indicated that insider threats had become more frequent in the past year, and 70% of organizations had suffered at least one malicious insider act in the past year. The threat posed by the internal network is thus increasingly prominent and has become an urgent problem; detecting internal threats accurately and in time is crucial to the stable operation and healthy development of an organization.
Internal threats are often associated with malicious employees who intentionally steal data or sabotage systems, causing losses to enterprises and organizations; in practice, employee negligence and partner errors also contribute to many security holes and unexpected data leaks, causing no small loss. Apart from objective factors such as system vulnerabilities and improper permission assignment, people are the main factor behind these losses. Internal threats are usually carried out by privileged users holding legitimate permissions; unlike external attackers who exploit system vulnerabilities to perform unauthorized operations, internal users have legitimate identities and are familiar with the internal architecture, so their malicious behavior is hard to discover and poses a huge threat to organizational security.
Today's research on internal threat detection techniques can be divided into rule-based internal threat detection, traditional machine learning-based internal threat detection, and deep learning-based internal threat detection, depending on the method used.
Rule-based internal threat detection generally achieves high accuracy and a low false-alarm rate on known threat behaviors, but it can hardly detect unknown abnormal behaviors absent from its knowledge base, cannot adapt to new internal attack behaviors, and is therefore unsuited to today's complex network environment.
Internal threat detection based on traditional machine learning generally preprocesses the data, extracts and selects features, trains a model on the training data with a chosen traditional machine learning algorithm, applies the trained model to test data, and compares the predictions with the ground truth.
Internal threat detection based on deep learning mostly takes a user behavior sequence as input, builds a model of the user's normal behavior, and detects anomalies through changes in the behavior sequence to judge whether the user is abnormal. Such methods can learn only one-sided sequence information or one-sided statistical information of user behavior, and lack any judgment of user attributes and of the association information between users.
In summary, internal threat detection commonly suffers from low detection accuracy and high false-alarm rates because feature types are not fully exploited and association information between users is not considered, so the detection effect is often far from ideal.
Disclosure of Invention
To solve these problems, the invention provides an internal threat detection method based on feature fusion, which extracts statistical features and structural features of user behavior from multi-source logs and combines them with a deep-learning-based anomaly detection method to detect internal threats and improve the security of the internal network.
In order to achieve the purpose, the invention adopts the specific technical scheme that:
an internal threat detection method based on feature fusion comprises the following steps:
collecting multi-source user behavior logs in an internal network, analyzing by taking users as units, and forming an independent multi-source user behavior log record for each user;
counting behavior information of the users from the multi-source user behavior log record corresponding to each user, and extracting statistical characteristics of user behaviors;
constructing a user login behavior association graph from the login logs in each user's multi-source behavior log record, performing random walks over the neighbor nodes of each user node to generate several fixed-length random walk sequences, each sequence being the walked nodes in visiting order, and extracting the structural features of user behavior for each node in the sequences;
fusing the extracted statistical characteristics of the user behaviors with the structural characteristics to form a characteristic matrix;
processing training data containing internal threat labels through the steps to obtain a feature matrix, inputting the feature matrix into a Capsule neural network for training to obtain an internal threat detection model based on feature fusion; when the threat in the internal network is formally detected, the multi-source user behavior log in the internal network is obtained, the characteristic matrix is obtained through the processing of the steps, and the characteristic matrix is input into the internal threat detection model to detect the threat in the internal network.
Further, the multi-source user behavior log includes a mobile device usage record, a file operation log, an email log, and a web browsing record in addition to a login log.
Further, the behavior information of the user comprises frequency-based user behavior statistical characteristics and content-based user behavior statistical characteristics; the frequency-based user behavior statistical characteristics are average counts of different types of user behaviors, and the content-based user behavior statistical characteristics are content generated based on user behavior operation.
Furthermore, the frequency-based user behavior statistical characteristics comprise one or more of: the number of user logins and logouts, the number of logins and logouts during off-duty hours, and the number of computers the user logs in to; the number of device connections, the number of device connections during off-duty hours, and the number of computers the devices connect to; the number of transfers of different files, the total number of file transfers, the number of files transferred during off-duty hours, the number of executable file transfers, and the number of computers involved in file transfers; the number of e-mails sent, the number sent inside the organization, the number sent outside the organization, the average e-mail size, the number of e-mail attachments, the number of e-mail recipients, the number of e-mails sent during off-duty hours, and the number of computers used to receive e-mail; the number of web pages browsed and the number browsed during off-duty hours;
the content-based user behavior statistical characteristics comprise one or more of the number of e-mail associated with emotional tendency, the number of web pages browsed associated with decryption, the number of web pages browsed associated with job recruitment, the number of web pages browsed associated with hacker, the number of web pages browsed associated with cloud storage, the number of web pages browsed associated with social contact, and the number of web pages browsed associated with emotional tendency.
Further, the user login behavior association graph is a homogeneous graph built from users and the association relations between them, denoted G = (V, E), where V is the set of nodes in the graph, each node representing a user, and E is the set of edges, each edge representing an association relation between the two corresponding users.
Further, the structural features of user behavior are extracted for each node in the sequences as follows: the generated fixed-length random walk sequences form a node sequence set S; for each node u, a node embedding vector is learned with the Skip-gram model by maximizing an objective function f over the set S, and this node embedding vector is the structural feature of that user's behavior.
Further, the statistical and structural features of user behavior are fused by splicing them together, normalizing the feature values to the range 0-1 with Min-Max normalization, and converting the result into a two-dimensional feature matrix.
Further, during training in the Capsule neural network, the reconstruction error loss and the margin loss are accumulated into a final loss function, which is used to adjust the parameters of the Capsule neural network to obtain the internal threat detection model. The reconstruction error loss is the Euclidean distance between the input feature matrix and the output of a Sigmoid layer used for reconstruction: a decoding structure is reconstructed from the DigitCaps digital capsule layer of the Capsule neural network, the activation vector of the correct digital capsule is retained by masking, and the reconstruction from this activation vector yields the reconstruction error loss.
Compared with the existing internal threat detection method, the method has the following advantages:
the method and the device respectively excavate the potential characteristics of the user behavior from the statistical level and the structural level of the multi-source user behavior log, effectively utilize various characteristic types and combine the correlation information between users, thereby more effectively detecting the internal threat, solving the problems that only one-sided user behavior information can be learned and the correlation between the users is not considered in the original internal threat detection method, and further improving the accuracy of the internal threat detection.
Drawings
Fig. 1 is a general flow chart of an internal threat detection method based on feature fusion according to an embodiment of the present invention.
Fig. 2 is a schematic diagram illustrating extraction of statistical features of user behaviors according to an embodiment of the present invention.
Fig. 3 is a schematic diagram illustrating extraction of user behavior structural features according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features and advantages of the present invention more obvious and understandable by those skilled in the art, the technical cores of the present invention are further described in detail with reference to the accompanying drawings and examples.
The embodiment of the invention provides an internal threat detection method based on feature fusion, which is specifically described as follows:
as shown in fig. 1, is a general flow chart of internal threat detection based on feature fusion. The method is divided into 5 steps in total,the first step is to collect and analyze the log R of the userLMobile device usage record RDFile operation log RFE-mail log RMAnd a web browsing record RWForming a multi-source user behavior log R, and respectively extracting statistical characteristics F of user behaviors from the multi-source user behavior log R obtained by analysisstatStructural features F related to user behaviorstruThen, the obtained two types of features are fused and converted, and each internal user forms a unique feature matrix FmatrixAnd finally, inputting the data into a Capsule neural network for training to obtain an internal threat detection model M based on feature fusion, and detecting the internal threat user by using the model M.
As shown in fig. 2, user behavior statistics are computed from the parsed multi-source log along two dimensions, frequency-based and content-based. The frequency of user behavior operations often reflects typical behavioral characteristics; the frequency-based statistics are average counts of different types of user behavior, such as the number of daily user logins and logouts and the number during off-duty hours. The content-based statistics are extracted from the content generated by behavior operations, mainly text features produced in user communication, such as the emotional tendency of e-mails and the number of visits to particular kinds of web pages. In total, 33 statistical features of user behavior are extracted.
Specifically, the statistical features are extracted per user. The frequency-based features comprise: the number of user logins and logouts, the number during off-duty hours, and the number of computers the user logs in to; the number of device connections, the number during off-duty hours, and the number of computers the devices connect to; the number of transfers of different files, the total number of file transfers, the number of files transferred during off-duty hours, the number of executable file transfers, and the number of computers involved in file transfers; the number of e-mails sent, the number sent inside the organization, the number sent outside the organization, the average e-mail size, the number of attachments, the number of recipients, the number of e-mails sent during off-duty hours, and the number of computers used to receive e-mail; the number of web pages browsed and the number browsed during off-duty hours. The content-based features comprise: the number of e-mails related to emotional tendency; and the number of web pages browsed related to decryption, to job recruitment, to hacking, to cloud storage, to social contact, and to emotional tendency.
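A minimal sketch of extracting a few of the frequency-based statistics is given below. The parsed-event format, field names, and off-duty hour range are illustrative assumptions, not the patent's actual schema:

```python
# Hypothetical parsed log events: (user, action, hour_of_day, pc).
# Field names and the off-duty definition are assumptions for illustration.
events = [
    ("alice", "logon", 9, "PC-1"), ("alice", "logoff", 18, "PC-1"),
    ("alice", "logon", 23, "PC-2"),   # off-duty logon
    ("bob",   "logon", 10, "PC-3"), ("bob", "logoff", 17, "PC-3"),
]

OFF_DUTY = set(range(0, 8)) | set(range(18, 24))  # assumed off-duty hours

def frequency_features(user, events):
    """Count some of the frequency-based statistics described above."""
    mine = [e for e in events if e[0] == user]
    return {
        "logon_logoff_count": sum(1 for e in mine if e[1] in ("logon", "logoff")),
        "off_duty_logon_logoff": sum(
            1 for e in mine if e[1] in ("logon", "logoff") and e[2] in OFF_DUTY),
        "distinct_pcs": len({e[3] for e in mine if e[1] == "logon"}),
    }

features = frequency_features("alice", events)
```

In the same way, a dictionary per user could be extended with the device, file, e-mail, and web-browsing counts, then averaged over days to match the "average counts" described in the text.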
As shown in fig. 3, a user login behavior association graph G = (V, E) is constructed from the users' login logs. The graph is homogeneous: V is the set of nodes and E the set of edges; a node has a single attribute (the user) and an edge a single attribute (whether an association exists between the two users). When user_i and user_j have logged in to the same device, an association exists between user_i and user_j, and the user login behavior association graph is built accordingly. Then fixed-length random walks over neighboring nodes are simulated with a random walk method: for each node u ∈ V of graph G, with walk length set to l, a node sequence set S = {s_1, s_2, …, s_m} is generated, where s_i denotes the i-th random walk sequence and m the total number of random walk sequences. Next, for each node u, a d-dimensional node vector representation is learned with the Skip-gram model, which learns node embedding vectors by maximizing an objective function f over the node sequence set S; f is given in formula (1):
f = Σ_{u∈W} log P(N(u)|u)    (1)
where W is the vocabulary containing every distinct node, N(u) is the set of neighbor nodes of node u, and the logarithm is base 2. P(N(u)|u) is the transition probability from the given source node u to u's neighbor nodes, defined in formula (2):
P(n_i|u) = exp(v'_{n_i}ᵀ · v_u) / Σ_{w∈W} exp(v'_wᵀ · v_u)    (2)
where v and v' are the two vector representations kept for each node, v_u being the final low-dimensional embedding of node u; user nodes with similar device login behaviors tend to have similar low-dimensional embeddings. ᵀ denotes transposition, n_i is the i-th neighbor node of node u, and W is again the vocabulary containing every distinct node.
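The graph construction and walk generation described above can be sketched as follows; the co-login records and walk parameters are toy values, and the subsequent embedding step (maximizing f with Skip-gram, e.g. via a word2vec-style trainer over the walks) is only noted, not implemented:

```python
import random
from collections import defaultdict

# Toy co-login records: (user, device). Per the text, two users are linked
# when they have logged in to the same device.
logins = [("u1", "d1"), ("u2", "d1"), ("u2", "d2"), ("u3", "d2"), ("u4", "d3")]

# Build the homogeneous user association graph G = (V, E).
device_users = defaultdict(set)
for user, device in logins:
    device_users[device].add(user)
graph = defaultdict(set)
for users in device_users.values():
    for a in users:
        for b in users:
            if a != b:
                graph[a].add(b)

def random_walks(graph, length, walks_per_node, seed=0):
    """Generate the fixed-length random walk sequences s_1 .. s_m."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < length:
                neighbors = sorted(graph[walk[-1]])
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

S = random_walks(graph, length=4, walks_per_node=2)  # node sequence set S
```

Treating each walk as a "sentence" of node tokens, the d-dimensional embeddings v_u would then be learned over S with a Skip-gram model.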
In the feature fusion step, the extracted statistical features F_stat and structural features F_stru of user behavior are first spliced together, and the feature values are normalized to the range 0-1 with Min-Max normalization. Min-Max normalization removes the dimensional differences between features, so that features of different scales do not unduly influence the model. After normalization, the features are converted into a two-dimensional feature matrix F_matrix for input into the subsequent deep neural network model.
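The splicing and Min-Max normalization step can be sketched with NumPy; the feature values, dimensions, and the 2-D reshaping per user are illustrative assumptions:

```python
import numpy as np

# Rows = users, columns = features; values are illustrative only.
F_stat = np.array([[3.0, 15.0], [9.0, 40.0], [1.0, 5.0]])    # statistical features
F_stru = np.array([[0.12, -0.40], [0.80, 0.10], [-0.05, 0.33]])  # embeddings

X = np.hstack([F_stat, F_stru])  # splice the two feature sets per user
# Min-Max normalization per feature column, mapping each into [0, 1].
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
# Convert each user's vector into a two-dimensional feature matrix
# (the 2x2 shape here is an assumption for illustration).
F_matrix = X_norm.reshape(len(X_norm), 2, 2)
```

Each slice `F_matrix[i]` would then be the per-user input to the Capsule network.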
In the neural network training stage, the labeled training data are preprocessed as above into feature matrices F_matrix and input into a Capsule neural network; after passing through a Conv convolution layer, a PrimaryCaps primary capsule layer, and a DigitCaps digital capsule layer, the network outputs whether the user corresponding to the feature matrix is abnormal. Compared with traditional neural networks, the Capsule neural network has stronger learning and representation capabilities and adapts better to the learned features.
During training, to adjust the parameters of the Capsule neural network, the embodiment of the invention uses a Margin Loss function; for each capsule k, the loss function L_k is given in formula (3):
L_k = T_k · max(0, m⁺ − ‖v_k‖)² + λ(1 − T_k) · max(0, ‖v_k‖ − m⁻)²    (3)
where v_k is the output vector of capsule k and ‖v_k‖, its length, represents the class probability; T_k = 1 when class k is the correct class, otherwise T_k = 0. The two hyperparameters m⁺ and m⁻ bound the losses of correct and incorrect classification respectively: for example, once the probability of the correct class exceeds m⁺, the correct-classification loss is 0. Here m⁺ and m⁻ are set to 0.9 and 0.1, and the regularization parameter λ is set to 0.5.
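Formula (3) can be checked with a small NumPy sketch; the capsule output lengths below are invented values for illustration:

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss of formula (3): v_norms are the capsule output lengths
    ||v_k||; targets T_k are 1 for the correct class, else 0."""
    pos = targets * np.maximum(0.0, m_pos - v_norms) ** 2
    neg = lam * (1 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2
    return pos + neg

# Two capsules (malicious / non-malicious); correct class is capsule 0.
losses = margin_loss(np.array([0.95, 0.20]), np.array([1.0, 0.0]))
```

Since 0.95 exceeds m⁺ = 0.9, the correct capsule contributes zero loss, while the wrong capsule's length above m⁻ = 0.1 is penalized with weight λ.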
Meanwhile, to obtain a more accurate detection model, the embodiment of the invention accumulates the reconstruction error loss and the margin loss into the final loss function. The reconstruction error loss L_R comes from a decoding structure reconstructed from the DigitCaps digital capsule layer: only the activation vector of the correct digital capsule is retained by masking, the input is then reconstructed from that activation vector, and L_R is the Euclidean distance between the input feature matrix F_matrix and the output of the Sigmoid layer of the reconstruction. The Sigmoid layer is an additional layer for the reconstruction error computation, comparing the matrix reconstructed from the output vector with the input matrix. The final loss function L is given in formula (4):
L = Σ_{k=1}^{C_k} L_k + L_R    (4)
where C_k is the number of classes (the 2 classes of malicious and non-malicious users, so C_k = 2), L_k is the margin loss of each capsule k, and L_R is the reconstruction error; the neural network is trained with the goal of minimizing the loss function L.
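Accumulating the two losses as in formula (4) can be sketched as follows; the reconstruction and margin-loss values are toy numbers, and no scaling factor on L_R is assumed since the text simply accumulates the two terms:

```python
import numpy as np

def total_loss(margin_losses, f_matrix, reconstruction):
    """Formula (4): sum of per-capsule margin losses plus the reconstruction
    error L_R, the Euclidean distance between the input feature matrix and
    the Sigmoid-layer reconstruction."""
    l_r = np.linalg.norm(f_matrix - reconstruction)
    return margin_losses.sum() + l_r

F = np.array([[0.2, 0.8], [0.5, 0.1]])   # illustrative input feature matrix
R = np.array([[0.2, 0.5], [0.1, 0.1]])   # illustrative reconstruction
L = total_loss(np.array([0.0, 0.005]), F, R)
```

Here L_R = √(0.3² + 0.4²) = 0.5, so L = 0.505; training would minimize this quantity by gradient descent.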
After training, the feature-fusion-based internal threat detection model is obtained; when the model is formally used to detect internal threats, the computed feature matrix is input into the model and the detection result is output.
The internal threat detection model is evaluated as follows. The data set is the insider-threat test data set released by the CERT division of Carnegie Mellon University, covering 17 months of login logs, mobile device usage records, file operation logs, e-mail logs and web browsing records of 1000 users, among whom 70 are internal threat users.
Training mainly learns the parameters of the Capsule neural network from the training data set; when the loss function reaches its minimum, the parameters are optimal and the feature-fusion-based internal threat detection model is obtained.
The model is evaluated with four indexes: Accuracy, F-measure, AUC and Recall. Accuracy directly reflects the overall performance of the model. F-measure is the harmonic mean of precision and recall; precision and recall are usually a contradictory pair of measures — when precision is high, recall tends to be low, and vice versa. F-measure balances the two: if either value is low, the F-measure is low as well, and it is high only when both are high, which avoids extreme cases and evaluates the model more accurately. AUC, the area under the receiver operating characteristic curve, is commonly used to evaluate binary classification models. Recall is the ratio of correctly predicted positive samples to all real positive samples and reflects the model's ability to detect internal threat users.
In the evaluation experiment, the data set is split into a training set and a test set at a ratio of 6:4; the feature matrices of the training set are used for training and those of the test set for testing. The trained model achieves Accuracy 0.980, F-measure 0.958, AUC 0.933 and Recall 0.867, all higher than internal threat detection methods based on traditional machine learning (Logistic regression, SVM support vector machine, Random Forest, etc.) and on deep learning (CNN convolutional neural network, GCN graph convolutional neural network, etc.), demonstrating the effectiveness of the feature-fusion-based method in the internal threat detection field.
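The four evaluation indexes can be computed in plain Python as below; the predictions and scores are toy values, not the paper's results, and AUC is computed with the equivalent rank-based (Mann-Whitney) formulation:

```python
def evaluate(y_true, y_pred, scores):
    """Accuracy, Recall, F-measure, and a rank-based AUC (toy sketch)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    # AUC = P(score of a random positive > score of a random negative).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return acc, rec, f1, auc

y_true = [1, 1, 0, 0, 1]      # 1 = internal threat user
y_pred = [1, 0, 0, 0, 1]      # model decisions
scores = [0.9, 0.4, 0.3, 0.2, 0.8]  # model scores for AUC
acc, rec, f1, auc = evaluate(y_true, y_pred, scores)
```

With 70 positives among 1000 users, the data are heavily imbalanced, which is exactly why Recall and F-measure matter more here than Accuracy alone.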
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention is described in detail through examples, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the invention without departing from their spirit and scope, and such changes should be covered by the claims of the present invention.

Claims (8)

1. An internal threat detection method based on feature fusion is characterized by comprising the following steps:
collecting multi-source user behavior logs in an internal network, analyzing by taking users as units, and forming an independent multi-source user behavior log record for each user;
counting behavior information of users from a multi-source user behavior log record corresponding to each user, and extracting statistical characteristics of user behaviors;
constructing a user login behavior association graph from the login logs in each user's multi-source behavior log record, performing random walks over the neighbor nodes of each user node to generate several fixed-length random walk sequences, each sequence being the walked nodes in visiting order, and extracting the structural features of user behavior for each node in the sequences;
fusing the extracted statistical characteristics of the user behaviors with the structural characteristics to form a characteristic matrix;
processing training data containing internal threat labels through the steps to obtain a feature matrix, inputting the feature matrix into a Capsule neural network for training to obtain an internal threat detection model based on feature fusion; when the threat in the internal network is formally detected, the multi-source user behavior log in the internal network is obtained, the characteristic matrix is obtained through the processing of the steps, and the characteristic matrix is input into the internal threat detection model to detect the threat in the internal network.
2. The method of claim 1, wherein the multi-source user behavior log comprises mobile device usage records, file operation logs, email logs, and web browsing records in addition to login logs.
3. The method of claim 1, wherein the behavior information of the user comprises frequency-based user behavior statistics and content-based user behavior statistics; the frequency-based user behavior statistical characteristics are average counts of different types of user behaviors, and the content-based user behavior statistical characteristics are content generated based on user behavior operation.
4. The method of claim 3, wherein the frequency-based user behavior statistical features include one or more of: the number of user logins and logouts, the number of user logins and logouts during working hours, the number of computers the user logs in to, the number of device connections during working hours, the number of computers a device is connected to, the number of transmissions of different files, the total number of file transmissions, the number of files transmitted during working hours, the number of executable-file transmissions, the number of computers involved in file transmissions, the number of outgoing e-mails, the number of e-mails sent inside the organization, the number of e-mails sent outside the organization, the average size of e-mails, the number of e-mail recipients, the number of e-mail senders, the number of computers used to receive e-mails, the number of web pages viewed, and the number of web pages viewed during working hours;
the content-based user behavior statistical features include one or more of: the number of e-mails associated with emotional tendency, and the numbers of browsed web pages associated with decryption, job recruitment, hacking, cloud storage, social networking, and emotional tendency, respectively.
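As an illustration (not the patent's implementation), a few of the frequency-based features listed in claim 4 can be counted from parsed login events; the event tuple format and the working-hours window used here are assumptions:

```python
from datetime import datetime

# Hypothetical parsed login events: (user, timestamp, pc, action)
events = [
    ("u1", "2021-03-01 09:15", "PC-07", "Logon"),
    ("u1", "2021-03-01 23:40", "PC-07", "Logon"),
    ("u1", "2021-03-02 10:05", "PC-12", "Logon"),
]

def login_features(events, work_start=8, work_end=18):
    """Compute three of claim 4's frequency-based features for one user:
    total logons, logons outside working hours, and distinct PCs used."""
    total = off_hours = 0
    pcs = set()
    for _, ts, pc, action in events:
        if action != "Logon":
            continue
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M").hour
        total += 1
        if not (work_start <= hour < work_end):
            off_hours += 1
        pcs.add(pc)
    return {"logons": total, "off_hours_logons": off_hours, "pcs": len(pcs)}

print(login_features(events))  # {'logons': 3, 'off_hours_logons': 1, 'pcs': 2}
```

The remaining features (file, e-mail, and web counts) follow the same per-user counting pattern over their respective logs.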
5. The method of claim 1, wherein the user login behavior association graph is a homogeneous graph composed of users and the associations between them, represented as G = (V, E), where V is the set of nodes in the graph, each node representing a user, and E is the set of edges, each edge representing an association between the two corresponding users; the association includes the two users having logged in to the same device.
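A minimal sketch of the graph construction in claim 5, assuming the login log has already been parsed into (user, device) pairs (a hypothetical format):

```python
from itertools import combinations

# Hypothetical parsed login log: (user, device) pairs.
logins = [("u1", "PC-07"), ("u2", "PC-07"), ("u3", "PC-12"), ("u1", "PC-12")]

def build_login_graph(logins):
    """Build the homogeneous graph G = (V, E) of claim 5: nodes are
    users; an edge links two users who logged in to the same device."""
    by_device = {}
    for user, device in logins:
        by_device.setdefault(device, set()).add(user)
    V = {user for user, _ in logins}
    E = set()
    for users in by_device.values():
        for u, v in combinations(sorted(users), 2):
            E.add((u, v))
    return V, E

V, E = build_login_graph(logins)
print(sorted(E))  # [('u1', 'u2'), ('u1', 'u3')]
```

Grouping logins by device first keeps the construction linear in the log size plus the number of co-login pairs.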
6. The method of claim 1, wherein the structural features of user behavior are extracted for each node in the sequences as follows: the generated fixed-length random walk sequences form a node sequence set S; for each node u, a Skip-gram model learns a mapping f by maximizing an objective function over the set S, yielding a node embedding vector, which is the structural feature of the user's behavior.
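The objective maximized over S in claim 6 can be written out in the standard Skip-gram form used by DeepWalk and node2vec (the notation is ours, not the patent's: N_S(u) denotes the nodes co-occurring with u within a context window over the walks in S, and f maps each node to its embedding vector):

```latex
\max_{f}\; \sum_{u \in V} \log \Pr\bigl(N_S(u) \mid f(u)\bigr),
\qquad
\Pr\bigl(N_S(u) \mid f(u)\bigr) \;=\; \prod_{v \in N_S(u)}
  \frac{\exp\bigl(f(v) \cdot f(u)\bigr)}
       {\sum_{w \in V} \exp\bigl(f(w) \cdot f(u)\bigr)}
```

In practice the softmax denominator is approximated with hierarchical softmax or negative sampling, as in standard Skip-gram implementations.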
7. The method of claim 1, wherein the statistical and structural features of user behavior are fused by concatenating them, normalizing the feature values to the range [0, 1] with Min-Max normalization, and converting the result into a two-dimensional feature matrix.
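Claim 7's fusion step can be sketched in a few lines; the feature dimensions and the target matrix shape below are hypothetical choices, since the patent text does not fix them:

```python
import numpy as np

def fuse(stat_feats, struct_feats, rows=8):
    """Claim 7 sketch: concatenate statistical and structural features,
    Min-Max normalize to [0, 1], reshape into a 2-D feature matrix.
    The number of rows is a hypothetical choice."""
    x = np.concatenate([stat_feats, struct_feats]).astype(float)
    lo, hi = x.min(), x.max()
    x = (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
    return x.reshape(rows, -1)

stat = np.arange(24, dtype=float)    # 24 statistical features (toy values)
struct = np.arange(40, dtype=float)  # 40-dim structural embedding (toy values)
m = fuse(stat, struct)
print(m.shape, m.min(), m.max())  # (8, 8) 0.0 1.0
```

Reshaping the fused vector into a matrix gives the Capsule network an image-like 2-D input, which is what its convolutional front end expects.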
8. The method of claim 1, wherein, during training of the Capsule neural network, the reconstruction error loss and the margin loss are summed into a final loss function, which is used to adjust the parameters of the Capsule neural network to obtain the internal threat detection model; the reconstruction error loss is the Euclidean distance between the input feature matrix and the output of the reconstruction Sigmoid layer: a decoding structure reconstructs the input from the DigitCaps digital capsule layer of the Capsule neural network, retaining the activation vector of the correct digital capsule by masking and reconstructing the input from that activation vector, which yields the reconstruction error loss.
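A sketch of the combined loss in claim 8, pairing the standard capsule-network margin loss (the constants m+ = 0.9, m- = 0.1, λ = 0.5 and the reconstruction weight are the usual CapsNet defaults, not values stated in the patent) with a down-weighted reconstruction term; the squared-Euclidean form used below is the common implementation of the reconstruction error:

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Standard capsule margin loss over capsule output lengths v_norms
    (shape: n_classes,) with one-hot targets."""
    pos = targets * np.maximum(0.0, m_pos - v_norms) ** 2
    neg = lam * (1 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2
    return float(np.sum(pos + neg))

def total_loss(v_norms, targets, x, x_rec, alpha=0.0005):
    """Margin loss plus scaled reconstruction error between the input
    feature matrix x and the Sigmoid-layer reconstruction x_rec."""
    rec = float(np.sum((x - x_rec) ** 2))
    return margin_loss(v_norms, targets) + alpha * rec

v = np.array([0.95, 0.2])   # capsule lengths: class 0 strongly active
t = np.array([1.0, 0.0])    # one-hot ground truth (e.g. benign vs. threat)
x = np.zeros((4, 4)); x_rec = np.zeros((4, 4))  # perfect reconstruction
print(round(total_loss(v, t, x, x_rec), 4))  # 0.005
```

The small alpha keeps the reconstruction term from dominating the margin loss, so reconstruction acts as a regularizer rather than the primary training signal.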
CN202210105573.5A 2022-01-28 2022-01-28 Internal threat detection method based on feature fusion Active CN114553497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210105573.5A CN114553497B (en) 2022-01-28 2022-01-28 Internal threat detection method based on feature fusion

Publications (2)

Publication Number Publication Date
CN114553497A true CN114553497A (en) 2022-05-27
CN114553497B CN114553497B (en) 2022-11-15

Family

ID=81673865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210105573.5A Active CN114553497B (en) 2022-01-28 2022-01-28 Internal threat detection method based on feature fusion

Country Status (1)

Country Link
CN (1) CN114553497B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599295A (en) * 2016-12-27 2017-04-26 四川中电启明星信息技术有限公司 Multi-track visual analyzing evidence-collecting method for user behaviors and system
CN107426231A (en) * 2017-08-03 2017-12-01 北京奇安信科技有限公司 A kind of method and device for identifying user behavior
CN111163057A (en) * 2019-12-09 2020-05-15 中国科学院信息工程研究所 User identification system and method based on heterogeneous information network embedding algorithm
US10685293B1 (en) * 2017-01-20 2020-06-16 Cybraics, Inc. Methods and systems for analyzing cybersecurity threats
CN111797978A (en) * 2020-07-08 2020-10-20 北京天融信网络安全技术有限公司 Internal threat detection method and device, electronic equipment and storage medium
WO2020227429A1 (en) * 2019-05-06 2020-11-12 Strong Force Iot Portfolio 2016, Llc Platform for facilitating development of intelligence in an industrial internet of things system
CN112184468A (en) * 2020-09-29 2021-01-05 中国电子科技集团公司电子科学研究院 Dynamic social relationship network link prediction method and device based on spatio-temporal relationship
US20210014256A1 (en) * 2019-07-08 2021-01-14 Fmr Llc Automated intelligent detection and mitigation of cyber security threats
CN113919239A (en) * 2021-12-15 2022-01-11 军事科学院系统工程研究院网络信息研究所 Intelligent internal threat detection method and system based on space-time feature fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIYA SU; YEPENG YAO; ZHIGANG LU; BAOXU LIU: "Understanding the Influence of Graph Kernels on Deep Learning Architecture: A Case Study of Flow-Based Network Attack Detection", 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications / 13th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE) *
XUEYING HAN; RONGCHAO YIN; ZHIGANG LU; BO JIANG; YULING LIU: "STIDM: A Spatial and Temporal Aware Intrusion Detection Model", 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) *
ZHANG Wei et al.: "Collaborative network intrusion detection with data fusion", Journal of Computer Applications *
JI Chenxiao et al.: "Building a mobile threat awareness platform based on multi-dimensional data analysis", China New Telecommunications *

Also Published As

Publication number Publication date
CN114553497B (en) 2022-11-15

Similar Documents

Publication Publication Date Title
Cai et al. Structural temporal graph neural networks for anomaly detection in dynamic graphs
Li et al. Data fusion for network intrusion detection: a review
Palmieri et al. A distributed approach to network anomaly detection based on independent component analysis
Taghavinejad et al. Intrusion detection in IoT-based smart grid using hybrid decision tree
Le et al. Exploring anomalous behaviour detection and classification for insider threat identification
Hall et al. Predicting malicious insider threat scenarios using organizational data and a heterogeneous stack-classifier
Dou et al. Pc 2 a: predicting collective contextual anomalies via lstm with deep generative model
Levshun et al. A survey on artificial intelligence techniques for security event correlation: models, challenges, and opportunities
Yeruva et al. Anomaly Detection System using ML Classification Algorithm for Network Security
Jacobs et al. Enhancing Vulnerability prioritization: Data-driven exploit predictions with community-driven insights
Zhao et al. A survey of deep anomaly detection for system logs
Miller et al. Size agnostic change point detection framework for evolving networks
Li et al. Image‐Based Insider Threat Detection via Geometric Transformation
Golczynski et al. End-to-end anomaly detection for identifying malicious cyber behavior through NLP-based log embeddings
Mvula et al. A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning
de Riberolles et al. Anomaly detection for ICS based on deep learning: a use case for aeronautical radar data
Yan et al. A Threat Intelligence Analysis Method Based on Feature Weighting and BERT‐BiGRU for Industrial Internet of Things
Mink et al. Everybody’s got ML, tell me what else you have: Practitioners’ perception of ML-based security tools and explanations
Siwach et al. Anomaly detection for web log data analysis: A review
Torre et al. Deep learning techniques to detect cybersecurity attacks: a systematic mapping study
Mujtaba et al. Detection of suspicious terrorist emails using text classification: A review
Chen et al. Data curation and quality assurance for machine learning-based cyber intrusion detection
CN114553497B (en) Internal threat detection method based on feature fusion
CN115987544A (en) Network security threat prediction method and system based on threat intelligence
de la Torre-Abaitua et al. A compression based framework for the detection of anomalies in heterogeneous data sources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant